Building a Skill for Amazon Alexa
As the Internet of Things (IoT) continues to transform the way people connect to information and the world around them, researchers predict that as many as 50 billion devices could be connected to the Internet by 2020.

Amazon’s Alexa has brought the IoT to mainstream consumers over the last year through an increasingly wide variety of ‘skills’ that let you do everything from ordering a pizza to requesting an Uber ride, all with a voice command. Just last month, Capital One became the first company to give its customers the ability to interact with their financial information in real time through Alexa-enabled devices, such as the Amazon Echo, Amazon Tap, Echo Dot, and Fire TV. Now, customers can stay on top of their credit card accounts by checking their balance, reviewing recent transactions, or making payments, and get real-time access to checking and savings account information to understand their available funds, all hands-free.

Like Alexa’s other skills, the setup is easy: simply search for and enable the Capital One skill in the Alexa app, enter your Capital One username and password, and begin asking Alexa questions like:

· “Alexa, ask Capital One for my Quicksilver Card balance.”
· “Alexa, ask Capital One for recent transactions on my checking account.”
· “Alexa, ask Capital One when is my credit card payment due?”
· “Alexa, ask Capital One to pay my credit card bill.”

For a behind-the-scenes look at the making of the Capital One skill for Alexa, we sat down with Scott Totman, Capital One Vice President of Digital Products Engineering, to learn how the skill came about and which design and technical challenges the team overcame along the way.

Q: What made Capital One interested in leveraging Alexa and being the first financial services company to offer this type of application?

At Capital One, we are continually evaluating emerging technologies and have a state-of-the-art technology platform that allows us to quickly leverage new technologies. We were excited about the possibility of leveraging voice-activated technology to create a new, convenient experience for our customers. It’s all about helping our customers manage their money, wherever they are, and integrating banking into people’s everyday lives.

Personally, as I watched how quickly my eight-year-old son was drawn to voice-controlled technologies, I knew the Amazon Echo was going to be a disruptive technology and wanted to learn more.

Q: How long did it take to get the Capital One skill up and running?

This was really built in two phases. Initially, a few developers just started experimenting with Amazon Echo last summer. Then we combined efforts by putting everyone in a room and scoping a single feature: fetching a customer’s credit card balance. In doing so, we learned a lot about the platform and the level of effort required to produce a full public offering. Phase two kicked off in October and entailed defining and building the initial set of skill capabilities, which were based on customer interviews and empathy-based user research. Less than six months later, we formally launched the Capital One skill at SXSW.

Q: What was the main challenge you faced when developing the Capital One skill for Alexa?

We spent a lot of time trying to get the conversation “right.” This is a new medium for customers and for us, so we had to learn not only what questions customers were going to ask, but how they were going to ask them. Additionally, we needed to balance convenience and security, allowing customers to interact with our skill easily while satisfying the extremely high security standards of both Capital One and Amazon. The Alexa Skills Kit itself was very straightforward to work with. It is evolving quickly, so developers need to dedicate a fair amount of time to staying on top of new capabilities as they are released.

Q: What design challenges were posed by the voice interface?

This was a really interesting challenge for our Design team. In order to make the skill feel like a personalized conversation, we have to know you, what matters to you, and the language you use to stay on top of your money. Do you want to feel a connection with Alexa, or do you just want her to state the facts? Do you prefer a sense of humor and personality in your interactions? Finances, after all, are not always a humorous topic for people, so we needed to have a deep understanding of our customers’ appetite for playfulness. For example, one of the most common customer questions is, “Can you make my payment go away?” That is a clear invitation to inject humor and provide a witty response.

Another challenge was that without the spinners, progress bars, or other visual cues commonly used in web and mobile experiences, latency becomes far more conspicuous. A few seconds of silence can seem like minutes. We addressed latency by keeping our APIs fast and constructing Alexa’s responses to be concise, without coming across as rude. Overly verbose responses may become as irritating as latency over time.

Finally, we had to solve for an entirely new navigation model. In a web or app experience, the user is confined to a certain set of actions via a pathway of buttons, links, etc. In a conversation, however, there are no boundaries. A customer can jump from asking about a checking account balance to a credit card payment due date at any moment during the conversation. To handle this, we mapped out some common workflows and accounted for implicit context switching. Whenever we are unsure of the user’s request, we ask for clarification. For example, Alexa may ask, “You have multiple accounts: a checking account, and a credit card account. Which one are you asking about?”

Q: How do you envision the future of voice-driven technologies?

Alexa is the beginning of an overall trend toward voice-driven interactions with both the digital and physical world around us. We are still in the early adopter phase, but I believe that products like the Echo will quickly accelerate mainstream adoption. Customers will become increasingly comfortable interacting with various services using voice, and this interaction will become the norm, not the exception. As customers become more familiar with voice technologies, we anticipate growing demand for feature capabilities, as well as for sophistication and elegance in the conversation. Moreover, technologies will begin to focus on interpreting a customer’s mood from the way they speak and adapting the interaction to that mood. The team behind IBM’s Watson is already working on such capabilities. In the end, the overall services market will enable customers to teach us how to interact with them, instead of forcing them to learn how to interact with us.

The Capital One skill is just the beginning for us as well. We came out with what we view as the “starting lineup” of features: allowing customers to check balances, pay credit card bills, and review their recent transactions. We will continue to test and learn, exploring new capabilities with Alexa by focusing on customer needs and refining the experience. In general, as celebratory and exciting as it is when you finally put an application in users’ hands, it is just the start of ongoing learning and product improvement. In the case of voice-driven services, it is the start of a conversation that will organically grow over time until customers are speaking as if they were talking to a human.

More broadly, I think the sky is the limit for how Alexa can and will be used.
An interesting article on PYMNTS.com looks at how Alexa can become “Champion of the Consumer” by leveraging company partnerships, location services, and other linkages to respond to basic requests such as “Alexa, I’m hungry,” “Alexa, I need gas,” or “Alexa, I’d like to go to the movies tonight.” The tech enthusiast, innovator, and consumer in me is eager to see what’s next.

Q: What advice would you give a developer who is trying to integrate the Alexa skill set into their product or service?

While I think the possibilities are nearly endless, my biggest piece of advice to developers is to be very thoughtful about the “why” behind your application. Leveraging voice-activated technology is only worthwhile if you can connect the dots between why and how it will make someone’s life better.

If you’re out at NY Tech Day this week, stop by the Capital One booth and learn more about how engineers, designers, and product development teams are working together to reimagine the future of banking.
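
For developers curious about the clarification step Totman describes in his answer on design challenges (asking which account the customer means whenever a request is ambiguous), here is a rough Python sketch of the idea. It is an illustration only, not Capital One's implementation: the `Account` class, the `answer_balance` and `clarification_prompt` functions, and the account labels are all hypothetical, and no Alexa Skills Kit API is used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Account:
    kind: str       # hypothetical label, e.g. "checking" or "credit card"
    balance: float  # current balance in dollars


def clarification_prompt(accounts: List[Account]) -> str:
    """Build a follow-up question listing the candidate accounts."""
    names = [f"a {a.kind} account" for a in accounts]
    listed = ", and ".join([", ".join(names[:-1]), names[-1]])
    return f"You have multiple accounts: {listed}. Which one are you asking about?"


def answer_balance(accounts: List[Account], requested_kind: Optional[str]) -> str:
    """Answer a balance request, or ask for clarification if it is ambiguous."""
    if not accounts:
        return "I could not find any accounts."
    # Narrow to the account type the customer named, if any matched.
    matches = [a for a in accounts if a.kind == requested_kind] or list(accounts)
    if len(matches) == 1:
        account = matches[0]
        return f"Your {account.kind} account balance is ${account.balance:,.2f}."
    # More than one candidate: ask rather than guess.
    return clarification_prompt(matches)
```

Calling `answer_balance(accounts, "checking")` answers directly, while `answer_balance(accounts, None)` returns the clarifying question instead. Keeping the prompt short reflects the interview's point that verbose responses can become as irritating as latency.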