On one of our recent Distillery projects, we faced an interesting dilemma: how to implement voice control for the system we were building. After searching for an appropriate service, we ultimately settled on Alexa, Amazon’s voice service.
Overview of Amazon Alexa
Unlike other prominent voice technologies such as Cortana, Siri, and OK Google, Alexa provides a wide-ranging API that can be used not only on mobile devices but also in web applications. Moreover, SDKs and other development tools are available for platforms and languages such as Java, Python, and Node.js, so integrating Alexa into a basic app shouldn't be difficult. And because Alexa is an Amazon product, it works alongside other Amazon services (e.g., AWS Lambda).
Building Skills and Handling Responses via Flask-Ask
The first step is to create an Alexa skill. You can find the full walkthrough of this process here, so we won't focus on it in this post. Instead, let's examine how an Alexa skill actually works. Every skill has two main components: intents and utterances. Intents represent the actions a user can request, and each one is tied to a callback that handles the corresponding event in our system. Utterances are the sample phrases (and their variations) that users can say to trigger each intent.
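As a toy illustration of how utterances point at intents, the plain-Python sketch below indexes sample phrases by intent and resolves a spoken phrase with an exact lookup. It only models the relationship: real Alexa relies on its own natural-language understanding rather than string matching. Apart from CMAppointmentIntent, every intent name and phrase here is hypothetical, as is the one-line-per-utterance "IntentName phrase" format.

```python
# Toy model: each sample utterance is linked to the intent it triggers.
# Real Alexa uses statistical NLU; exact matching here is purely illustrative.
# The "IntentName phrase" line format and CMCancelIntent are assumptions.
SAMPLE_UTTERANCES = """
CMAppointmentIntent create an appointment
CMAppointmentIntent book an appointment
CMAppointmentIntent schedule a meeting
CMCancelIntent cancel my appointment
"""


def build_utterance_index(text):
    """Map each normalized sample phrase to the intent it triggers."""
    index = {}
    for line in text.strip().splitlines():
        intent, phrase = line.split(" ", 1)
        index[phrase.strip().lower()] = intent
    return index


def resolve_intent(index, speech):
    """Return the intent for a spoken phrase, or None if nothing matches."""
    return index.get(speech.strip().lower())


index = build_utterance_index(SAMPLE_UTTERANCES)
print(resolve_intent(index, "Book an appointment"))  # CMAppointmentIntent
print(resolve_intent(index, "play some music"))      # None
```

Several utterances can point at the same intent, which is how a skill tolerates different ways of saying the same thing.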
There are two possible approaches to handling Alexa intents. One option, as mentioned above, is AWS Lambda; the other is a custom HTTPS server. For educational purposes, let's take the second option and use Flask-Ask, an extension for Flask (a Python microframework) that turns a Flask app into an endpoint for Alexa's intents. It follows the same pattern as Flask routes: we pass the name of the intent to the @ask.intent decorator and then write a function that builds the response.
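The decorator pattern that Flask-Ask borrows from Flask routes can be sketched with the standard library alone. The hypothetical IntentRouter class below is not Flask-Ask's implementation; it only shows the idea of registering a handler under an intent name and dispatching an incoming Alexa IntentRequest (reduced here to its request.intent.name field) to that handler.

```python
# Hypothetical, stdlib-only sketch of decorator-based intent routing,
# analogous to Flask routes. This is NOT Flask-Ask's internal code.
class IntentRouter:
    def __init__(self):
        self._handlers = {}  # intent name -> handler function

    def intent(self, name):
        """Decorator that registers a function as the handler for `name`."""
        def register(func):
            self._handlers[name] = func
            return func  # leave the function itself unchanged
        return register

    def dispatch(self, alexa_request):
        """Look up the handler for an incoming IntentRequest and call it."""
        name = alexa_request["request"]["intent"]["name"]
        handler = self._handlers.get(name)
        if handler is None:
            raise KeyError(f"no handler registered for intent {name!r}")
        return handler()


router = IntentRouter()


@router.intent("CMAppointmentIntent")
def create_appointment():
    return "What date would you like?"


request = {"request": {"type": "IntentRequest",
                       "intent": {"name": "CMAppointmentIntent"}}}
print(router.dispatch(request))  # What date would you like?
```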
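from flask import Flask, render_template
from flask_ask import Ask, question

app = Flask(__name__)
ask = Ask(app, "/")

@ask.intent("CMAppointmentIntent")
def create_appointment():
    """Initialize the appointment-creation workflow; pass to date definition."""
    # Render the 'date' prompt (Flask-Ask reads templates from templates.yaml)
    msg = render_template('date')
    return question(msg)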
As you can see, the function returns a rendered template wrapped in question(). This is because every intent handler should return one of two response types:
- Question: a prompt that keeps the session open, so the user's reply can trigger the next intent
- Statement: a final response that closes the session with no further interaction
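On the wire, the difference between the two states comes down to the shouldEndSession flag in the JSON response returned to Alexa. The helpers below are a minimal stdlib sketch of that idea; make_question and make_statement are hypothetical names, and Flask-Ask's own question() and statement() build fuller responses with extras such as reprompts and cards.

```python
import json


def _speech_response(text, end_session):
    """Build a minimal Alexa response body with plain-text speech."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            # False keeps the session open so the user's answer can
            # trigger the next intent; True ends the conversation.
            "shouldEndSession": end_session,
        },
    }


def make_question(text):
    """Hypothetical helper: speak `text` and wait for the user's reply."""
    return _speech_response(text, end_session=False)


def make_statement(text):
    """Hypothetical helper: speak `text` and close the session."""
    return _speech_response(text, end_session=True)


print(json.dumps(make_question("What date works for you?"), indent=2))
```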
Beyond the name of the intent, the Flask-Ask decorator also accepts parameter mappings: you can map a slot to a differently named variable or convert its value to a Python data type:
# Mapping to a different variable name
@ask.intent("AMAZON.AddAction
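The idea behind these parameters can be sketched without Flask-Ask. In Flask-Ask itself they are keyword arguments to the decorator: mapping renames a slot, e.g. @ask.intent('WeatherIntent', mapping={'city': 'City'}), and convert casts a slot value, e.g. convert={'x': int}. The map_slots helper below is a hypothetical stdlib imitation of that behavior, not Flask-Ask's code, and the slot names in the usage example are made up.

```python
def map_slots(slots, mapping=None, convert=None):
    """Turn raw Alexa slot values into keyword arguments for a handler.

    slots:   Alexa-style dict, e.g. {"City": {"name": "City", "value": "Paris"}}
    mapping: {argument_name: slot_name}, renames a slot for the handler.
    convert: {argument_name: callable}, casts the (string) slot value.
    """
    mapping = mapping or {}
    convert = convert or {}
    # Slots not listed in `mapping` keep their own name as the argument name.
    names = {slot: slot for slot in slots}
    names.update({slot: arg for arg, slot in mapping.items()})
    kwargs = {}
    for slot, arg in names.items():
        value = slots.get(slot, {}).get("value")  # unfilled slots have no value
        if value is not None and arg in convert:
            value = convert[arg](value)
        kwargs[arg] = value
    return kwargs


def add(first, second):
    return first + second


slots = {"FirstNumber": {"name": "FirstNumber", "value": "2"},
         "SecondNumber": {"name": "SecondNumber", "value": "3"}}
kwargs = map_slots(slots,
                   mapping={"first": "FirstNumber", "second": "SecondNumber"},
                   convert={"first": int, "second": int})
print(add(**kwargs))  # 5
```

Without convert, Alexa's slot values arrive as strings, so add() would concatenate "2" and "3" instead of summing them.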
Ultimately, we successfully built voice control into our app using Flask-Ask, solving the dilemma efficiently.
The issues we've uncovered so far stem from two limitations: you can't always fully override some of Alexa's default intents, and you can't always prevent Alexa from invoking some of its built-in services.
While developing this small application, we noticed that the more an utterance contains words associated with Alexa's default skills, the more often Alexa triggers the default intent instead of your custom one. To avoid this, add distinctive words to every utterance in your list, which reduces the chance of unintentionally triggering the preinstalled intents.
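One way to apply this advice before testing on a device is to scan your utterance list for words likely to collide with built-in functionality. In the sketch below, BUILT_IN_TRIGGER_WORDS is a hand-picked assumption, not an official Amazon list; extend it with whatever collisions your own testing reveals. The sample utterances are hypothetical.

```python
# Hypothetical, hand-picked words that tend to collide with Alexa's built-in
# skills and services (calendar, to-do, timers, music...). Not an official list.
BUILT_IN_TRIGGER_WORDS = {
    "calendar", "event", "reminder", "remind", "todo", "to-do",
    "list", "timer", "alarm", "music", "play", "weather",
}


def risky_utterances(utterances, trigger_words=BUILT_IN_TRIGGER_WORDS):
    """Return {utterance: sorted colliding words} for utterances at risk."""
    report = {}
    for utterance in utterances:
        words = set(utterance.lower().split())
        hits = words & trigger_words
        if hits:
            report[utterance] = sorted(hits)
    return report


utterances = [
    "add an event to my calendar",
    "book a distillery appointment",
    "set a reminder for my meeting",
]
report = risky_utterances(utterances)
for utterance, hits in report.items():
    print(f"{utterance!r} overlaps built-ins via {hits}")
```

Utterances that appear in the report are candidates for rewording with more distinctive vocabulary.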
A related issue is unintentionally invoking a different Amazon service. If an utterance contains words that appear in default skill utterances or relate to one of the built-in services (e.g., calendar or to-do), Alexa is likely to call the predefined intent instead of your skill.
The last issue concerns handling natural, free-form human speech. One solution would be the AMAZON.LITERAL slot type, but this approach is quite labor-intensive: according to the documentation, you need to specify as many "anchor" words as possible that can trigger the intent. Moreover, even with every possible utterance set up, we ran into a problem with Flask-Ask: during testing, the phrases Alexa was supposed to capture never reached the Flask-Ask intent handler. Alternatively, you can create your own custom slot type from sample values, which may solve the issue.
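For the custom-slot-type route, the sketch below builds the relevant part of a skill's interaction model as a Python dict. It assumes the JSON interaction-model format of the Alexa developer console (languageModel with intents and types); the invocation name, intent, slot, type name, and sample values are all hypothetical.

```python
import json


def custom_slot_type(name, values):
    """Build a custom slot type entry from a list of sample values."""
    return {"name": name, "values": [{"name": {"value": v}} for v in values]}


def interaction_model(invocation_name, intent_name, slot_name, slot_type,
                      samples):
    """Assemble a minimal language model with one intent using the slot."""
    return {
        "interactionModel": {
            "languageModel": {
                "invocationName": invocation_name,
                "intents": [{
                    "name": intent_name,
                    # The slot is typed with our custom slot type.
                    "slots": [{"name": slot_name, "type": slot_type["name"]}],
                    # Sample utterances reference the slot as {SlotName}.
                    "samples": samples,
                }],
                "types": [slot_type],
            }
        }
    }


topics = custom_slot_type("APPOINTMENT_TOPIC",
                          ["dentist", "haircut", "team meeting"])
model = interaction_model(
    "appointment helper", "CMTopicIntent", "Topic", topics,
    ["book a {Topic} appointment", "schedule my {Topic}"],
)
print(json.dumps(model, indent=2))
```

The sample values guide recognition rather than acting as a strict whitelist, which is why this approach can cope with phrases that a fixed utterance list would miss.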
Amazon Alexa is an intriguing technology with the potential to transform our basic approach to UX and user interfaces. Voice control is on the cutting edge of technology, so businesses should heed this trend and, where appropriate, use voice control to enhance the ways they interact with their customers. However, voice control isn't flawless, so as you make your strategic choices, pay close attention to the intended use of your Alexa application.
Want to explore how Distillery can help you integrate voice control capabilities into your app idea? Let us know!
About the Author
Vadim Sokoltsov joined the Distillery team in 2017. Being a Ruby on Rails professional, he also loves learning Python and experimenting with new technologies. His true passion, however, is for big data and machine learning – and he’s always prepared to use his mixed martial arts training to defend his point of view on these topics.