Smart home speakers, assistant platforms and cross-device solutions, so you can talk to your smartwatch and see the result on your TV or car’s dashboard. Personal assistants and VUIs are slowly appearing around us and it’s pretty likely that they will make our lives much easier.
Because of my great faith that natural language will be the next human-machine interface, I decided to start a new blog post series and build an open source project to show how to create a new kind of app: conversation-oriented, device-independent assistant skills which give us freedom in the platform or hardware we use.
They will also bring the most natural interface for humans: voice.
WaterLog assistant skill
In this post, we’ll start with the simplest implementation of an assistant skill. WaterLog is an app which lets us track daily water intake by talking or writing in natural language directly to Google Assistant. The first version of the app will be able to log how many liters or milliliters of water we have drunk during the day.
For the sake of simplicity, we’ll skip the theory behind VUI design and focus only on the technical aspects of building a fully working implementation.
Here are scenarios of possible conversations (happy paths):
User: Ok Google, Talk to WaterLog
WaterLog: Hey! Welcome to Water Log. Do you know that you should drink about 3 liters of water each day to stay healthy? How much did you drink so far?
User: I drank 500ml of water
WaterLog: Ok, I’ve added 500ml of water to your daily log. In sum you have drunk 500ml today. Let me know when you drink more! See you later.
User: Ok Google, Talk to WaterLog
WaterLog: Hey! You have drunk 500ml today. How much water should I add now?
WaterLog: Ok, I’ve added 100ml of water to your daily log. In sum you have drunk 600ml today. Let me know when you drink more! See you later.
Returning user asking for logged water
User: Ok Google, Ask WaterLog how much water have I drunk today?
WaterLog: In sum you have drunk 600ml today. Let me know when you drink more! See you later.
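The replies in these scenarios follow a simple pattern: add the new amount to today's total, then report the total. Here is a minimal TypeScript sketch of that pattern. It is not the WaterLog code: the function names are hypothetical, the reply texts are copied from the conversations above, and the rule that the long welcome is used when nothing has been logged yet today is my assumption.

```typescript
// Hypothetical helpers; reply texts are copied from the scenarios above.
const CLOSING = "Let me know when you drink more! See you later.";

// Reply to "how much water have I drunk today?"
function totalReply(totalMl: number): string {
  return `In sum you have drunk ${totalMl}ml today. ${CLOSING}`;
}

// Reply after logging: add the new amount to what was already logged today.
function logReply(addedMl: number, previousTotalMl: number): string {
  const totalMl = previousTotalMl + addedMl;
  return `Ok, I’ve added ${addedMl}ml of water to your daily log. ${totalReply(totalMl)}`;
}

// Greeting. Assumption: the long welcome is used when nothing is logged yet today.
function welcomeReply(totalMl: number): string {
  if (totalMl === 0) {
    return (
      "Hey! Welcome to Water Log. Do you know that you should drink about " +
      "3 liters of water each day to stay healthy? How much did you drink so far?"
    );
  }
  return `Hey! You have drunk ${totalMl}ml today. How much water should I add now?`;
}
```

For example, `logReply(100, 500)` produces the second logging reply shown above, with a new total of 600ml.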
In case you would like to test this skill on your device, it’s available live in the Google Assistant directory, or on the website:
The app is extremely simple, but even this kind of project still requires tying some pieces together to make it work. While we have a lot of freedom when it comes to platform selection (we could build our app in many different languages and host it on any cloud solution such as Google Cloud Platform or Amazon Web Services), to begin with we choose the most recommended tech stack:
Firebase Cloud Functions and Realtime Database for the app backend,
Dialogflow for conversation definitions and natural language understanding,
Google Actions SDK for Google Assistant integration (in the future we may try other platforms such as Amazon Alexa or Facebook Messenger Platform).
The Google Actions website has a really good step-by-step guide on how to do this.
Start with a new project in the Actions on Google console.
When it’s done, you will be asked to pick a tool or platform to build the assistant skill. As I said, it’ll be Dialogflow. If you do it right, your apps (Actions and Dialogflow) should be connected. You can check this in the Dialogflow agent settings (see the Google Project property):
The Google Project ID should match the project in the Actions on Google console
The first big piece of our assistant app is the conversational agent, which in our case is built on the Dialogflow platform. Its most important role is to understand what the user says to our app and convert natural language sentences into actions and properties which can be handled by our code. This is exactly what Dialogflow intents do.
According to the documentation:
An intent represents a mapping between what a user says and what action should be taken by your software.
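To make this definition a bit more concrete, here is a minimal TypeScript sketch of what such a mapping could look like on the code side. It is not the WaterLog implementation: the `Handler` type, the `handlers` table and the `dispatch` function are hypothetical names, and the reply strings are placeholders. Only the intent names welcome_user and log_water come from this post.

```typescript
// Hypothetical sketch: route an intent name (as resolved by Dialogflow)
// to the function that should handle it.

// A handler receives the parameters extracted from the utterance
// and returns the text the assistant should say.
type Params = { [name: string]: unknown };
type Handler = (params: Params) => string;

// Placeholder handlers; a real skill would read and write the daily log here.
const handlers: { [intent: string]: Handler } = {
  welcome_user: () => "Hey! Welcome to Water Log.",
  log_water: (params) => "Logging water: " + JSON.stringify(params),
};

function dispatch(intent: string, params: Params = {}): string {
  // Own-property check so that names like "toString" are not treated as intents.
  if (!Object.prototype.hasOwnProperty.call(handlers, intent)) {
    throw new Error("No handler registered for intent: " + intent);
  }
  return handlers[intent](params);
}
```

Calling `dispatch("welcome_user")` returns the placeholder greeting, while an intent name without a registered handler raises an error instead of failing silently.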
Let’s start defining our intents. Here is the list of intents which we would like to handle:
Default Fallback Intent
The only one which we leave untouched for now. As the name says, and as the documentation describes, this intent is triggered if a user’s input is not matched by any of the regular intents or enabled domains. It’s worth mentioning that this intent isn’t even passed into our application code. It’s handled entirely by the Dialogflow platform.
The welcome_user intent is used to greet our user. It’s always used when the user asks for our app (e.g. Ok Google, talk to WaterLog) without any additional intention.
— Config —
WELCOME, GOOGLE_ASSISTANT_WELCOME — events are additional mappings which allow intents to be invoked by an event name instead of a user query.
✅ Use webhook — Intent welcome_user will be passed to our backend.
The log_water intent is used to save how much water the user would like to log during the conversation. There are a couple of cases which we would like to handle in the same way. Let’s list some of them:
Ok Google, Talk to WaterLog to log 1 liter of water — the intent is triggered immediately when the user invokes our action. In this case the welcome intent is skipped. More about assistant invocation can be found in the Google Actions documentation.
Log 500ml of water — said in the middle of the conversation, when the app is waiting for the user’s input.
500ml — usually as an answer to the assistant’s question: WaterLog: …how much water did you drink today? User: 500ml
To handle cases like these we need to provide example utterances which users might say. Dialogflow Machine Learning then uses these examples to teach our agent to understand user input. The more examples we provide, the smarter our agent becomes.
Additionally, we need to annotate the fragments of our examples which need to be handled in a special way, so that, for example, our app knows that the utterance:
I have drunk 500ml of water
contains the amount and unit of volume of the water that has been drunk. All we have to do is select the fragment and pick the correct entity (there are plenty of built-in entities, see the documentation).
200 milliliters marked as system entity: @sys.unit-volume
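Once a fragment is annotated, our code receives a structured value instead of raw text. Here is a minimal TypeScript sketch of normalizing such a value to milliliters. It rests on assumptions: I assume the entity value arrives as an object with `amount` and `unit` fields and that the units are reported as "ml" and "l", so check the entity documentation for the exact shape. The `UnitVolume` type and the `toMilliliters` function are hypothetical names.

```typescript
// Assumed shape of the value extracted for @sys.unit-volume,
// e.g. { amount: 500, unit: "ml" }. Verify against the entity documentation.
interface UnitVolume {
  amount: number;
  unit: string;
}

// Milliliters per supported unit. Only the two units used in this post are
// listed; the unit codes "ml" and "l" are an assumption.
const ML_PER_UNIT: { [unit: string]: number } = {
  ml: 1,
  l: 1000,
};

function toMilliliters(volume: UnitVolume): number {
  const unit = volume.unit.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(ML_PER_UNIT, unit)) {
    throw new Error("Unsupported unit of volume: " + volume.unit);
  }
  // Round so that fractional liters always yield whole milliliters.
  return Math.round(volume.amount * ML_PER_UNIT[unit]);
}
```

Storing everything in one unit keeps the daily total a plain sum, whether the user says milliliters or liters.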
— Config —
User says (there should be many more examples, especially in more complex apps):
Example utterances used to teach our Dialogflow agent
✅ Use webhook
✅ End conversation — pick this to let Google Assistant know that conversation should be finished here.
This intent is used to tell the user how much water he or she has drunk in the current day. As with log_water, there are different ways to invoke this intent:
Ok Google, ask WaterLog how much water did I drink today? — called instead of the welcome intent when the action is known,
How much did I drink? — asked in the middle of conversation with our app.
— Config —
Fulfillment: ✅ Use webhook
Google assistant: ✅ End conversation
And that’s it for the Dialogflow configuration for now. If you would like to see the full config, you can download it from the repository (WaterLog.zip file) and import it into your agent.