Configuring your first Google Assistant skill

Smart home speakers, assistant platforms and cross-device solutions let you talk to your smartwatch and see the result on your TV or your car’s dashboard. Personal assistants and voice user interfaces (VUIs) are slowly appearing all around us, and they are quite likely to make our lives much easier.

Because I strongly believe that natural language will be the next human-machine interface, I decided to start a new series of blog posts, along with an open source project, showing how to create a new kind of app: conversation-oriented, device-independent assistant skills that give us freedom in the platform or hardware we use, and that bring us the most natural interface for humans, voice.

WaterLog assistant skill

In this post, we’ll start with the simplest implementation of an assistant skill. WaterLog is an app that lets us track daily water intake by talking or writing in natural language directly to Google Assistant. The first version of the app will be able to log how many liters or milliliters of water we have drunk during the day.

For the sake of simplicity, we’ll skip the theory behind VUI design and focus only on the technical aspects of building a fully working implementation.

Here are some possible conversation scenarios (happy paths):

New user

User: Ok Google, talk to WaterLog

WaterLog: Hey! Welcome to Water Log. Do you know that you should drink about 3 liters of water each day to stay healthy? How much did you drink so far?

User: I drank 500ml of water

WaterLog: Ok, I’ve added 500ml of water to your daily log. In sum you have drunk 500ml today. Let me know when you drink more! See you later.

Returning user

User: Ok Google, talk to WaterLog

WaterLog: Hey! You have drunk 500ml today. How much water should I add now?

User: 100ml

WaterLog: Ok, I’ve added 100ml of water to your daily log. In sum you have drunk 600ml today. Let me know when you drink more! See you later.

Returning user asking for logged water

User: Ok Google, ask WaterLog how much water have I drunk today?

WaterLog: In sum you have drunk 600ml today. Let me know when you drink more! See you later.
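Underneath, these conversations boil down to a running daily total. The sketch below replays them against an in-memory Map. Storing everything in milliliters and keying the total by user and UTC date are assumptions made for illustration (the real app keeps its data in Firebase Realtime Database), and `addWater` and `loggedWater` are hypothetical names.

```javascript
// A Map stands in for Firebase Realtime Database in this sketch.
const dailyLog = new Map();

// Key each total by user and UTC calendar day, e.g. "user-1:2018-03-01",
// so that every new day starts again from zero.
function dayKey(userId, date) {
  return `${userId}:${date.toISOString().slice(0, 10)}`;
}

// Add a drink (in ml) and return the new total for that day.
function addWater(userId, ml, date = new Date()) {
  const key = dayKey(userId, date);
  const total = (dailyLog.get(key) || 0) + ml;
  dailyLog.set(key, total);
  return total;
}

// Read the total for that day; 0 if nothing has been logged yet.
function loggedWater(userId, date = new Date()) {
  return dailyLog.get(dayKey(userId, date)) || 0;
}

// Replaying the three conversations for one user on one day:
const today = new Date('2018-03-01T10:00:00Z');
addWater('user-1', 500, today);            // new user logs 500ml
addWater('user-1', 100, today);            // returning user logs 100ml
console.log(loggedWater('user-1', today)); // 600
```

Keying by date means nothing has to be reset at midnight: a new day simply produces a new key with no stored value.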

If you would like to test this skill on your own device, it’s available live in the Google Assistant directory, or on the website:

Getting started

The app is extremely simple, but even a project like this requires tying several pieces together to make it work. While we have a lot of freedom when it comes to choosing a platform (we could build our app in many different languages and host it on any cloud solution, such as Google Cloud Platform or Amazon Web Services), to begin with we’ll use the recommended tech stack:

  • Firebase Cloud Functions and Realtime Database for the app backend,

  • Dialogflow for conversation definitions and natural language understanding,

  • JavaScript/Node.js for the app code (at the time of writing, the only language supported by Firebase Cloud Functions),

  • Google Actions SDK for Google Assistant integration (in the future we will also try other platforms, such as Amazon Alexa or the Facebook Messenger Platform).

The Actions on Google website has a really good step-by-step guide on how to do this.

In short:

  1. Start by creating a new project in the Actions on Google console.

  2. Once that’s done, you will be asked to pick a tool or platform for building the assistant skill. As mentioned above, we’ll use Dialogflow. If everything goes right, your Actions project and your Dialogflow agent will be connected. You can verify this in the Dialogflow agent settings (see the Google Project property):

The Google Project ID should match the project in the Actions on Google console

Dialogflow agent

The first big piece of our assistant app is the conversational agent, which in our case is built on the Dialogflow platform. Its most important role is to understand what the user says to our app and to convert natural language sentences into actions and parameters that our code can handle. This is exactly what Dialogflow intents do.
According to the documentation:

An intent represents a mapping between what a user says and what action should be taken by your software.

Let’s start defining our intents. Here is the list of intents we would like to handle:

Default Fallback Intent

This is the only intent we leave untouched for now. As the name says, it is triggered when a user’s input is not matched by any of the regular intents or enabled domains (see the documentation). It’s worth mentioning that this intent isn’t even passed to our application code; it is handled entirely by the Dialogflow platform.

welcome_user

Intent used to greet the user. It is triggered whenever the user invokes our app (e.g. Ok Google, talk to WaterLog) without stating any additional intention.

— Config —

Action name: input.welcome

Events: WELCOME, GOOGLE_ASSISTANT_WELCOME. Events are additional mappings that allow an intent to be invoked by an event name instead of a user query.

Fulfillment: ✅ Use webhook — Intent welcome_user will be passed to our backend.
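Because welcome_user is passed to our backend, the webhook has to choose between the two greetings from the sample conversations. Below is a minimal sketch. This post does not cover how the app tells a new user from a returning one, so the `isNewUser` flag and the `welcomeMessage` name are hypothetical. Both greetings end with a question, which fits the fact that End conversation is not among this intent’s settings.

```javascript
// Hypothetical helper: picks the welcome_user reply from the sample conversations.
// isNewUser: assumed to come from wherever the app stores its users (not shown here).
// totalMl:   the amount logged today, in milliliters.
function welcomeMessage(isNewUser, totalMl) {
  if (isNewUser) {
    return 'Hey! Welcome to Water Log. Do you know that you should drink about 3 liters ' +
      'of water each day to stay healthy? How much did you drink so far?';
  }
  return `Hey! You have drunk ${totalMl}ml today. How much water should I add now?`;
}

console.log(welcomeMessage(false, 500));
// Hey! You have drunk 500ml today. How much water should I add now?
```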

log_water

Intent used to save how much water the user would like to log during the conversation. There are a couple of cases that we would like to handle in the same way. Let’s list some of them:

  • Ok Google, talk to WaterLog to log 1 liter of water: the intent is triggered immediately when the user invokes our action, so the welcome intent is skipped. More about assistant invocation can be found in the Actions on Google documentation.

  • Log 500ml of water: said in the middle of a conversation, when the app is waiting for the user’s input.

  • 500ml: usually an answer to the assistant’s question, e.g. WaterLog: …how much water did you drink today? User: 500ml

To handle cases like these, we need to provide example utterances that users might say. Dialogflow’s machine learning then uses these examples to train our agent to understand user input. The more examples we provide, the smarter our agent becomes.

Additionally, we need to annotate the fragments of our examples that need to be handled in a special way, so that, for example, our app knows that the utterance:

I have drunk 500ml of water

contains the amount and the unit of volume of the water that has been drunk. All we have to do is select the fragment and pick the correct entity (there are plenty of built-in entities; see the documentation).

200 milliliters marked as the system entity @sys.unit-volume
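Once the fragment is annotated, the webhook receives the volume as a parameter instead of raw text, but the backend still has to bring liters and milliliters to one unit. The sketch below assumes the parameter arrives as an object with `amount` and `unit` fields (check the system entities documentation for the exact shape your agent receives). The accepted unit spellings and the `toMilliliters` name are assumptions as well.

```javascript
// Assumption: @sys.unit-volume is delivered as an object like { amount: 200, unit: 'ml' }.
// The exact unit strings are not confirmed here, so this sketch accepts a few
// common spellings and rejects anything else.
const ML_PER_UNIT = {
  ml: 1, milliliter: 1, millilitre: 1,
  l: 1000, liter: 1000, litre: 1000,
};

function toMilliliters(volume) {
  // Normalize e.g. "Liters" -> "liter" and "L" -> "l" before the lookup.
  const unit = String(volume.unit).toLowerCase().replace(/s$/, '');
  const factor = ML_PER_UNIT[unit];
  if (factor === undefined || typeof volume.amount !== 'number') {
    throw new Error(`Unsupported volume: ${JSON.stringify(volume)}`);
  }
  return Math.round(volume.amount * factor);
}

console.log(toMilliliters({ amount: 200, unit: 'ml' })); // 200
console.log(toMilliliters({ amount: 1.5, unit: 'L' }));  // 1500
```

Converting at the boundary keeps the rest of the backend working with a single unit, so the daily total is always a plain number of milliliters.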

— Config —

Action name: log_water
User says (there should be many more examples, especially in more complex apps):

Example utterances used to teach our Dialogflow agent

Fulfillment: ✅ Use webhook

Google Assistant: ✅ End conversation. Pick this to let Google Assistant know that the conversation should end here.
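On the backend, log_water comes down to adding the amount to the stored total and confirming it in one closing sentence, since End conversation is ticked and the user gets no further turn. In this sketch, the amount is assumed to be already converted to milliliters, `store` is a Map standing in for Firebase Realtime Database (keyed by user only, for brevity), and `handleLogWater` is a hypothetical name.

```javascript
// Hypothetical log_water handler.
// store:    any object with get/set; a Map here, the Realtime Database in the real app.
// amountMl: the logged volume, assumed to be already converted to milliliters.
function handleLogWater(store, userId, amountMl) {
  const total = (store.get(userId) || 0) + amountMl;
  store.set(userId, total);
  // The reply is a closing statement rather than a question, because the
  // intent is configured to end the conversation.
  return `Ok, I've added ${amountMl}ml of water to your daily log. ` +
    `In sum you have drunk ${total}ml today. Let me know when you drink more! See you later.`;
}

// Returning user from the sample conversation: 500ml already logged, adds 100ml.
const store = new Map([['user-1', 500]]);
console.log(handleLogWater(store, 'user-1', 100));
// Ok, I've added 100ml of water to your daily log. In sum you have drunk 600ml today. Let me know when you drink more! See you later.
```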

get_logged_water

Intent used to tell the user how much water he or she has drunk on the current day. As with log_water, there are different ways to invoke this intent:

  • Ok Google, ask WaterLog how much water did I drink today? In this case the intent is called instead of the welcome intent, because the action is known up front.

  • How much did I drink? In this case it is asked in the middle of a conversation with our app.

— Config —

Action name: get_logged_water

User says:

Fulfillment: ✅ Use webhook
Google Assistant: ✅ End conversation
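The get_logged_water handler only reads the stored total and wraps it in a closing sentence. Here is a minimal sketch, assuming a Map keyed by user stands in for Firebase Realtime Database; `handleGetLoggedWater` is a hypothetical name. A user who has logged nothing yet simply hears 0ml.

```javascript
// Hypothetical get_logged_water handler: reads today's total without changing it.
// store: any object with get/set; a Map stands in for the Realtime Database here.
function handleGetLoggedWater(store, userId) {
  const total = store.get(userId) || 0; // nothing logged yet counts as 0ml
  return `In sum you have drunk ${total}ml today. Let me know when you drink more! See you later.`;
}

const store = new Map([['user-1', 600]]);
console.log(handleGetLoggedWater(store, 'user-1'));
// In sum you have drunk 600ml today. Let me know when you drink more! See you later.
```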

And that’s it for the Dialogflow configuration for now. If you would like to see the full config, you can download it from the repository (the WaterLog.zip file) and import it into your agent.
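With the three intents configured, the backend needs a single entry point that routes each request by its action name (input.welcome, log_water, get_logged_water). The sketch below assumes the action name and parameters have already been extracted from the Dialogflow webhook request. The exact JSON shape of that request depends on the API version, so it is not reproduced here. The handlers are deliberately compressed, and the `amountMl` parameter, `handlers` and `dispatch` are hypothetical names. The Default Fallback Intent needs no entry, because Dialogflow handles it without calling our code.

```javascript
// Hypothetical dispatcher keyed by the action names configured in Dialogflow.
// store: a Map standing in for Firebase Realtime Database, keyed by user.
const handlers = {
  // Simplified: always uses the returning-user greeting.
  'input.welcome': (store, userId) =>
    `Hey! You have drunk ${store.get(userId) || 0}ml today. How much water should I add now?`,
  log_water: (store, userId, params) => {
    // params.amountMl is assumed to hold the volume already converted to ml.
    const total = (store.get(userId) || 0) + params.amountMl;
    store.set(userId, total);
    return `Ok, I've added ${params.amountMl}ml of water to your daily log. ` +
      `In sum you have drunk ${total}ml today.`;
  },
  get_logged_water: (store, userId) =>
    `In sum you have drunk ${store.get(userId) || 0}ml today.`,
};

function dispatch(action, store, userId, params = {}) {
  const handler = handlers[action];
  if (!handler) {
    // Every intent with webhook fulfillment should have an entry above.
    throw new Error(`No handler for action "${action}"`);
  }
  return handler(store, userId, params);
}

const store = new Map();
dispatch('log_water', store, 'user-1', { amountMl: 500 });
console.log(dispatch('get_logged_water', store, 'user-1')); // In sum you have drunk 500ml today.
```

Keeping the routing table next to the action names makes it easy to spot an intent that has webhook fulfillment enabled in Dialogflow but no handler in the code.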
