In this Google Action tutorial for beginners, you will learn how to build an Action for Google Assistant (the voice assistant living inside Google Home) from scratch. We will cover the essentials of building an app for Google Assistant, how to set everything up on Dialogflow and the Actions on Google Console, and how to use Jovo to build your Action’s logic.
See also: Build an Alexa Skill in Node.js with Jovo
About Jovo: Jovo is an open source Node.js development framework for voice applications for both Amazon Alexa and Google Assistant. Check out the GitHub repository or the documentation, if you’re interested in learning more.
To get you started as quickly as possible, we’re going to create a simple Action that responds with “Hello World!”
Please note: This is a tutorial for beginners and explains the essential steps of Google Action development in detail. If you already have experience with Google Home or Google Assistant and just want to learn more about how to use Jovo, either skip the first few sections and go right to Code the Skill, or take a look at the Jovo Documentation.
In this section, you will learn more about the architecture of Google Assistant and how users interact with its Actions. First, let’s take a look at the terminology for the different kinds of software and hardware involved:
While it’s the hardware device that most users see, Google Home is not the name of the assistant you can develop Actions for (which sometimes causes confusion when people talk about “building an Action for Google Home”). The artificial intelligence you can hear speaking from inside the smart speaker is called Google Assistant (which is now also available on Android and iOS smartphones). Actions on Google are the applications that can be built on top of the Google Assistant platform.
The main difference between the architecture of building Google Actions and Alexa Skills is that for Google you need an additional layer to handle the natural language. Most Action developers use Dialogflow to configure their application’s language model:
We will take a deeper look into Dialogflow in section 2: Create an Agent on Dialogflow.
To understand how Google Actions work, let’s take a look at the two important elements:
There are a few steps that happen before a user’s speech input reaches your Action. The voice input process (from left to right) consists of three stages that happen at three (four, if you count Google and Dialogflow as two) different places:
The voice output process (from right to left) goes back and passes the stages again:
In order to make the Action work, we need to configure it on both Dialogflow (for the natural language understanding) and the Actions on Google Console (for the Assistant integration). We’re going to create a new Dialogflow Agent in the next step.
We’re going to add our own agent now. Let’s get started:
Go to dialogflow.com and click “Go to console” on the upper right:
Now sign in with your Google account. To simplify things, make sure to use the same account that’s registered with your Actions on Google enabled device like Google Home (if possible) for more seamless testing.
Great! Once you’re in the console, click “create agent”:
We’re just going to name it “HelloWorldAgent” and leave the other information out for now:
After creating the agent, you can see the Intents screen:
These intents are part of the Agent’s language model. Let’s take a deeper look into how it works in the next section.
Dialogflow offers an easy (but also highly customizable) way to create a language model for your Google Action.
Let’s take a look:
Google Assistant and Dialogflow help you with several steps in processing input. First, they take a user’s speech and transform it into written text (speech to text). Afterward, they use a language model to make sense out of what the user means (natural language understanding).
A simple interaction model for Google Assistant (built with Dialogflow) consists of three elements: Intents, user expressions, and entities.
An intent is something a user wants to achieve while talking to your product. It is the basic meaning that can be stripped away from the sentence or phrase the user is telling you. And there can be several ways to end up at that specific intent.
For example, a FindRestaurantIntent from the image above could be expressed by users in different ways. In Dialogflow language models, these are called user expressions:
A user expression (sometimes called an utterance) is the actual sentence a user is saying. There is often a large variety of expressions that fit into the same intent. And sometimes it can even be a little more variable. This is when entities come into play:
No matter if I’m looking for a super cheap place, a pizza spot that serves Pabst Blue Ribbon, or a dinner restaurant to bring a date, generally speaking it serves one purpose (user intent): to find a restaurant. However, the user is passing some more specific information that can be used for a better user experience. These are called entities:
These are just the very basic components for you to get introduced to some terms. We don’t need to know a lot about entities for this simple tutorial. However, it’s good to know for later steps.
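To make the three terms more concrete, here is a minimal sketch in plain Node.js of how intents, user expressions, and entities relate to each other. It is not Dialogflow’s actual format or matching algorithm (Dialogflow uses machine learning instead of exact patterns); the `languageModel` object, the example expressions, the `price` entity, and both helper functions are assumptions made up for illustration.

```javascript
// Hypothetical, simplified language model. This is NOT Dialogflow's format.
const languageModel = {
  intents: [
    {
      name: 'FindRestaurantIntent',
      // User expressions; {price} marks where an entity value may appear.
      expressions: [
        'find me a {price} restaurant',
        'i am looking for a {price} place to eat',
      ],
    },
  ],
  // Entities: the variable pieces of information inside an expression.
  entities: {
    price: ['cheap', 'moderate', 'expensive'],
  },
};

// Turn "find me a {price} restaurant" into a RegExp with one named
// capture group per entity placeholder.
function expressionToRegExp(expression, entities) {
  const pattern = expression.replace(/\{(\w+)\}/g, (match, entityName) => {
    const values = entities[entityName] || [];
    return `(?<${entityName}>${values.join('|')})`;
  });
  return new RegExp(`^${pattern}$`, 'i');
}

// Return the first matching intent with its entity values, or null.
function resolveIntent(text, model) {
  for (const intent of model.intents) {
    for (const expression of intent.expressions) {
      const regExp = expressionToRegExp(expression, model.entities);
      const match = regExp.exec(text.trim());
      if (match) {
        return { intent: intent.name, entities: { ...(match.groups || {}) } };
      }
    }
  }
  return null;
}

console.log(resolveIntent('Find me a cheap restaurant', languageModel));
```

Both example expressions resolve to the same intent, and the entity value travels along as extra information. A real language model generalizes far beyond the listed phrases, which this exact-match sketch does not attempt.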
Now that we know a little bit about how language models work, let’s create our first intent, which will be used to ask for the user’s name.
After creating the agent, you can see that there are two standard intents already in place. We’re going to keep them. The “Default Welcome Intent” will later be mapped to the Jovo “LAUNCH” intent.
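The idea of mapping a platform intent name to a handler name can be sketched in a few lines of plain Node.js. The `intentMap` object and the `resolveHandlerName` helper are hypothetical names chosen for this illustration; they only show the concept, not how the framework implements it.

```javascript
// Hypothetical mapping table: intent names as they arrive from the platform
// on the left, handler names used in the app logic on the right.
const intentMap = {
  'Default Welcome Intent': 'LAUNCH',
};

// Return the mapped handler name, or the incoming name if nothing is mapped.
function resolveHandlerName(incomingIntentName, map) {
  return Object.prototype.hasOwnProperty.call(map, incomingIntentName)
    ? map[incomingIntentName]
    : incomingIntentName;
}

console.log(resolveHandlerName('Default Welcome Intent', intentMap));
console.log(resolveHandlerName('HelloWorldIntent', intentMap));
```

Intents without an entry keep their own name, so only the names that differ between platform and app logic need to be listed.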
Let’s create another intent and name it “HelloWorldIntent”:
Also add the following example phrases to the “Training Phrases” tab:
Save the intent and create another one named “MyNameIsIntent”. For this one, we are again going to add example phrases of what the user could say to “Training Phrases”, and also add an entity called “name” in the “Action and parameters” section:
Now we have to map the entity we created to the “Training Phrases” section by selecting the word “name” and choosing “@sys.given-name:name”:
Now, let’s look at the code!
We’re going to use our Jovo Framework, which works for both Alexa Skills and Google Actions.
The Jovo Command Line Tools (see the GitHub repository) offer a great starting point for your voice application, as they make it easy to create new projects from templates.
This should be downloaded and installed now (see our documentation for more information like technical requirements). After the installation, you can test if everything worked with the following command:
This should look like this:
Let’s create a new project with the $ jovo new command (“helloworld” is the default template and will clone our Jovo Sample App into the specified directory):
For now, you only have to touch the app.js file in the /app folder. This is where all the configurations and app logic will happen. You can learn more about the Jovo Architecture here.
Let’s take a look at
The handlers variable is where you will spend most of your time when you’re building the logic behind your Google Action. The “helloworld” template already has three intents:
What’s happening here? When your Action is opened, it triggers the LAUNCH intent, which contains a toIntent call to switch to the HelloWorldIntent. Here, the ask method is called to ask for your user’s name. After they answer, the MyNameIsIntent gets triggered, which greets your user with their name.
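The flow described above can be simulated without any dependencies. The following is a minimal sketch, not the framework’s implementation: the `MiniContext` class, the `handleRequest` helper, and the speech texts are assumptions made up for this example. Only the method names toIntent, ask, and tell, and the three intent names, come from the text.

```javascript
// Hypothetical stand-in for the object that handlers receive as `this`.
class MiniContext {
  constructor(handlers) {
    this.handlers = handlers;
    this.response = null;
  }

  // Jump to another intent handler within the same request.
  toIntent(intentName, ...args) {
    this.handlers[intentName].call(this, ...args);
  }

  // Speak and keep the session open so the user can answer.
  ask(speech, reprompt) {
    this.response = { speech, reprompt, endSession: false };
  }

  // Speak and end the session.
  tell(speech) {
    this.response = { speech, endSession: true };
  }
}

const handlers = {
  LAUNCH() {
    this.toIntent('HelloWorldIntent');
  },
  HelloWorldIntent() {
    this.ask("Hello World! What's your name?", 'Please tell me your name.');
  },
  MyNameIsIntent(name) {
    this.tell(`Hey ${name}, nice to meet you!`);
  },
};

// Run one request through the handlers and return the resulting response.
function handleRequest(intentName, ...args) {
  const context = new MiniContext(handlers);
  context.toIntent(intentName, ...args);
  return context.response;
}

console.log(handleRequest('LAUNCH'));
console.log(handleRequest('MyNameIsIntent', 'Jan'));
```

The first request ends with the session still open, because ask expects an answer. The second request closes the session, because tell does not.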
That’s it for now. Of course, feel free to modify this as you wish. To create more complex Google Actions, take a look at the framework’s capabilities here: Jovo Framework Docs: Building a Voice App.
So where do we send the response to? Let’s switch tabs once again and take a look at the Fulfillment section at Dialogflow:
To make a connection between Dialogflow and your application, you need an HTTPS endpoint (a webhook).
Jovo currently supports an Express server and AWS Lambda. We recommend the first one for local prototyping, but you can also jump to the Lambda section.
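To illustrate what such a webhook does at its core, here is a minimal sketch using only Node’s built-in http module. It assumes the Dialogflow API v2 JSON format (matched intent in `queryResult.intent.displayName`, reply in `fulfillmentText`); agents on an older API version use different field names. The function names, the speech texts, and the port are made up for this example, and the framework’s own server replaces all of this in a real project.

```javascript
const http = require('http');

// Build a webhook response for a parsed Dialogflow request body.
// Assumption: Dialogflow API v2 field names.
function buildFulfillment(body) {
  const queryResult = body.queryResult || {};
  const intentName = (queryResult.intent || {}).displayName;
  const parameters = queryResult.parameters || {};

  switch (intentName) {
    case 'HelloWorldIntent':
      return { fulfillmentText: "Hello World! What's your name?" };
    case 'MyNameIsIntent':
      return { fulfillmentText: `Hey ${parameters.name}, nice to meet you!` };
    default:
      return { fulfillmentText: 'Sorry, I did not understand that.' };
  }
}

// Plain HTTP server for local experiments. Dialogflow requires HTTPS,
// so a tunnel or hosting layer has to sit in front of it.
function createWebhookServer() {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405);
      res.end();
      return;
    }
    let raw = '';
    req.on('data', (chunk) => { raw += chunk; });
    req.on('end', () => {
      let body;
      try {
        body = JSON.parse(raw);
      } catch (err) {
        res.writeHead(400);
        res.end();
        return;
      }
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(buildFulfillment(body)));
    });
  });
}

// To try it locally (the port is arbitrary):
// createWebhookServer().listen(3000);
```

Keeping the response logic in a pure function separate from the server makes it easy to test without any network traffic.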
Jovo projects come with off-the-shelf server support so that you can start developing locally as easily as possible.
You can find that part in the
Let’s try that out with the following command (make sure to go into the project directory first):
$ jovo run
This will start the Express server and create a subdomain, which you can then submit to Dialogflow:
This should be enough for now to test and debug your Google Action locally. If you want to learn more about how to make it work on AWS Lambda, proceed to the next section.
Or, jump to the section Add Endpoint to Dialogflow.
AWS Lambda is a serverless hosting solution by Amazon. Many Alexa Skills are hosted on this platform, so it might make sense to host your cross-platform voice application (including your Google Action) there as well. This is what we’re going to do in this section. It takes a few steps, so be prepared. If you only want to get a first output, go back up to Local Prototyping.
In the next steps, we are going to create a new Lambda function in the AWS Management Console.
Go to aws.amazon.com and log into your account (or create a new one):
Go to the AWS Management Console:
Search for “lambda” or go directly to console.aws.amazon.com/lambda:
Click “Create a Lambda function”, choose “Author from scratch” and fill out the form:
You can either choose an existing role (if you have one already), or create a new one. We’re going to create one from a template and call it “myNewRole” with no special policy templates:
Now it’s time to configure your Lambda function. Let’s start by adding the Alexa Skills Kit as a trigger:
You can enable skill ID verification, if you want, but it’s not necessary.
Now let’s get to the fun part. You can either enter the code inline, upload a zip, or upload a file from Amazon S3. As we’re using other dependencies like the jovo-framework npm package, we can’t use the inline editor. We’re going to zip our project and upload it to the function.
To upload the code to Lambda, please make sure to zip the actual files inside the directory, not the HelloWorld folder itself:
Let’s go back to the AWS Management Console and upload the zip:
Now save your changes with the orange button in the upper right corner:
Great! Your Lambda function is now created. Click “Test” right next to the “Save” button and select “Alexa Start Session” as the event template, since the Jovo Framework supports both Google Action and Amazon Alexa requests:
Click “Test,” aaand 🎉 it works!
If you want to test it with a “real” Google Assistant request, you can also copy-paste this one:
For Alexa Skills, you can just use the Lambda function’s ARN to proceed; for Dialogflow, we need to create an API Gateway.
Go to console.aws.amazon.com/apigateway to get started:
Let’s create a new API called “myGoogleActionAPIGateway” (you can call it whatever, though):
After successful creation, you will see the Resources screen. Click on the “Actions” dropdown and select “New Method”:
Dialogflow needs a webhook that it can send POST requests to. So let’s create a POST method that is integrated with our existing Lambda function:
Grant it permission:
And that’s almost it. You only need to deploy the API like this:
And create a new stage:
Yes! Finally, you can get the URL for the API Gateway from here:
There’s one more step we need to do before testing: we need to use this link and add it to Dialogflow.
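To show how a POST request travels from the API Gateway into a Lambda function, here is a minimal handler sketch. In the tutorial’s project the framework provides the actual Lambda entry point, so `buildReply` and `handler` are hypothetical and only illustrate the data flow. The sketch assumes the Dialogflow API v2 field names and covers both API Gateway integration styles: with a plain Lambda integration the event is the request JSON itself, while with a Lambda proxy integration the JSON arrives as a string in `event.body`.

```javascript
// Hypothetical reply builder, kept tiny on purpose.
// Assumption: Dialogflow API v2 field names.
function buildReply(dialogflowRequest) {
  const queryResult = dialogflowRequest.queryResult || {};
  const intentName = (queryResult.intent || {}).displayName || 'unknown';
  return { fulfillmentText: `Hello World! (intent: ${intentName})` };
}

// Lambda entry point for Node.js.
async function handler(event) {
  // Proxy integration: the HTTP body arrives as a JSON string.
  const isProxyEvent = typeof event.body === 'string';
  const request = isProxyEvent ? JSON.parse(event.body) : event;
  const reply = buildReply(request);

  if (isProxyEvent) {
    // Proxy integration expects a full HTTP-style response object.
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(reply),
    };
  }
  // Plain integration: the returned object becomes the response body.
  return reply;
}

exports.handler = handler;
```

Which branch runs depends on how the POST method was configured in the API Gateway, so it is worth checking the integration type if the endpoint returns an unexpected shape.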
Now that we have either our local webhook or the API Gateway to AWS Lambda set up, it’s time to use the provided URL to connect our application with our agent on Dialogflow.
Go back to the Dialogflow console and choose the Fulfillment navigation item. Enable the webhook and paste either your Jovo webhook URL or the API Gateway URL:
Dialogflow offers the ability to customize your language model in a way that you can choose for every intent how it’s going to be handled.
This means we need to enable webhook fulfillment for every intent we use in our model.
Go to HelloWorldIntent first and check “Use webhook” at the bottom of the page:
Do the same for the “MyNameIsIntent”. Also take a look at the “Default Welcome Intent” and don’t forget to check the box there as well. This intent comes with default text responses, which would otherwise be returned at random instead of your application’s response when the Action is launched.
Great! Now let’s test your Action.
The work is done. It’s now time to see if Google Assistant is returning the “Hello World!” we’ve been waiting for so long. There are several options to test our Google Action:
For quick testing of your language model and to see if your webhook works, you can use the internal testing tool of Dialogflow.
You can find it to the right. Just type in the expression you want to test (in our case “my name is jan”) and it returns your application’s response and some other information (like the intent):
Testing with Dialogflow will often be enough (and especially useful, as other tools can sometimes be a bit buggy). However, it doesn’t test the integration between Dialogflow and Google Assistant. For this, you need to use the Actions on Google Simulator (see next step).
Now, let’s make our Dialogflow agent work with Google Assistant. Open the Integrations panel from the sidebar menu:
Here, choose the “Actions on Google” integration:
Click “Test” and, on the success screen, “Continue”:
In the Simulator, you can now test your Action:
Yeah! Your application is now an Action on Google Assistant.
The Simulator can be unreliable sometimes. For example, there are a few things that could make an error message show up: “Sorry, this action is not available in simulation”
There are several things that could be useful for troubleshooting:
It can also be helpful to go through the process one more time. Go to Integrations on Dialogflow, choose the Actions on Google integration, and click on “Test”:
Let us know in the comments if it worked!
If you want to test your Action on a Google Home (or other device that works with Google Assistant), make sure you’re connected to it with the same Google account you’re using for the Simulator (and that testing is enabled, see previous step).
Then, use the invocation that was provided by the Simulator: