In this Google Action tutorial for beginners, you will learn how to build an Action for Google Assistant (the voice assistant living inside Google Home) from scratch. We will cover the essentials of building an app for Google Assistant, how to set everything up on API.AI and the Actions on Google Console, and how to use Jovo to build your Action’s logic.

Beginner Tutorial: Build a Google Action in Node.js

See also: Build an Alexa Skill in Node.js with Jovo

What you’ll learn

About Jovo: Jovo is an open source Node.js development framework for voice applications for both Amazon Alexa and Google Assistant. Check out the GitHub repository or the documentation, if you’re interested in learning more.

What We’re Building

To get you started as quickly as possible, we’re going to create a simple Action that responds with “Hello World!”

Please note: This is a tutorial for beginners and explains the essential steps of Google Action development in detail. If you already have experience with Google Home or Google Assistant and just want to learn more about how to use Jovo, either skip the first few sections and go right to section 4 (Build Your Action’s Code), or take a look at the Jovo Documentation.


1) How do Google Actions Work?

In this section, you will learn more about the architecture of Google Assistant and how users interact with its Actions. First, let’s take a look at the terminology for the different kinds of software and hardware involved:

The difference between Google Home, Google Assistant, and Google Actions

While it’s the hardware device that most users see, Google Home is not the name of the assistant you can develop Actions for (which sometimes causes confusion when people talk about “building an Action for Google Home“). The artificial intelligence you can hear speaking from inside the smart speaker is called Google Assistant (which is now also available on Android and iOS smartphones). Actions on Google are the applications that can be built on top of the Google Assistant platform.

The main difference between the architecture of building Google Actions and Alexa Skills is that for Google you need an additional layer to handle the natural language. Most Action developers use API.AI to configure their application’s language model:

Google Assistant and API.AI integration

We will take a deeper look into API.AI in section 2: Create an Agent on API.AI.

To understand how Google Actions work, let’s take a look at the two important elements:

a) User Input

There are a few steps that happen before a user’s speech input reaches your Action. The voice input process (from left to right) consists of three stages that happen at three (or four, if you count Google and API.AI separately) different places; a simplified request sketch follows the list below:

Google Assistant Speech Input Request

  1. A user talking to an Assistant device like Google Home (speech input), which is passed to…
  2. the Actions API, which uses its API.AI integration to understand what the user wants (through natural language understanding) and creates a request, which is passed to…
  3. your Action code, which knows what to do with the request.
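
To make this a bit more concrete, here is a simplified sketch of roughly what such a request looks like in the API.AI (v1) webhook format. The keys are trimmed down for illustration, and a complete sample request appears in the Lambda testing section further below:

// Simplified illustration of an API.AI (v1) webhook request body.
// Only the keys relevant for routing are shown; see the full sample
// request further down in this tutorial.
const exampleRequest = {
    result: {
        resolvedQuery: 'say hi',                      // what the user said
        metadata: { intentName: 'HelloWorldIntent' }, // the intent API.AI resolved
        parameters: {}                                // entities, if any were found
    }
};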

b) Assistant Output

The voice output process (from right to left) passes back through the same stages; a simplified response sketch follows the list below:

Google Assistant Output Response

  1. Your Action code now turns the input into a desired output and returns a response to…
  2. the Assistant API (through API.AI), which turns this response into speech via text-to-speech, sending sound output to…
  3. the Assistant device, where your user is happily waiting and listening
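
For reference, here is a simplified sketch of roughly what a webhook response body looks like in the API.AI (v1) fulfillment format. Jovo assembles the actual response for you from calls like tell, so you never have to build it by hand:

// Simplified illustration of an API.AI (v1) fulfillment response body.
// The framework generates this automatically, e.g. from app.tell('Hello World!').
const exampleResponse = {
    speech: 'Hello World!',     // spoken output (turned into audio via text-to-speech)
    displayText: 'Hello World!' // text shown on devices with a screen
};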

In order to make the Action work, we need to configure it on both API.AI (for the natural language understanding) and the Actions on Google Console (for the Assistant integration). We’re going to create a new API.AI Agent in the next step.

2) Create an Agent on API.AI

An API.AI agent offers a set of modules and integrations to add natural language understanding (NLU) to your product. Although it’s owned by Google, it’s platform agnostic and works for other channels like Facebook Messenger, as well.

We’re going to add our own agent now. Let’s get started:

a) Log in with your Google Account

Go to api.ai and click “Go to console” on the upper right:

API.AI Website

Now sign in with your Google account. To simplify things, use the same account that’s registered with your Actions on Google enabled device (like Google Home), if possible, for more seamless testing.

Sign into API.AI with your Google account

b) Create a New Agent

Great! Once you’re in the console, click “create agent”:

Create a new API.AI agent

We’re just going to name it “HelloWorldAgent” and leave the other information out for now:

Create a HelloWorldAgent on API.AI

After creating the agent, you will see the Intents screen:

List of intents of API.AI Agent: Default Fallback Intent and Default Welcome Intent

These intents are part of the Agent’s language model. Let’s take a deeper look into how it works in the next section.

 

3) Create a Language Model

API.AI offers an easy (but also highly customizable) way to create a language model for your Google Action.

Let’s take a look:

a) An Introduction to API.AI Interaction Models

Google Assistant and API.AI help you with several steps in processing input. First, they take a user’s speech and transform it into written text (speech to text). Afterward, they use a language model to make sense out of what the user means (natural language understanding).

A simple interaction model for Google Assistant (built with API.AI) consists of three elements: Intents, user expressions, and entities.

Intents: What the User Wants

An intent is something a user wants to achieve while talking to your product. It is the underlying meaning that can be distilled from the sentence or phrase the user says. And there can be several ways to arrive at that specific intent.

FindRestaurantIntent and User Expressions

For example, the FindRestaurantIntent from the image above could be expressed by users in many different ways. In API.AI language models, these are called user expressions:

User Expressions: What the User Says

A user expression (sometimes called an utterance) is the actual sentence a user says. There is often a large variety of expressions that fit the same intent. And sometimes the phrasing is even more variable. This is where entities come into play:

Entities

No matter if I’m looking for a super cheap place, a pizza spot that serves Pabst Blue Ribbon, or a dinner restaurant to bring a date, generally speaking all of these serve one purpose (the user intent): to find a restaurant. However, the user is also passing along more specific information that can be used for a better user experience. These pieces of information are called entities:

FindRestaurantIntent with User Expressions and Entities
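
As a quick illustration (the intent and entity names here are hypothetical, matching the restaurant example above), different expressions can resolve to the same intent while the entities capture the specifics:

// Hypothetical mapping for the FindRestaurantIntent example above.
// Different phrases resolve to the same intent; entities capture the specifics.
const resolvedExamples = [
    { expression: 'find me a super cheap place to eat',
      intent: 'FindRestaurantIntent', entities: { price: 'cheap' } },
    { expression: 'a pizza spot that serves Pabst Blue Ribbon',
      intent: 'FindRestaurantIntent', entities: { cuisine: 'pizza', drink: 'Pabst Blue Ribbon' } },
    { expression: 'a dinner restaurant to bring a date',
      intent: 'FindRestaurantIntent', entities: { occasion: 'date' } }
];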

These are just the basic components, to introduce you to the terminology. We don’t need to know much about entities for this simple tutorial, but it’s good to keep them in mind for later steps.

Now that we know a little bit about how language models work, let’s create our first intent, which will be used to return a “Hello World!”

b) Create a New Intent: HelloWorldIntent

After creating the agent, you can see that there are two standard intents already in place. We’re going to keep them. The “Default Welcome Intent” will later be mapped to the Jovo “LAUNCH” intent.

HelloWorldAgent: List of Intents

Let’s create another intent and name it “HelloWorldIntent”:

Create HelloWorldIntent on API.AI

Now let’s add some example phrases to the intent, so that API.AI knows what people are saying when they want to get a “Hello World!” response. We’re going to add two user expressions to “User says”:

HelloWorldIntent User Expressions
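
In case the screenshot is hard to read: two short phrases along these lines are enough (the exact wording is up to you; “say hi” is the phrase we’ll type into the testing tool later):

// Two sample user expressions for the HelloWorldIntent
// ("say hi" is used again in the testing section below).
const helloWorldExpressions = [
    'say hi',
    'hello'
];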

Save the intent. This should be enough of a language model for now. We’re going to return to API.AI later to connect our application (the code) to our agent.

Now, let’s look at the code!

 

4) Build Your Action’s Code

Now let’s build the logic of our Google Action.

We’re going to use our Jovo Framework, which works for both Alexa Skills and Google Actions.

a) Install the Jovo CLI

The Jovo Command Line Tools (see the GitHub repository) offer a great starting point for your voice application, as they make it easy to create new projects from templates.

$ npm install -g jovo-cli

This downloads and installs the Jovo CLI globally (see our documentation for more information, like technical requirements). After the installation, you can test if everything worked with the following command:

$ jovo

The output should look like this:

Jovo CLI in Terminal

b) Create a new Project

Let’s create a new project. You can see from the screenshot above that it’s possible to create new projects with this command (the “helloworld” template is the default template and will clone our Jovo Sample App into the specified directory):

$ jovo new HelloWorld

Create new Jovo Project in Terminal

c) A First Look at the index.js

Let’s take a look at the code provided by the sample application. For now, you only have to touch the index.js file. This is where all the configurations and app logic will happen. The Jovo Architecture (see the docs) looks like this:

Jovo index.js structure: App Configuration and App Logic
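
If the diagram doesn’t render for you: the generated index.js boils down to two blocks, which we will look at in detail in this and the next section:

// index.js: rough structure of the Jovo sample project.

// ===== App Configuration =====
const app = require('jovo-framework').Jovo;
const webhook = require('jovo-framework').Webhook;
// ... server (or Lambda) setup goes here, see section 5 ...

// ===== App Logic =====
let handlers = {
    // ... intent handlers like 'LAUNCH' and 'HelloWorldIntent' go here ...
};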

Let’s take a look at the lower part first:

d) Understanding the App Logic

The handlers variable is where you will spend most of your time when you’re building the logic behind your Google Action. It already has a “HelloWorldIntent,” as you can see below:

let handlers = {
    'LAUNCH': function() {
        // This intent is triggered when people open the voice app
        // without a specific deep link into an intent
        app.toIntent('HelloWorldIntent');
    },

    'HelloWorldIntent': function() {
        app.tell('Hello World!');
    },
};

What’s happening here? When your app is opened, it triggers the LAUNCH-intent, which contains a toIntent call to switch to the HelloWorldIntent. Here, the tell method is called to respond to your users with “Hello World!”
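
If you want to go one step further and keep the session open for a user reply, you can use the ask method instead of tell. Here is a minimal sketch; the follow-up intent and phrasing are made up for illustration and would need matching intents in your API.AI agent:

// Sketch: keeping the session open with ask() instead of closing it with tell().
// 'YesIntent' and the phrasing are illustrative and would need to exist in API.AI.
let handlers = {
    'LAUNCH': function() {
        app.toIntent('HelloWorldIntent');
    },

    'HelloWorldIntent': function() {
        // ask(speech, reprompt) responds and keeps the microphone open
        app.ask('Hello World! Do you like voice apps?', 'Do you like voice apps?');
    },

    'YesIntent': function() {
        app.tell('Great to hear!');
    },
};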

That’s it for now. Of course, feel free to modify this as you wish. To create more complex Google Actions, take a look at the framework’s capabilities here: Jovo Framework Docs: Building a Voice App.

 

5) App Configuration: Where to Run Your Code

So where do we send the response to? Let’s switch tabs once again and take a look at the Fulfillment section at API.AI:

API.AI Webhook Fulfillment

To make a connection between API.AI and your application, you need an HTTPS endpoint (a webhook).

Jovo currently supports an Express server and AWS Lambda. We recommend the first one for local prototyping, but you can also jump to the Lambda section.

 

a) App Configuration: Local Prototyping with Express and Ngrok

The index.js comes with off-the-shelf server support so that you can start developing locally as easily as possible.

You can find this part in the App Configuration building block:

const app = require('jovo-framework').Jovo;
const webhook = require('jovo-framework').Webhook;

// Start the local development server
webhook.listen(3000, function() {
    console.log('Local development server listening on port 3000.');
});

// Listen for POST requests on /webhook and pass them to the app logic
webhook.post('/webhook', function(req, res) {
    app.handleRequest(req, res, handlers);
    app.execute();
});

// App Logic below

Run Local Server

Let’s try that out with the following command (make sure to go into the project directory first):

$ node index.js

This will start the Express server and should look like this:

Run local node server with Jovo

Create ngrok Endpoint

So now, how does API.AI reach that endpoint? It’s currently running locally, so it’s not accessible to outside APIs. Fortunately, there are helpful tools like ngrok.

Ngrok is a tunneling service that points to your localhost and creates a subdomain that you can then submit to the API.AI Console.

In your command line, open a new tab and type in the following command to install ngrok:

$ npm install ngrok -g

Now, you should be able to create a secure tunnel to your localhost:3000 like this:

$ ngrok http 3000

If it works, use the https link provided by ngrok here:

Local prototyping for your Alexa Skill with ngrok
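
Note that the Express route in the configuration above is /webhook, so the endpoint you will paste into API.AI later is presumably the https link with /webhook appended (the subdomain below is just a placeholder; yours will differ):

// Hypothetical example of the endpoint URL built from the ngrok output above.
// Your ngrok subdomain will be different; keep the '/webhook' path.
const endpoint = 'https://a1b2c3d4.ngrok.io/webhook';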

This should be enough for now to test and debug your Google Action locally. If you want to learn more about how to make it work on AWS Lambda, proceed to the next section.

Or, jump to the section Add Endpoint to API.AI.

 

b) App Configuration: Host your Code on AWS Lambda

AWS Lambda is a serverless hosting solution by Amazon. Many Alexa Skills are hosted on this platform, so it might make sense for you to host your cross-platform voice application (including your Google Action) there as well. This is what we’re going to do in this section. It usually takes a few steps, so be prepared. If you only want to get an output for the first time, go back up to Local Prototyping.

For Lambda support, the app configuration looks different compared to the server solution. To get started, open the file index_lambda.js in your project directory and rename it to index.js.

Or, just swap the configuration part in the index.js file. This is what the configuration looks like:

const app = require('jovo-framework').Jovo;

exports.handler = function(event, context, callback) {
    app.handleRequest(event, callback, handlers);
    app.execute();
};

// App Logic below

In the next steps, we are going to create a new Lambda function on the AWS Developer Console.

Create a Lambda Function

Go to aws.amazon.com and log into your account (or create a new one):

AWS Portal

Go to the AWS Management Console:

AWS Services

Search for “lambda” or go directly to console.aws.amazon.com/lambda:

AWS Lambda Functions

Click “Create a Lambda function” and choose “Blank Function” from the selection:

AWS Lambda Blueprints

As a trigger, choose “Alexa Skills Kit.” This way, your code will later also work with Alexa skills. We are going to create an API Gateway after creating the Lambda function.

AWS Lambda: Configure Triggers

Now, you can configure your Lambda function. We’re just going to name it “myGoogleAction”:

AWS Lambda: Create myGoogleAction

Upload Your Code

Now let’s get to the fun part. You can either enter the code inline, upload a zip, or upload a file from Amazon S3. As we’re using other dependencies like the jovo-framework npm package, we can’t use the inline editor. We’re going to zip our project and upload it to the function.

Let’s take a look at the project directory (note: index.js was renamed to index_webhook.js in order to rename index_lambda.js to index.js):

Jovo Project files in Mac Finder

To upload the code to Lambda, please make sure to zip the actual files inside the directory, not the HelloWorld folder itself:

Select and zip all files in the folder
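
If you prefer the command line (macOS/Linux), running something like this from inside the project directory zips the folder’s contents rather than the folder itself (the archive name is arbitrary):

$ cd HelloWorld
$ zip -r ../HelloWorld.zip .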

Let’s go back to the AWS Developer Console and upload the zip:

Lambda Function: Upload ZIP

Now scroll down to the next step:

Lambda Function Handler and Role

For the Lambda Function Handler, use index.handler.

Lambda function handler and role
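
This value follows the pattern file.exportedFunction: our file is called index.js and (as shown in the Lambda configuration earlier) it exports a function named handler, which is why index.handler points Lambda to it:

// From the configuration earlier: Lambda invokes this exported function.
// "index.handler" = file name (index.js) + exported function name (handler).
exports.handler = function(event, context, callback) {
    app.handleRequest(event, callback, handlers);
    app.execute();
};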

You can either choose an existing role (if you have one already), or create a new one. We’re going to create one from a template and call it “mySkillRole” with no special policy templates:

AWS Lambda: Create myActionRole

Click “Next” to proceed to the next step. In the “Review” step, click “Create function” in the lower right corner:

AWS Lambda Function: Review

Test Your Lambda Function

Great! Your Lambda function is now created. Click “Test” to see if it works:

AWS Lambda Function: Test

The beautiful thing about the Jovo Framework is that it works for both Google Assistant and Amazon Alexa. So, for this input test event, you can just use the “Alexa Start Session” template and it works. Look for the green checkmark at the bottom of the page:

AWS Lambda Function: Test Success

If you want to test it with a “real” Google Assistant request, you can also copy-paste this one:

{
	"originalRequest": {
		"source": "google",
		"version": "2",
		"data": {
			"isInSandbox": true,
			"surface": {
				"capabilities": [
					{
						"name": "actions.capability.AUDIO_OUTPUT"
					},
					{
						"name": "actions.capability.SCREEN_OUTPUT"
					}
				]
			},
			"inputs": [
				{
					"rawInputs": [
						{
							"query": "talk to my test app",
							"inputType": "KEYBOARD"
						}
					],
					"intent": "actions.intent.MAIN"
				}
			],
			"user": {
				"locale": "en-US",
				"userId": "1501754379730"
			},
			"device": {},
			"conversation": {
				"conversationId": "1501754379730",
				"type": "NEW"
			}
		}
	},
	"id": "ce231a64-af08-4c33-bfa3-0724a80d5b2c",
	"timestamp": "2017-08-03T09:59:39.741Z",
	"lang": "en",
	"result": {
		"source": "agent",
		"resolvedQuery": "GOOGLE_ASSISTANT_WELCOME",
		"speech": "",
		"action": "input.welcome",
		"actionIncomplete": false,
		"parameters": {},
		"contexts": [
			{
				"name": "google_assistant_welcome",
				"parameters": {},
				"lifespan": 0
			},
			{
				"name": "actions_capability_screen_output",
				"parameters": {},
				"lifespan": 0
			},
			{
				"name": "actions_capability_audio_output",
				"parameters": {},
				"lifespan": 0
			},
			{
				"name": "google_assistant_input_type_keyboard",
				"parameters": {},
				"lifespan": 0
			}
		],
		"metadata": {
			"intentId": "b0b7962c-cae0-4437-bddf-e72f457959d6",
			"webhookUsed": "true",
			"webhookForSlotFillingUsed": "false",
			"nluResponseTime": 2,
			"intentName": "Default Welcome Intent"
		},
		"fulfillment": {
			"speech": "Greetings!",
			"messages": [
				{
					"type": 0,
					"speech": "Hi!"
				}
			]
		},
		"score": 1
	},
	"status": {
		"code": 200,
		"errorType": "success"
	},
	"sessionId": "1501754379730"
}

Create API Gateway

For Alexa Skills, you can just use the Lambda function’s ARN to proceed. For API.AI, however, we need to create an API Gateway.

Go to console.aws.amazon.com/apigateway to get started:

Amazon API Gateway Website

Let’s create a new API called “myGoogleActionAPIGateway” (you can call it whatever, though):

Create myGoogleActionAPIGateway

After successful creation, you will see the Resources screen. Click on the “Actions” dropdown and select “New Method”:

API Gateway: New Method

API.AI needs a webhook it can send POST requests to. So let’s create a POST method that is integrated with our existing Lambda function:

API Gateway: Create POST Method

Grant it permission:

API Gateway: Lambda Function Permission

And that’s almost it. You only need to deploy the API like this:

API Gateway: Deploy API

And create a new stage:

API Gateway: Deployment stage

Yes! Finally, you can get the URL for the API Gateway from here:

API Gateway: Invoke URL

There’s one more step we need to take before testing: adding this link to API.AI.

 

6) Add Endpoint to API.AI

Now that we have either our local webhook or the AWS Lambda API Gateway set up, it’s time to use the provided URL to connect our application with our agent on API.AI.

a) Agent Fulfillment Section

Go back to the API.AI console and choose the Fulfillment navigation item. Enable the webhook and paste either your ngrok URL or the API Gateway:

API.AI Webhook Fulfillment with URL

b) Add Webhook to Intents

API.AI lets you customize your language model so that you can choose, for every intent, how it should be handled.

This means we need to enable webhook fulfillment for every intent we use in our model.

Go to the HelloWorldIntent first and check “Use webhook” at the bottom of the page:

API.AI add webhook fulfillment to HelloWorldIntent

Important: Also take a look at the “Default Welcome Intent” and don’t forget to check the box there as well. The intent comes with default text responses, which would otherwise be returned at launch instead of the response from your webhook.

API.AI add webhook fulfillment to Default Welcome Intent

Great! Now let’s test your Action.

 

7) “Hello World!”

The work is done. It’s now time to see if Google Assistant returns the “Hello World!” we’ve been waiting for. There are several options for testing our Google Action:

a) Test in API.AI

For quick testing of your language model and to see if your webhook works, you can use the internal testing tool of API.AI.

You can find it on the right. Just type in the expression you want to test (in our case “say hi”) and it returns your application’s response and some other information (like the intent):

API.AI Internal Testing Tool

Testing with API.AI will often be enough (and especially useful, as other tools can sometimes be a bit buggy). However, it doesn’t test the integration between API.AI and Google Assistant. For this, you need to use the Actions on Google Simulator (see next step).

 

b) Test in the Actions on Google Simulator

Now, let’s make our API.AI agent work with Google Assistant. Open the Integrations panel from the sidebar menu:

API.AI Integrations

Here, choose the “Actions on Google” integration:

API.AI Actions on Google Integration

Click “Update” and, on the success screen, “Visit Console”:

API.AI: Assistant app successfully updated

In the Actions on Google console, go to the Simulator and sign in:

Actions on Google Simulator: Sign in

In the Simulator, you can now test your Action:

Actions on Google Simulator

Sometimes, testing with the Simulator can be a bit tedious, so make sure that you use the sample phrase that’s suggested by the input field:

Actions on Google: Talk to my test app

And, if everything is set up correctly, it should return something like this:

Actions on Google Simulator: Hello World

Yeah! Your application is now an Action on Google Assistant.

Troubleshooting

The Simulator can be unreliable at times. For example, a few things can cause this error message to show up: “Sorry, this action is not available in simulation”

Sorry, this action is not available in simulation

There are several things that could be useful for troubleshooting:

  • Use the right invocation name (as noted above)
  • Make sure you’re using the same Google account for logging into API.AI and the Actions on Google console
  • If you have other Actions projects, disable all of them for testing
  • Turn on the Voice & Audio Activity, Web & App Activity, and Device Information permissions for your Google Account here: Activity controls

It can also be helpful to go through the process one more time. Go to Integrations on API.AI, choose the Actions on Google integration, and click on “Test”:

 

API.AI: Actions on Google integration Test

In the green field, click on “view” to go to the Actions on Google Simulator. Let us know in the comments if it worked!

c) Test on your Assistant enabled device

If you want to test your Action on a Google Home (or another device that works with Google Assistant), make sure you’re connected to it with the same Google account you’re using for the Simulator (and that testing is enabled, see previous step).

Then, use the invocation that was provided by the Simulator:

OK Google, talk to my test app

 

Next Steps

Great job! You’ve gone through all the necessary steps to prototype your own Google Action. The next challenge is to build a real Action. For this, take a look at the Jovo Documentation to see what else you can do with our Framework:

Jovo Documentation for Alexa Skills and Google Actions

