In this Alexa Skill tutorial for beginners, you will learn how to build a project for the popular voice platform from scratch. We will cover the essentials of building an app for Alexa, how to set everything up on the Amazon Developer Portal, and how to use Jovo to build your Skill’s logic.

Beginner Tutorial: Build an Alexa Skill in Node.js

See also: Build a Google Action in Node.js with Jovo


About Jovo: Jovo is an open source Node.js development framework for voice applications on both Amazon Alexa and Google Assistant. Check out the GitHub repository or the documentation if you’re interested in learning more.

What We’re Building

To get you started as quickly as possible, we’re going to create a simple Skill that responds with “Hello World!”

Please note: This is a tutorial for beginners and explains the essential steps of Alexa Skill development in detail. If you already have experience with Alexa and just want to learn more about how to use Jovo, either skip the first few sections and go right to Code the Skill, or take a look at the Jovo Documentation.

 

1) How do Alexa Skills Work?

In this section, you will learn more about the architecture of Alexa and how users interact with its Skills. An Alexa Skill interaction basically consists of speech input (your user’s request) and output (your Skill’s response).

There are a few steps that happen before a user’s speech input reaches your Skill. The voice input process (from left to right) consists of three stages that happen at three different places:

Alexa Input Process: Alexa enabled device passes speech input to the Alexa API which passes a request to your Skill Code

  1. A user talking to an Alexa enabled device (speech input), which is passed to…
  2. the Alexa API which understands what the user wants (through natural language understanding), and creates a request, which is passed to…
  3. your Skill code which knows what to do with the request.

The third stage is where your magic happens. The voice output process (from right to left) goes back through the same stages:

Alexa Output Process: You return a response to the Alexa API which creates a speech output and passes it to the Alexa enabled device

  1. Your Skill code now turns the input into a desired output and returns a response to…
  2. the Alexa API, which turns this response into speech via text-to-speech, sending sound output to…
  3. the Alexa enabled device, where your user is happily waiting and listening

In order to make the Skill work, we first need to configure it, so that the Alexa API knows which data to pass to your application (and where to pass it). We will do this on the Amazon Developer Portal.
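To give you an idea of the data that is passed around: the Alexa API sends your Skill code a JSON request. A heavily abbreviated IntentRequest could look roughly like this (the real payload contains more data, such as session and user information; the intent name is just an example):

{
  "version": "1.0",
  "request": {
    "type": "IntentRequest",
    "intent": {
      "name": "HelloWorldIntent",
      "slots": {}
    }
  }
}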

 

2) Create a Skill on the Amazon Developer Portal

The Amazon Developer Portal is the console where you can add your Skill as a project, configure the language model, test if it’s working, and publish it to the Alexa Skill Store.

Let’s get started:

a) Log in with your Amazon Developer Account

Go to developer.amazon.com and click “Sign in” on the upper right:

Amazon Developer Portal

Now either sign in with your Amazon Developer account or create a new one. To simplify things, make sure to use the same account that’s registered with your Alexa enabled device (if possible) for more seamless testing.

Amazon Developer Portal: Sign in Screen

Great! You should now have access to your account. This is what the dashboard of the Amazon Developer Console looks like:

Amazon Developer Console: Dashboard

b) Create a new Skill

Now it’s time to create a new project on the developer console. Click on the “Alexa” menu item in the navigation bar and choose “Alexa Skills Kit” to access your Alexa Skills:

Amazon Developer Console: Alexa Section

Let’s create a new Skill by clicking on the yellow button to the upper right:

Amazon Developer Console: List of Alexa Skills

This is what an “empty” Skill looks like:

Amazon Developer Console: Create a new Alexa Skill

In the following steps, we will configure our Skill and create a language model that works with Alexa.

c) Alexa Skill Configuration

Let’s break this down step by step.

Interaction Model

There are different types of Skills available for Amazon Alexa. In this example, we will use the “Custom Interaction Model.”

Alexa Skill Types: Custom Interaction Model, Smart Home Skill, Flash Briefing Skill, Video Skill

Skill Language

Currently, Alexa is available in the US, UK, and Germany. A Skill can have more than one language (although you then have to go through all the following steps again for each language). Make sure to use the language that is associated with the Amazon account linked to your Alexa enabled device, so you can test it without any problems. In our case, it will be English (U.S.) for the United States:

Alexa Skill Language

Skill Name and Invocation Name

This is where it gets a little more interesting. What should your Skill’s name be?

Alexa Skill: Name and Invocation Name

There are two types of names for your Alexa Skill: the Name is the one people see in their Alexa app and in the Alexa Skill Store, while the Invocation Name is the one your users say to access your Skill:

How an Alexa Invocation Name Works

Make sure to choose an invocation name that can be understood by Alexa. For this simple tutorial, we can just go with My First Skill (name) and My Skill (invocation name):

Your first Alexa Skill

Global Fields

Alexa offers support for more and more special types of Skills, e.g. ones that make use of the Echo Show’s visual component. In our case, we won’t use any of them for now:

Global Fields: Audio Player, Video App, Render Template for your Skill

The Skill Information is done. Let’s move on to the next step, where the language model of your Skill will be created.

 

3) Create a Language Model

You need to define an interaction model in order for Alexa to tell your application what your user wants. The screen looks like this:

Alexa Interaction Model

But first, let’s take a look at how natural language understanding (NLU) with Alexa works.

a) An Introduction to Alexa Interaction Models

Alexa helps you with several steps in processing input. First, it takes a user’s speech and transforms it into written text (speech to text). Afterward, it uses a language model to make sense out of what the user means (natural language understanding).

A simple interaction model for Alexa consists of three elements: Intents, utterances, and slots.

Intents

An intent is something a user wants to achieve while talking to your product. It is the underlying meaning that can be distilled from the sentence or phrase the user is saying. And there can be several ways to end up at that specific intent.

FindRestaurantIntent with three example utterances

For example, the FindRestaurantIntent from the image above could be expressed by users in many different ways. In Alexa language models, these phrasings are called utterances:

Utterances

An utterance (sometimes called a user expression) is the actual sentence a user is saying. There is often a large variety of utterances that fit the same intent. And sometimes they can be even more variable. This is where slots come into play:

Slots

No matter if I’m looking for a super cheap place, a pizza spot that serves Pabst Blue Ribbon, or a dinner restaurant to bring a date, generally speaking it serves one purpose (user intent): to find a restaurant. However, the user is passing some more specific information that can be used for a better user experience. These are called slots:

FindRestaurantIntent with utterances and slots

These are just the very basic components, and we don’t need slots for this simple tutorial. However, they are good to know about for later steps.
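To make this a little more concrete, here is a sketch of how the FindRestaurantIntent from the example above could look in Alexa’s (old) intent schema and sample utterances format, which we will use in the next step. The slot name and the LIST_OF_CUISINES slot type are hypothetical and would need to be defined as a custom slot type:

{
  "intents": [
    {
      "intent": "FindRestaurantIntent",
      "slots": [
        { "name": "cuisine", "type": "LIST_OF_CUISINES" }
      ]
    }
  ]
}

FindRestaurantIntent find me a {cuisine} restaurant
FindRestaurantIntent i want to eat {cuisine} food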

b) Create a HelloWorldIntent

For a simple “Hello World,” we only need to create one intent and add a few sample utterances. So let’s dive into the Amazon Developer Console and do this. There are currently two ways to add intents, utterances, and slots to your Alexa language model: the Skill Builder Beta, or the Old Interaction Schema.

You can either use the old schema in the next section or jump to the Skill Builder.

Old Interaction Schema

For the old interaction schema, just stay in the window where you left off:

Interaction Model: Old Editor

There are three things that can be added to the interaction model: Intent Schema, Custom Slot Types, and Sample Utterances.

Let’s start with the intent schema:

Alexa Skill Intent Schema

This is a JSON representation of your intents. They can be added like this:

{
  "intents": [
    { "intent": "YourFirstIntent" },
    { "intent": "YourSecondIntent" }
  ]
}

For our demo, let’s just add a HelloWorldIntent and a few that are required by Amazon in order to pass the Skill certification. We don’t need to implement them just yet. The schema looks like this:

{
  "intents": [
    { "intent": "HelloWorldIntent" },
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.StopIntent" },
    { "intent": "AMAZON.CancelIntent" }
  ]
}

 

In the next section, custom slot types can be added. As we only want to get a “Hello World” in the first step, we can skip this for now.

Alexa Skill Custom Slot Types

Now let’s talk about sample utterances:

Alexa Skill Sample Utterances

They can be added like this:

IntentName this is your first utterance
IntentName this is your second utterance

For our sample Skill, we are going to use the following:

HelloWorldIntent say hello
HelloWorldIntent say hi

Great! Let’s click “save” and let the Alexa API build the interaction model.

Alternatively, you can also use the Skill Builder Beta (or jump to Code the Skill right away):

 

Skill Builder Beta

The Skill Builder is a new interface for creating the interaction model with a few more features. Click on this button in the Interaction Model view to access it:

Launch Skill Builder

It looks like this:

Alexa Skill Builder: Dashboard

As you can see on the left, the three built-in intents for users to cancel, stop, or ask for help are already there, as they are required by Amazon.

Click on the “Add an Intent +” button to add our “HelloWorldIntent”:

Alexa Skill Builder: Add Intent

As you can see, this already looks a little different. Also, utterances are now added line by line:

Alexa Skill Builder: Add utterances

We end up with the same two utterances for our HelloWorldIntent:

Sample Utterances for HelloWorldIntent

By the way, you can also add intents, utterances, and slots in the Code Editor. You can access it in the left sidebar:

Alexa Skill Builder: Code Editor

Great! Let’s click on “Build Model” at the top. It will take a while for Amazon to build the interaction model.

In the meantime, let’s look at the code!

 

4) Build Your Skill’s Code

Now let’s build the logic of our Alexa Skill.

We’re going to use our Jovo Framework, which works for both Alexa Skills and Google Actions.

a) Install the Jovo CLI

The Jovo Command Line Tools (see the GitHub repository) offer a great starting point for your voice application, as they make it easy to create new projects from templates.

$ npm install -g jovo-cli

This downloads and installs the Jovo CLI globally (see our documentation for more information like technical requirements). After the installation, you can test if everything worked with the following command:

$ jovo

This should look like this:

Jovo CLI in Terminal

b) Create a new Project

Let’s create a new project. As you can see in the CLI output above, it’s possible to create new projects with the following command (the “helloworld” template is the default template and will clone our Jovo Sample App into the specified directory):

$ jovo new HelloWorld

Create new Jovo Project in Terminal

c) A First Look at the index.js

Let’s take a look at the code provided by the sample application. For now, you only have to touch the index.js file. This is where all the configurations and app logic will happen. The Jovo Architecture (take a look at the docs) looks like this:

Jovo index.js structure: App Configuration and App Logic

Let’s take a look at the lower part first:

d) Understanding the App Logic

The handlers variable is where you will spend most of your time when you’re building the logic behind your Alexa Skill. It already has a “HelloWorldIntent,” as you can see below:

let handlers = {
    'LAUNCH': function() {
        // This intent is triggered when people open the voice app
        // without a specific deep link into an intent
        app.toIntent('HelloWorldIntent');
    },

    'HelloWorldIntent': function() {
        app.tell('Hello World!');
    },
};

What’s happening here? When your Skill is opened, it triggers the LAUNCH intent, which contains a toIntent call to switch to the HelloWorldIntent. There, the tell method is called to respond to your users with “Hello World!”
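By the way, if you later want to handle the built-in intents we added to the interaction model earlier (help, stop, cancel), you could extend the handlers object. The following is just a rough sketch: it assumes Jovo’s ask method, which keeps the session open and waits for an answer, and the exact responses are of course up to you:

'AMAZON.HelpIntent': function() {
    // Keep the session open and wait for the user's answer
    app.ask('You can say hello to me. What would you like to do?', 'Just say hello.');
},

'AMAZON.StopIntent': function() {
    // End the session with a final message
    app.tell('Goodbye!');
},

'AMAZON.CancelIntent': function() {
    app.tell('Goodbye!');
},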

 

5) App Configuration: Where to Run Your Code

So where do we send the response to? Let’s switch tabs once again and take a look at the Amazon Developer Console, this time the Configuration step:

Alexa Skill Configuration

To make a connection between the Alexa API and your application, you need to either upload your code to AWS Lambda, or provide an HTTPS endpoint (a webhook).

Jovo supports both. For local prototyping and debugging, we recommend using HTTPS (which we are going to describe in the next step), but you can also jump to the Lambda section.

a) App Configuration: Local Prototyping with Express and Ngrok

The index.js comes with off-the-shelf server support so that you can start developing locally as easily as possible.

You can find this part in the App Configuration building block:

const app = require('jovo-framework').Jovo;
const webhook = require('jovo-framework').Webhook;

// Listen for post requests
webhook.listen(3000, function() {
    console.log('Local development server listening on port 3000.');
});

webhook.post('/webhook', function(req, res) {
    app.handleRequest(req, res, handlers);
    app.execute();
});

// App Logic below

Let’s try that out with the following command (make sure to go into the project directory first):

$ node index.js

This will start the express server and look like this:

Run local node server with Jovo

So now, how does the Alexa API reach that endpoint? It’s currently running locally, so it’s not accessible to outside APIs. Fortunately, there is a helpful tool for this: ngrok.

Ngrok is a tunneling service that points to your localhost and creates a subdomain that you can then submit to the Amazon Developer Console.

In your command line, open a new tab and type in the following command to install ngrok:

$ npm install ngrok -g

Now, you should be able to create a secure tunnel to your localhost:3000 like this:

$ ngrok http 3000

If it works, use the https link provided by ngrok here:

Local prototyping for your Alexa Skill with ngrok

Append “/webhook” to the link and paste it into the endpoint field of the Amazon Developer Console, like this:

Alexa Skill HTTPS Endpoint

For HTTPS endpoints, Amazon also needs to know how the connection is secured. In the next step, choose the second option (the link ngrok provides is a subdomain of a domain that has a wildcard certificate):

Alexa Skill SSL Certificate

Great! Your voice app is now running locally and ready to test. If you’re interested in how to set up Lambda, read further. If you want to dive right into the testing, jump to “Hello World!”

 

b) App Configuration: Host your Code on AWS Lambda

AWS Lambda is a serverless hosting solution by Amazon. Many Skills are hosted on this platform, as it is a cheap alternative to other hosting providers, and Amazon also offers additional credits for Alexa Skill developers. In this section, you’ll learn how to host your Skill on Lambda. This usually takes a few steps, so be prepared. If you only want to get an output for the first time, go back up to Local Prototyping.

For Lambda support, the app configuration looks different from the web server solution. To get started, open the file index_lambda.js in your project directory and rename it to index.js.

Or, just swap the configuration part in the index.js file. This is what the configuration looks like:

const app = require('jovo-framework').Jovo;

exports.handler = function(event, context, callback) {
    app.handleRequest(event, callback, handlers);
    app.execute();
};

// App Logic below

In the next steps, we are going to create a new Lambda function on the AWS Developer Console.

Create a Lambda Function

Go to aws.amazon.com and log into your account (or create a new one):

AWS Portal

Go to the AWS Management Console:

AWS Services

Search for “lambda” or go directly to console.aws.amazon.com/lambda:

AWS Lambda Functions

Click “Create a Lambda function” and choose “Blank Function” from the selection:

AWS Lambda Blueprints

As a trigger, choose “Alexa Skills Kit”:

AWS Lambda: Configure Triggers

Now, you can configure your Lambda function:

AWS Lambda: Configure function

Upload Your Code

Now let’s get to the fun part. You can either enter the code inline, upload a zip, or upload a file from Amazon S3. As we’re using dependencies like the jovo-framework npm package, we can’t use the inline editor. We’re going to zip our project and upload it to the function.

Let’s take a look at the project directory (note: index.js was renamed to index_webhook.js so that index_lambda.js could be renamed to index.js):

Jovo Project files in Mac Finder

To upload the code to Lambda, please make sure to zip the actual files inside the directory, not the HelloWorld folder itself:

Select and zip all files in the folder
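If you prefer the command line over the Finder, you could also create the archive from inside the project directory, for example like this on macOS or Linux (the archive name is arbitrary):

$ cd HelloWorld
$ zip -r ../HelloWorld.zip .

This zips the contents of the folder (including node_modules), not the folder itself.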

Let’s go back to the AWS Developer Console and upload the zip:

Lambda Function: Upload ZIP

Now scroll down to the next step:

Lambda Function Handler and Role

For the Lambda Function Handler, use index.handler (this points to the handler function exported in our index.js).

Lambda function handler and role

You can either choose an existing role (if you have one already), or create a new one. We’re going to create one from a template and call it “mySkillRole” with no special policy templates:

Lambda Function: Create new Role

Click “Next” to proceed to the next step. In the “Review” process, click “Create function” in the lower right corner:

Lambda Function: Review

Test Your Lambda Function

Great! Your Lambda function is now created. Click “Test” to see if it works:

Lambda Function: Test

As the input event, choose “Alexa Start Session” to see if your code works:

Lambda Function Test: Alexa Start Session

Click “Save and Test,” aaand 🎉 it works!

Lambda Function: Test Result

Add ARN to Alexa Skill Configuration

Copy the ARN at the upper right corner:

Lambda Function: Copy ARN

Then go to the Configuration step of your Alexa Skill in the Amazon Developer Console and enter it:

Alexa Skill AWS Lambda ARN Endpoint

Great! Now it’s time to test your Skill:

6) “Hello World!”

After you’ve passed the Configuration step, your Skill should automatically be enabled for testing:

Alexa Skill Test

Wanna get your first “Hello World!”? You can do this by either using the Alexa Service Simulator, testing on your device, or testing on your phone.

a) Test Your Skill in the Service Simulator

In the Test section of your Skill configuration, scroll down to the Service Simulator:

Alexa Skill Service Simulator

Here you can enter the utterance to test. For example, type in “say hello”:

Alexa Skill Hello World Test

This will create a JSON request and test it with your Skill. And if you look to the right: TADA 🎉! There is your response with “Hello World!” as output speech.
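For reference, the JSON response your Skill returns looks roughly like this (a simplified sketch; depending on the framework version, the output speech may be sent as SSML, as shown here, or as plain text):

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>Hello World!</speak>"
    },
    "shouldEndSession": true
  }
}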

b) Test Your Skill on an Alexa Enabled Device

Once the Skill is enabled for testing, you can use a device like an Amazon Echo or Echo Dot (associated with the same email address you used for the developer account) to test your Skill:

Test Alexa Skill on your device

c) Test Your Skill on Your Phone

Don’t have an Echo or Echo Dot handy, but still want to listen to Alexa’s voice while testing your Skill? You can use Reverb for that.

Test Alexa Skill on your phone with Reverb.ai

With Reverb, you can bring Alexa functionality to your Mac or your mobile phone (iOS and Android). Go to their website by clicking the screenshot above and download the app to get started.

 

Next Steps

Great job! You’ve gone through all the necessary steps to prototype your own Alexa Skill. The next challenge is to build a real Skill. For this, take a look at the Jovo Documentation to see what else you can do with our Framework:

Jovo Documentation for Alexa Skills and Google Actions

