In this section, you will learn how to use Jovo to craft a response to your users.

Introduction to Output Types

What do users expect from a voice assistant? Usually, it's either direct or indirect output in the form of speech, audio, or visual information. In this section, you will learn about basic output types like tell and ask, and how to use SSML or the Jovo speechBuilder to create more advanced output elements.

Basic Output

Jovo's basic output options offer simple methods for interacting with users through text-to-speech. If you're interested in more, take a look at Advanced Output.


tell

The tell method is used to have Alexa or Google Home say something to your users. You can either use plain text or SSML (Speech Synthesis Markup Language).

Important: The session ends after a tell call. This means the mic is off and there is no more interaction between the user and your app until the user invokes it again. Learn more about sessions here.
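As a minimal sketch of both variants, assuming `app` is the Jovo app object available in your handlers: the function names are hypothetical, and the handlers are written as plain functions that receive `app` only so the snippet does not depend on any project setup.

```javascript
// Sketch only: `app` is assumed to be the Jovo app object.
function helloWorldIntent(app) {
  // Plain text output; the session ends after this call.
  app.tell('Hello World!');
}

function helloWorldSsmlIntent(app) {
  // The same kind of output expressed as SSML, with a short pause.
  app.tell('<speak>Hello <break time="500ms"/> World!</speak>');
}
```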


ask

Whenever you want to make the experience more interactive and get some user input, the ask method is the way to go.

This method keeps the mic open (learn more about sessions here), meaning the speech element is used initially to ask the user for some input. If there is no response, the reprompt is used to ask again.

You can also use SSML for your speech and reprompt elements.
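A short sketch of a speech and reprompt pair, assuming `ask` takes the speech as its first argument and the reprompt as its second. The function name and the wording of the prompts are illustrative only.

```javascript
// Sketch only: `app` is assumed to be the Jovo app object.
function askForAgeIntent(app) {
  const speech = 'How old are you?';
  // Used if the user does not respond to the initial question.
  const reprompt = 'Please tell me your age.';
  app.ask(speech, reprompt);
}
```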

Multiple Reprompts

Google Assistant offers the functionality to use multiple reprompts.

You can find more detail about this feature here: Platform Specific Features > Google Assistant > Multiple Reprompts.
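As a rough sketch, and assuming that `ask` accepts an array of reprompts in place of a single string on Google Assistant (please check the linked page for the exact signature), this could look as follows. The function name and prompt texts are hypothetical.

```javascript
// Sketch only: passing an array as the second argument is an assumption.
function askForAgeOnGoogleIntent(app) {
  const speech = 'How old are you?';
  const reprompts = [
    'Please tell me your age.',
    'Can you tell me how old you are?',
    'Okay, maybe next time. Goodbye.',
  ];
  app.ask(speech, reprompts);
}
```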


repeat

It is recommended to use a 'RepeatIntent' (e.g. the 'AMAZON.RepeatIntent') that allows users to ask your app to repeat the previous output if they missed it.

This feature makes use of the Jovo User Context. To use it, make sure that you have a database integration set up and have not set the userContext.prev.size element to 0 (default is 1) in your config.
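The following sketch shows how these pieces might fit together. It assumes a `repeat` method on the app object and an `intentMap` config entry that maps the Alexa built-in intent to 'RepeatIntent'; both are assumptions to verify against your Jovo version. The handler receives `app` as a parameter only to keep the snippet standalone.

```javascript
// Sketch only: `app.repeat()` and `intentMap` are assumptions.
const repeatHandlers = {
  'RepeatIntent': function (app) {
    // Replays the previous output stored in the user context.
    app.repeat();
  },
};

const repeatConfig = {
  // Route the Alexa built-in intent to the handler above.
  intentMap: {
    'AMAZON.RepeatIntent': 'RepeatIntent',
  },
  // Keep at least one previous interaction (0 would disable repeating).
  userContext: {
    prev: {
      size: 1,
    },
  },
};
```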

Advanced Output

Voice platforms offer a lot more than just converting a sentence or paragraph to speech output. In the following sections, you will learn more about advanced output elements.


SSML

SSML is short for "Speech Synthesis Markup Language." You can use it to add elements like pronunciations, breaks, or audio files. For more information, see the SSML references by Amazon and by Google. Here's another valuable resource for cross-platform SSML.

Here is an example of what SSML-enriched output could look like:
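A minimal sketch that assembles the SSML by hand. The audio URL is a placeholder, and the function names and texts are hypothetical; `app` is assumed to be the Jovo app object.

```javascript
// Sketch only: builds an SSML string with a pause and an audio file.
function buildWelcomeSsml() {
  return '<speak>'
    + 'Welcome to this pizza skill.'
    + '<break time="300ms"/>'
    + '<audio src="https://www.example.com/audio.mp3"/>' // placeholder URL
    + 'Your pizza is on its way.'
    + '</speak>';
}

function ssmlWelcomeIntent(app) {
  app.tell(buildWelcomeSsml());
}
```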

But isn't that a little inconvenient? Let's take a look at the Jovo speechBuilder.


speechBuilder

With the speechBuilder, you can assemble a speech element by adding different types of input:
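As a sketch, assuming the builder exposes chainable `addText`, `addBreak`, and `addAudio` methods (see the SpeechBuilder documentation for the actual method list). The audio URL is a placeholder and the function name is hypothetical.

```javascript
// Sketch only: the builder method names are assumptions.
function builderWelcomeIntent(app) {
  const speech = app.speechBuilder()
    .addText('Welcome to this pizza skill.')
    .addBreak('300ms')
    // Second argument: text for devices that cannot play the audio.
    .addAudio('https://www.example.com/audio.mp3', 'Jingle')
    .addText('Your pizza is on its way.');
  app.tell(speech);
}
```

Compared to concatenating SSML strings by hand, the builder keeps each element on its own line and avoids unbalanced tags.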

You can find everything about the SpeechBuilder here: App Logic > Output > SpeechBuilder.


i18n

Jovo uses a package called i18next to support multilanguage voice apps.

Here's the detailed documentation for it: App Logic > Output > i18n.
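A small sketch of the idea, assuming a `t` method on the app object that looks up a key for the user's locale, and language resources in the i18next format. The keys, texts, and function name are hypothetical, and how the resources are registered with the app is covered in the linked documentation.

```javascript
// Sketch only: resources in i18next format, keyed by locale.
const languageResources = {
  'en-US': {
    translation: {
      WELCOME: 'Welcome to the app.',
    },
  },
  'de-DE': {
    translation: {
      WELCOME: 'Willkommen in der App.',
    },
  },
};

function localizedLaunch(app) {
  // `app.t` is assumed to resolve the key for the current locale.
  app.tell(app.t('WELCOME'));
}
```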

Raw JSON Responses

If you prefer to return some specific responses in a raw JSON format, you can do this with the platform-specific functions alexaSkill().setResponseObject and googleAction().setResponseObject.
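For example, a minimal Alexa response could be passed along like this. The JSON follows the general shape of an Alexa response; the function name is hypothetical and `app` is assumed to be the Jovo app object.

```javascript
// Sketch only: returns a hand-written Alexa response object.
function rawAlexaResponseIntent(app) {
  const response = {
    version: '1.0',
    sessionAttributes: {},
    response: {
      outputSpeech: {
        type: 'SSML',
        ssml: '<speak>Hello World!</speak>',
      },
      shouldEndSession: true,
    },
  };
  app.alexaSkill().setResponseObject(response);
}
```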

Learn more about platform-specific features and responses here: Platform Specifics.

Visual Output

Besides sound and voice output, the Jovo framework can also be used for visual output.

Learn more about visual output here: App Logic > Output > Visual Output.

No Speech Output

Sometimes, you might want to end a session without speech output. You can use the endSession method for this case:
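A minimal sketch, assuming `app` is the Jovo app object; the function name is hypothetical.

```javascript
// Sketch only: closes the session without saying anything.
function silentStopIntent(app) {
  app.endSession();
}
```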
