Web Client

Build voice experiences and chatbots for the web. This frontend client brings your Jovo app to websites and web apps.

Introduction

Jovo Client and Jovo Core Platform

Jovo Clients help with two tasks:

The Jovo Web Client can be used on websites and web apps. This is the vanilla JavaScript version for custom websites or frameworks and libraries like React. You can also find versions for Vue2 and Vue3.

Installation

Install the client package:

$ npm install @jovotech/client-web

If you want to use the client in a plain HTML/JS project (find an example HTML file here), you can set it up like this:

<script>
  const client = new window.JovoWebClient.Client('http://localhost:3000/webhook', {
    // Configuration
  });

  // ...
</script>

If you are using a library like React, you can initialize it like this:

const client = new Client('http://localhost:3000/webhook', {
  // Configuration
});

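The constructor accepts two parameters:

  • The endpoint URL of your Jovo app, for example http://localhost:3000/webhook.
  • A configuration object, which is described in the Configuration section below.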

Configuration

This is the default configuration for the Jovo Web Client:

{
    version: '4.0-beta',
    locale: 'en',
    platform: 'web',
    device: {
        id: '<uuid>',
        capabilities: [
            'AUDIO', 'SCREEN'
        ],
    },
    input: {
        audioRecorder: { /* ... */ },
        speechRecognizer: { /* ... */ },
    },
    output: {
        speechSynthesizer: { /* ... */ },
        audioPlayer: { /* ... */ },
        reprompts: { /* ... */ },
    },
    store: {
        storageKey: 'JOVO_WEB_CLIENT_DATA',
        shouldPersistSession: true,
        sessionExpirationInSeconds: 1800,
    },
}
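As a small illustration of the configuration structure above, the following sketch defines an object that changes only a few of the defaults. The name clientConfigOverrides is hypothetical, and the sketch assumes that keys you leave out keep their default values (that is, that the client merges nested options with the defaults), which this page does not state explicitly.

```typescript
// Hypothetical override object: only the keys listed here differ from the
// defaults shown above. Assumes omitted keys fall back to their defaults.
const clientConfigOverrides = {
  locale: 'de',
  output: {
    reprompts: { maxAttempts: 2 },
  },
  store: {
    shouldPersistSession: false,
  },
};

// With the library installed, the object would be passed as the second
// constructor argument, for example:
//   const client = new Client('http://localhost:3000/webhook', clientConfigOverrides);
```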

Record User Input

You can record user input using the following methods:

await client.startRecording();

client.stopRecording(); // Successfully finish the recording
client.abortRecording(); // Cancel the recording

You can also pass an input modality. The default is AUDIO:

import { RecordingModalityType } from '@jovotech/client-web';
// ...

await client.startRecording({ type: RecordingModalityType.Audio }); // or 'AUDIO'

Depending on the configuration and browser support, the recording uses either the AudioRecorder or the WebSpeech API SpeechRecognizer. Make sure that the client's audio recorder is already initialized before you start recording.

You can check if the client is currently recording input by using the following helper:

client.isRecordingInput;

Initialize

Some browsers and devices (for example iOS) need a user touch event before they can play or record audio.

For this, the initialize() method can be used, which should be called in a click handler, for example:

initializeButton.addEventListener('click', async () => {
  await client.initialize();
});

This can be done as part of a launch button or a push to talk button.

You can check if the client is already initialized by using the following helper:

client.isInitialized;
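If several buttons can trigger initialization, it may help to route them through one guard. The sketch below is one possible approach, not part of the library: createInitializer and InitializableClient are hypothetical names, and the interface only mirrors the isInitialized helper and the initialize() method described above.

```typescript
// Minimal structural type covering only the members used in this sketch.
interface InitializableClient {
  readonly isInitialized: boolean;
  initialize(): Promise<unknown>;
}

// Returns a function that initializes the client at most once, even if it is
// called again while a previous initialize() call is still running.
function createInitializer(client: InitializableClient): () => Promise<void> {
  let pending: Promise<void> | undefined;

  return () => {
    if (client.isInitialized) {
      return Promise.resolve();
    }
    if (!pending) {
      pending = client.initialize().then(
        () => {
          pending = undefined;
        },
        (error: unknown) => {
          // Clear the marker so a later click can retry.
          pending = undefined;
          throw error;
        },
      );
    }
    return pending;
  };
}
```

The returned function still has to be called from inside a click or touch handler, because the browser restriction applies to the moment initialize() runs.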

AudioRecorder

The Jovo Web Client implements an AudioRecorder that records speech to an audio file and sends it to your Jovo app as the SPEECH input type.

The default configuration for the AudioRecorder (which you can access with client.audioRecorder) is:

audioRecorder: {
  enabled: true,
  sampleRate: 16000,
  startDetection: {
    enabled: true,
    timeoutInMs: 3000,
    threshold: 0.2,
  },
  silenceDetection: {
    enabled: true,
    timeoutInMs: 1500,
    threshold: 0.2,
  },

  // https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamConstraints/audio
  audioConstraints: {
    echoCancellation: true,
    noiseSuppression: true,
  },

  // https://developer.mozilla.org/en-US/docs/Web/API/AudioContext
  analyser: {
    bufferSize: 2048,
    maxDecibels: -10,
    minDecibels: -90,
    smoothingTimeConstant: 0.85,
  },
},

You can also use the following helpers to detect browser support and check if AudioRecorder is currently recording:

client.audioRecorder.isInitialized;
client.audioRecorder.isRecording;
client.audioRecorder.startDetectionEnabled;
client.audioRecorder.silenceDetectionEnabled;

The AudioRecorder also emits events based on the recording status. The table below shows all events of the type AudioRecorderEvent:

Enum key         Enum value          Description
Start            'start'             Recording has started.
Processing       'processing'        Recording is in progress.
StartDetected    'start-detected'    Speech was detected in the recording. Related to the startDetection configuration.
SilenceDetected  'silence-detected'  Silence was detected in the recording. Related to the silenceDetection configuration.
Timeout          'timeout'           Silence exceeded the silenceDetection.timeoutInMs configuration.
Abort            'abort'             Recording was cancelled.
Stop             'stop'              Recording was stopped.
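The events above map naturally onto the visual state of a microphone button. The following sketch shows one way to express that mapping as a pure function. The event values are copied from the table, while MicButtonState and nextMicButtonState are hypothetical names. How listeners are registered on the AudioRecorder is not covered in this section, so the sketch leaves that part out.

```typescript
// String values copied from the AudioRecorderEvent table above.
type AudioRecorderEventValue =
  | 'start'
  | 'processing'
  | 'start-detected'
  | 'silence-detected'
  | 'timeout'
  | 'abort'
  | 'stop';

// Hypothetical UI states for a microphone button.
type MicButtonState = 'idle' | 'listening' | 'hearing-speech';

// Computes the next button state from the current state and an event value.
function nextMicButtonState(
  current: MicButtonState,
  eventValue: AudioRecorderEventValue,
): MicButtonState {
  switch (eventValue) {
    case 'start':
    case 'silence-detected':
      return 'listening';
    case 'start-detected':
      return 'hearing-speech';
    case 'timeout':
    case 'abort':
    case 'stop':
      return 'idle';
    default:
      // 'processing' only reports progress, so the state stays as it is.
      return current;
  }
}
```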

WebSpeech API SpeechRecognizer

The WebSpeech API offers a speech recognition service that makes it easier to turn speech audio into transcribed text right in the browser.

This way, you can record speech input and send it to your Jovo app as the TRANSCRIBED_SPEECH input type.

The default configuration for the SpeechRecognizer (which you can access with client.speechRecognizer) is:

speechRecognizer: {
  enabled: true,
  startDetection: {
    enabled: true,
    timeoutInMs: 3000,
    threshold: 0.2,
  },
  silenceDetection: {
    enabled: true,
    timeoutInMs: 1500,
    threshold: 0.2,
  },

  // See https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
  lang: 'en',
  continuous: true,
  interimResults: true,
  maxAlternatives: 1,
  grammars: window.SpeechGrammarList ? new window.SpeechGrammarList() : null,
}

  • startDetection: The start detection determines when in the recording process the user starts speaking.
  • silenceDetection: The silence detection determines when in the recording process the user stops speaking.
  • All other configurations are explained in the official documentation by Mozilla.
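To make the roles of threshold and timeoutInMs more concrete, here is a simplified sketch of how a silence timeout could be derived from a series of volume readings. It is not the library's implementation. It assumes volume levels between 0 and 1 and treats any reading below threshold as silence. LevelSample, DetectionConfig and findSilenceTimeout are hypothetical names.

```typescript
// Hypothetical volume reading: a timestamp and a level between 0 and 1.
interface LevelSample {
  timeInMs: number;
  level: number;
}

// The two detection options used in this sketch.
interface DetectionConfig {
  timeoutInMs: number;
  threshold: number;
}

// Returns the timestamp at which uninterrupted silence first lasted for
// timeoutInMs, or undefined if that never happens in the given samples.
function findSilenceTimeout(
  samples: LevelSample[],
  config: DetectionConfig,
): number | undefined {
  let silenceStartedAt: number | undefined;

  for (const sample of samples) {
    if (sample.level >= config.threshold) {
      // Speech resets the silence window.
      silenceStartedAt = undefined;
      continue;
    }
    if (silenceStartedAt === undefined) {
      silenceStartedAt = sample.timeInMs;
    }
    if (sample.timeInMs - silenceStartedAt >= config.timeoutInMs) {
      return sample.timeInMs;
    }
  }
  return undefined;
}
```

Under these assumptions, a higher threshold makes the detection treat more background noise as silence, and a longer timeoutInMs gives users more time to pause before the recording ends.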

You can also use the following helpers to detect browser support and check if SpeechRecognizer is currently recording speech:

client.speechRecognizer.isAvailable;
client.speechRecognizer.isRecording;
client.speechRecognizer.startDetectionEnabled;
client.speechRecognizer.silenceDetectionEnabled;

The SpeechRecognizer also emits events based on the recording status. The table below shows all events of the type SpeechRecognizerEvent:

Enum key          Enum value           Description
Start             'start'              Recording has started.
StartDetected     'start-detected'     Speech was detected in the recording. Related to the startDetection configuration.
SpeechRecognized  'speech-recognized'  Speech is currently transcribed.
SilenceDetected   'silence-detected'   Silence was detected in the recording. Related to the silenceDetection configuration.
Timeout           'timeout'            Silence exceeded the silenceDetection.timeoutInMs configuration.
Abort             'abort'              Recording was cancelled.
Stop              'stop'               Recording was stopped.
End               'end'                Speech recognition has finished.

Push to Talk

You can implement a push to talk experience by adding event listeners to a button, for example:

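// Both handlers are members of a component class.
async onMouseDown(event: MouseEvent | TouchEvent) {
  // Initializing inside the handler satisfies the user gesture requirement.
  if (!client.isInitialized) {
    await client.initialize();
  }
  if (client.isRecordingInput) {
    return;
  }
  // Listen on window so the release is caught even outside the button.
  if (event instanceof MouseEvent) {
    window.addEventListener('mouseup', this.onMouseUp);
  } else {
    window.addEventListener('touchend', this.onMouseUp);
  }
  await client.startRecording();
}

// Arrow function property, so `this` stays bound when used as a listener.
private onMouseUp = () => {
  // Remove both listeners, because either one may have been registered.
  window.removeEventListener('mouseup', this.onMouseUp);
  window.removeEventListener('touchend', this.onMouseUp);
  client.stopRecording();
};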

Send a Request to Jovo

After successful user input, the Jovo Web Client sends a request to the Jovo app, where the Web Platform handles the conversational logic and then returns a response.

The request is based on different Jovo Input types, depending on the recording type:

  • AudioRecorder: SPEECH
  • SpeechRecognizer: TRANSCRIBED_SPEECH

The client sends requests automatically for AudioRecorder and SpeechRecognizer input. You can also send a request manually, based on a Jovo Input object, using the send() method:

import { InputType } from '@jovotech/client-web';
// ...

const response = await client.send({
  type: InputType.Text, // or 'TEXT'
  text: 'Hello World',
});

If you want to make modifications before sending a request, you can also use the createRequest() method:

import { InputType } from '@jovotech/client-web';
// ...

const request = client.createRequest({
  type: InputType.Text, // or 'TEXT'
  text: 'Hello World',
});

// ...

const response = await client.send(request);
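For a chat input field, you might wrap send() in a small helper that ignores empty messages. The following is a minimal sketch under stated assumptions: sendChatMessage, TextInput and TextSendingClient are hypothetical names, the input shape is taken from the examples above, and the response is typed loosely as unknown because its structure is not described at this point.

```typescript
// Shape of the text input object used in the send() examples above.
interface TextInput {
  type: 'TEXT';
  text: string;
}

// Minimal structural type: only the send() method, with a loosely typed result.
interface TextSendingClient {
  send(input: TextInput): Promise<unknown>;
}

// Trims the message and sends it as TEXT input. Resolves to undefined
// without sending anything if the message is empty.
async function sendChatMessage(
  client: TextSendingClient,
  rawText: string,
): Promise<unknown> {
  const text = rawText.trim();
  if (text.length === 0) {
    return undefined;
  }
  return client.send({ type: 'TEXT', text });
}
```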

Handle the Response from Jovo

After sending a request to the Jovo app, the client waits for the app to go through the RIDR Lifecycle and return a Web Platform response.

This response contains an output property, which includes output templates that are used by the client to show and play a response to the user. For example, an output template could look like this:

{
  message: 'Do you like pizza?',
  quickReplies: ['yes', 'no'],
}

The response can be text-based (for example chat bubbles) as well as audio or speech output. The client offers features that make playing audio output easier.
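For text-based output, the templates in the output property can be turned into data for chat bubbles. The sketch below assumes a simplified template in which message is a plain string and quickReplies is a list of strings, as in the example above. Real output templates may carry more properties. SimpleOutputTemplate, ChatBubble and toChatBubbles are hypothetical names.

```typescript
// Simplified output template, limited to the two properties shown above.
interface SimpleOutputTemplate {
  message?: string;
  quickReplies?: string[];
}

// Hypothetical view model for one chat bubble.
interface ChatBubble {
  text: string;
  quickReplies: string[];
}

// Converts output templates into chat bubbles and skips templates that
// do not contain a message.
function toChatBubbles(output: SimpleOutputTemplate[]): ChatBubble[] {
  const bubbles: ChatBubble[] = [];
  for (const template of output) {
    if (typeof template.message === 'string' && template.message.length > 0) {
      bubbles.push({
        text: template.message,
        quickReplies: template.quickReplies ?? [],
      });
    }
  }
  return bubbles;
}
```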

AudioPlayer

The AudioPlayer is responsible for playing audio files. Similar to the AudioRecorder, it needs to be initialized.

The default configuration for the AudioPlayer (which you can access with client.audioPlayer) is:

audioPlayer: {
  enabled: true
},

The player has the following features:

client.audioPlayer.play(audioSource: string, contentType = 'audio/mpeg');
client.audioPlayer.resume();
client.audioPlayer.pause();
client.audioPlayer.stop();

The AudioPlayer also emits events based on its status. The table below shows all events of the type AudioPlayerEvent:

Enum key  Enum value
Play      'play'
Pause     'pause'
Resume    'resume'
Stop      'stop'
End       'end'
Error     'error'

You can also use the following helpers:

client.audioPlayer.isInitialized;
client.audioPlayer.isPlaying; // or client.isPlayingAudio
client.audioPlayer.canResume;
client.audioPlayer.canPause;
client.audioPlayer.canStop;
client.audioPlayer.volume;
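The canPause and canResume helpers can drive a single play/pause button. The following sketch shows one possible toggle. PausablePlayer and togglePlayback are hypothetical names, and the interface only mirrors the members listed above, with return types left loose.

```typescript
// Minimal structural type covering only the members used in this sketch.
interface PausablePlayer {
  readonly canPause: boolean;
  readonly canResume: boolean;
  pause(): unknown;
  resume(): unknown;
}

// Pauses the player if that is currently possible, otherwise resumes it if
// that is possible. Reports which action was taken.
function togglePlayback(
  player: PausablePlayer,
): 'paused' | 'resumed' | 'unchanged' {
  if (player.canPause) {
    player.pause();
    return 'paused';
  }
  if (player.canResume) {
    player.resume();
    return 'resumed';
  }
  return 'unchanged';
}
```

In a click handler, this could be called as togglePlayback(client.audioPlayer), assuming the player matches the structural type above.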

WebSpeech API SpeechSynthesizer

The WebSpeech API offers a speech synthesis service that makes it easier to turn output messages and reprompts into spoken audio right in the browser.

The configuration for the SpeechSynthesizer (which you can access with client.speechSynthesizer) is:

speechSynthesizer: {
  enabled: true,
  language: 'en',
  voice: SpeechSynthesisVoice
},

The synthesizer has the following features:

client.speechSynthesizer.speak(utterance: SpeechSynthesisUtterance | string, forceVolume = true);
client.speechSynthesizer.resume();
client.speechSynthesizer.pause();
client.speechSynthesizer.stop();

The SpeechSynthesizer also emits events based on its status. The table below shows all events of the type SpeechSynthesizerEvent:

Enum key  Enum value
Play      'play'
Pause     'pause'
Resume    'resume'
Stop      'stop'
End       'end'
Error     'error'

You can also use the following helpers:

client.speechSynthesizer.isAvailable;
client.speechSynthesizer.isSpeaking; // or client.isPlayingAudio
client.speechSynthesizer.canResume;
client.speechSynthesizer.canPause;
client.speechSynthesizer.canStop;
client.speechSynthesizer.volume;

The Web Client also implements an SSMLProcessor that processes standard SSML tags like audio and break.

Reprompts

The Web Client is able to play reprompts if the user doesn't respond to a prompt. This feature is currently only available for Speech Interfaces.

Reprompts are played by the RepromptProcessor, which can be configured like this:

reprompts: {
  enabled: true,
  maxAttempts: 1,
},

The maxAttempts property defines how many reprompts should be played before closing the session.
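The effect of maxAttempts can be sketched as a simple counter check. This is an illustration of the rule described above, not the RepromptProcessor's actual code. RepromptDecision and decideOnSilence are hypothetical names.

```typescript
// Hypothetical outcome when the user has not responded to a prompt.
type RepromptDecision = 'play-reprompt' | 'end-session';

// Decides what to do after the user stayed silent, given how many reprompts
// have already been played in this turn.
function decideOnSilence(
  repromptsPlayed: number,
  maxAttempts: number,
): RepromptDecision {
  return repromptsPlayed < maxAttempts ? 'play-reprompt' : 'end-session';
}
```

With the default maxAttempts of 1, the first silence leads to a reprompt and the second one ends the session.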

Deployment

If you want to deploy your web experience to production, you need to do the following: