ASR Integrations

Learn more about automatic speech recognition (ASR) services that can be integrated with Jovo.

Introduction

Automatic speech recognition (in short, ASR) is the process of turning raw speech input into transcribed text. It is part of the interpretation step of the RIDR lifecycle.

Jovo offers integrations with a variety of ASR services. You can find all the current integrations here.

ASR integrations are helpful for platforms that deal with raw speech input. The integration then writes the results into an asr object that is part of the $input property:

{
  type: 'SPEECH',
  audio: { /* ... */ },
  asr: {
    text: 'the transcribed text',
  },
}

The text then needs to be turned into structured meaning by using an NLU integration. Some services like Amazon Lex are also called spoken language understanding (SLU) services because they do both the ASR and NLU parts.

Integrations

Currently, the following integrations are available with Jovo v4:

Configuration

An ASR integration needs to be added as a platform plugin in the app configuration. Here is an example how it could look like in the app.ts file:

import { CorePlatform } from '@jovotech/platform-core';
import { LexSlu } from '@jovotech/slu-lex';
// ...

const app = new App({
  plugins: [
    new CorePlatform({
      plugins: [new LexSlu()],
    }),
    // ...
  ],
});

Along with integration specific options (which can be found in each integration's documentation), there are also features that are configured the same way across all ASR integrations.

The default configuration for each ASR integration is:

new LexSlu({
  // ...
  input: {
    supportedTypes: ['SPEECH'],
  }
}),

The input config property determines how the ASR integration should react to certain properties. supportedTypes include all input types for which the ASR integration should run.