Amazon Polly TTS Integration

Turn text into audio files with the Jovo Framework text to speech (TTS) integration for Amazon Polly.

Introduction

Polly is a text to speech (TTS) service that turns text into lifelike speech with dozens of voices across a broad set of languages.

Installation

You can install the plugin like this:

$ npm install @jovotech/tts-polly

TTS plugins can be added to Jovo platform integrations. Here is an example of how to add the plugin to the Jovo Core Platform in your app.ts app configuration:

import { CorePlatform } from '@jovotech/platform-core';
import { PollyTts } from '@jovotech/tts-polly';
// ...

app.configure({
  plugins: [
    new CorePlatform({
      plugins: [new PollyTts()],
    }),
    // ...
  ],
});

If you are running your Jovo app on AWS Lambda, there is no need to add configurations if you want to stick to the default options. For apps outside AWS Lambda, you need to add a region and credentials to the libraryConfig like this:

new PollyTts({
  libraryConfig: {
    region: 'us-east-1',
    credentials: {
      accessKeyId: '<YOUR-ACCESS-KEY-ID>',
      secretAccessKey: '<YOUR-SECRET-ACCESS-KEY>'
    },
    // ...
  },
  // ...
}),

Learn more about all configurations in the configuration section.

Configuration

The following configurations can be added:

new PollyTts({
  outputFormat: 'mp3',
  fallbackLocale: 'en-US',
  voiceId: 'Matthew',
  sampleRate: '16000',
  engine: 'standard',
  lexiconNames: [],
  languageCode: 'en-IN',
  speechMarkTypes: [],
  cache: new S3TtsCache({/* ... */}),
  libraryConfig: {
    region: 'us-east-1',
    // ...
  }
}),

  • outputFormat: The format in which the returned output will be encoded. See outputFormat Polly docs for more information. Default: mp3.
  • fallbackLocale: Used as a fallback if the locale from Jovo is not found. Default: en-US.
  • voiceId: Voice ID to use for the synthesis. See voiceId Polly docs for more information. Default: Matthew.
  • sampleRate: The audio frequency specified in Hz. See sampleRate Polly docs for more information. Default: 16000.
  • engine: Specifies the engine (standard or neural) for Amazon Polly to use when processing input text for speech synthesis. See engine Polly docs for more information. Default: standard.
  • lexiconNames: List of one or more pronunciation lexicon names you want the service to apply during synthesis. See lexiconNames Polly docs for more information. Optional.
  • languageCode: Language code for the Synthesize Speech request. This is only necessary if using a bilingual voice, such as Aditi, which can be used for either Indian English (en-IN) or Hindi (hi-IN). See languageCode Polly docs for more information. Optional.
  • speechMarkTypes: The type of speech marks returned for the input text. See speechMarkTypes Polly docs for more information. Optional.
  • cache: TTS Cache integration, for example S3 Cache. Optional.
  • libraryConfig: PollyClientConfig object that is passed to the Polly client. Use this for configurations like region or credentials. Optional.
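
As an illustration of the languageCode option, here is a minimal sketch of the options for a Hindi-only app that uses the bilingual voice Aditi. The variable name hindiPollyOptions is hypothetical, and the options are kept in a plain object so that the snippet does not depend on the plugin package. The assumption is that you would pass this object to new PollyTts() in your app configuration.

```typescript
// Sketch under assumptions: `hindiPollyOptions` is a hypothetical name, not part of the plugin.
// The property names match the configuration options listed above.
const hindiPollyOptions = {
  voiceId: 'Aditi',
  // Aditi is bilingual. Without a language code, Polly falls back to the
  // voice's default language (Indian English) instead of Hindi.
  languageCode: 'hi-IN',
};

console.log(hindiPollyOptions);
```

For voices that only support a single language, languageCode can be left out.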

TTS Cache

Without a TTS cache, each time text is passed to Polly, you will incur the cost and time of generating the TTS response. Use a TTS cache to reduce costs and save time.

If you're hosting your Jovo app in the AWS environment, for example using AWS Lambda, we recommend using S3 Cache to store generated audio files in an S3 bucket:

import { PollyTts } from '@jovotech/tts-polly';
import { S3TtsCache } from '@jovotech/ttscache-s3';
// ...

new PollyTts({
  cache: new S3TtsCache({
    bucket: '<YOUR-BUCKET-NAME>', // Example: 'mybucket-public'
    path: '<YOUR-PATH>', // Example: 'tts'
  }),
  // ...
}),

See TTS for more information and a list of TTS cache implementations.
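
To illustrate why a cache saves cost and time, here is a conceptual sketch of the idea behind a TTS cache: audio is looked up by a key derived from the input, and the TTS service is only called on a miss. This is not the implementation of S3TtsCache or of this plugin. The names InMemoryTtsCache, Synthesize, and getAudioUrl are hypothetical, and the cache lives in memory instead of an S3 bucket.

```typescript
// Conceptual sketch only. This is not the code of S3TtsCache or @jovotech/tts-polly.
// `Synthesize` stands in for the expensive TTS request and resolves to an audio URL.
type Synthesize = (text: string, voiceId: string) => Promise<string>;

class InMemoryTtsCache {
  private readonly entries = new Map<string, Promise<string>>();
  private readonly synthesize: Synthesize;

  constructor(synthesize: Synthesize) {
    this.synthesize = synthesize;
  }

  getAudioUrl(text: string, voiceId: string): Promise<string> {
    // The same text in a different voice is different audio, so both go into the key.
    const key = JSON.stringify([voiceId, text]);
    const cached = this.entries.get(key);
    if (cached !== undefined) {
      return cached; // Cache hit: no new synthesis request.
    }
    const pending = this.synthesize(text, voiceId);
    this.entries.set(key, pending);
    // Drop failed requests so that a later call can retry.
    pending.catch(() => {
      this.entries.delete(key);
    });
    return pending;
  }
}
```

Storing the pending promise rather than the finished result means that two requests for the same text arriving at the same time share one synthesis call. A persistent cache such as an S3 bucket additionally keeps the audio across app restarts, which an in-memory map cannot do.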

libraryConfig

The libraryConfig property can be used to pass configurations to the AWS Polly SDK that is used by this integration.

new PollyTts({
  libraryConfig: { /* ... */ },
  // ...
}),

You can learn more about all config options in the official PollyClientConfig reference.

For example, you can add a region and credentials as shown below. This is necessary if you are hosting your Jovo app outside of an AWS environment.

new PollyTts({
  libraryConfig: {
    region: 'us-east-1',
    credentials: {
      accessKeyId: '<YOUR-ACCESS-KEY-ID>',
      secretAccessKey: '<YOUR-SECRET-ACCESS-KEY>'
    },
    // ...
  },
  // ...
}),
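
Instead of writing credentials directly into app.ts, one option is to read them from environment variables. The following is a minimal sketch under assumptions: libraryConfigFromEnv and PollyLibraryConfigSketch are hypothetical names and not part of the plugin, and the variable names AWS_REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY follow the convention used by AWS SDKs.

```typescript
// Hypothetical helper, not part of @jovotech/tts-polly.
// It only covers the two libraryConfig properties shown above.
interface PollyLibraryConfigSketch {
  region: string;
  credentials: { accessKeyId: string; secretAccessKey: string };
}

function libraryConfigFromEnv(
  env: Record<string, string | undefined>,
): PollyLibraryConfigSketch {
  const region = env['AWS_REGION'];
  const accessKeyId = env['AWS_ACCESS_KEY_ID'];
  const secretAccessKey = env['AWS_SECRET_ACCESS_KEY'];

  // Fail early with a clear message instead of sending an unsigned request later.
  if (!region || !accessKeyId || !secretAccessKey) {
    throw new Error(
      'Missing AWS_REGION, AWS_ACCESS_KEY_ID, or AWS_SECRET_ACCESS_KEY',
    );
  }
  return { region, credentials: { accessKeyId, secretAccessKey } };
}
```

Assuming a Node.js environment, the result could then be passed to the plugin as libraryConfig: libraryConfigFromEnv(process.env), which keeps secrets out of version control.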