Use these SSML benefits for your voice messages

2 months ago

We have exciting news to share! Our voice messaging gateway has received an upgrade: You can now use SSML (Speech Synthesis Markup Language) for your messages. What does this mean for you? First and foremost, you can add even more precision and customisation options to your voice messages. With SSML, you can add emphasis, insert pauses and vary the speed of speech to make your messages more natural and appealing.

This improvement allows you to create a more personalised user experience and increase the quality of your voice output. Our gateway thus offers you a comprehensive solution for sending high-quality voice messages. Read on to find out more about what SSML can do for you!

How do I use SSML?

To use SSML, include the available markup elements in the text of the request you send to our Voice API.

Please note: Do not set the “xml” parameter when you send SSML.

General example

Create a JSON file with the phone numbers and text called voice.json.

{
"from": "+4917199999999",
"to": "+491718888888",
"text": "Welcome to Seven communications! You have successfully created a voice message."
}

Expand the text with SSML tags, in this case to select a female German voice:

{
"from": "+4917199999999",
"to": "+491718888888",
"text": "<voice name=\"de-de-female\">Welcome to Seven communications! You have successfully created a voice message.</voice>"
}

Send the text via cURL to our API:

curl -X POST "https://gateway.seven.io/api/voice" -H 'Content-Type: application/json' -H 'X-Api-Key: abcdefg123' -d @voice.json
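If you would rather send the request from a script, the cURL call can be mirrored in a few lines. The following is a minimal Python sketch using only the standard library. It assumes the endpoint and headers shown in the cURL example; the function name build_voice_request is hypothetical and the API key is a placeholder. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

API_URL = "https://gateway.seven.io/api/voice"  # endpoint from the cURL example


def build_voice_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
    )


payload = {
    "from": "+4917199999999",
    "to": "+491718888888",
    "text": '<voice name="de-de-female">Welcome to Seven communications!</voice>',
}
request = build_voice_request(payload, "abcdefg123")  # placeholder key
# To actually place the call: urllib.request.urlopen(request)
```

Because json.dumps escapes the double quotes inside the SSML attribute, you do not have to escape them by hand as you would in a hand-written voice.json.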

Customise language and voice

One of the main advantages of SSML is the ability to select different languages and different voices. This makes it easier to tailor messages to different recipients so that they are easy to understand and well received.

The Seven voice tags consist of the locale (BCP-47) and one of three variants: female, male or child. Examples of these tags can be found in the following application examples.

In principle, you can also decide to use specific Microsoft voices. However, we strongly recommend that you use our tags, because:

Tags that follow the “en-us-male” scheme keep your application working without changes when the synthesis engine changes. If we switch to another synthesis engine, or have to fall back to another provider in the event of a fault, we map the tags to the corresponding voice names automatically and your applications continue to work. Microsoft voice names are not mapped dynamically: if you use them directly, you remain tied to the Microsoft engine and may have to adapt your applications.

In short: Use our tags to select a nationality and the variant of the voice – our gateway will select the most up-to-date voice for you.

Please note: The child voices are only available in de-DE, en-GB, en-US, fr-FR and zh-CN.

All currently available languages can be found in the Microsoft Azure list.

Example: English, female

<voice name="en-us-female">

Example: German, female

<voice name="de-de-female">
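If you generate the tags in code, a small helper can combine locale and variant and catch unsupported child voices early. This is a minimal Python sketch: the function name seven_voice_name is hypothetical, the lowercase spelling follows the examples above, and the list of child locales is copied from the note above and may change over time.

```python
# Locales that offer a child voice, as listed in the note above.
CHILD_LOCALES = {"de-de", "en-gb", "en-us", "fr-fr", "zh-cn"}
VARIANTS = {"female", "male", "child"}


def seven_voice_name(locale: str, variant: str) -> str:
    """Return a tag name such as 'en-us-female' from a BCP-47 locale and a variant."""
    locale = locale.lower()
    variant = variant.lower()
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    if variant == "child" and locale not in CHILD_LOCALES:
        raise ValueError(f"no child voice for locale: {locale}")
    return f"{locale}-{variant}"


opening_tag = f'<voice name="{seven_voice_name("de-DE", "female")}">'
```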

Different voices in one message

Would you like to use different voices in one message? This is also possible with SSML. If you decide to use specific Microsoft voices in this case, please bear in mind that we cannot change them dynamically (see section “Customise language and voice”).

<voice name="en-us-child"> Hi, where did you buy these sunglasses? I think my mum would love them. </voice>
<voice name="en-us-female"> Hello, I just bought them in the shop over there. </voice>
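A dialogue like this can also be assembled programmatically. The following minimal Python sketch uses hypothetical helper names (voice_segment, dialogue). It escapes the characters &, < and > in the spoken text, on the assumption that the gateway parses the message as XML, and it joins the segments without line breaks.

```python
from xml.sax.saxutils import escape, quoteattr


def voice_segment(voice: str, text: str) -> str:
    """Wrap text in a <voice> element, escaping &, < and > in the text."""
    return f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"


def dialogue(parts: list[tuple[str, str]]) -> str:
    """Join several (voice, text) pairs without line breaks between the tags."""
    return "".join(voice_segment(voice, text) for voice, text in parts)


ssml = dialogue([
    ("en-us-child", "Hi, where did you buy these sunglasses?"),
    ("en-us-female", "Hello, I just bought them in the shop over there."),
])
```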

Time-to-live (TTL)

Although the time-to-live is not, strictly speaking, part of SSML, you should know this parameter in order to optimise your voice messages. The time-to-live specifies the period within which the call may be made. This prevents a flood of accumulated calls from being delivered late after a delay or fault. The TTL is an additional parameter specified in seconds. The default value is one hour, i.e. 3600 seconds. The following example sets it to 86400 seconds, i.e. 24 hours.

{
"from": "+4917199999999",
"to": "+491718888888",
"text": "Welcome to Seven communications! You have successfully created a voice message.",
"ttl": 86400
}

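Since the TTL is given in seconds, it can be convenient to derive it from hours in your own code. This is a minimal Python sketch; the helper name with_ttl is hypothetical, and the only check is that the value is positive, because the gateway's accepted range is not stated here.

```python
DEFAULT_TTL = 3600  # one hour, the default mentioned above


def with_ttl(payload: dict, hours: float) -> dict:
    """Return a copy of the payload with 'ttl' set to the given hours, in seconds."""
    seconds = int(hours * 3600)
    if seconds <= 0:
        raise ValueError("ttl must be a positive number of seconds")
    return {**payload, "ttl": seconds}


message = with_ttl(
    {"from": "+4917199999999", "to": "+491718888888", "text": "Hello"},
    hours=24,
)
```
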
Influencing pronunciation

The “pronunciation” of the specified text can also be customised with SSML. This includes not only pauses and the speed at which the text is read, but also details such as the individual pronunciation of certain words. The pronunciation of the numbers used can also be customised so that when a code is sent, individual digits are read out instead of a long number.

Inserting a break

In this example, we use two different specifications for pauses:

First, we specify a pause of 200 milliseconds, then a pause with the predefined strength “strong”.

The following values can be specified for “strength”: “x-weak”, “weak”, “medium”, “strong” and “x-strong”. No pause is inserted with the value “none”.

<voice name="en-us-male">
Step 1, Take a deep breath. <break time="200ms"/>
Step 2, exhale.
Step 3, Breathe in deeply again. <break strength="strong"/>
Step 4, exhale.
</voice>
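If you build break tags in code, a typo in the strength value is easy to catch before sending. The following is a minimal Python sketch with a hypothetical helper name (break_tag). It accepts either a duration or a strength, which is a simplification chosen for this example, and it checks the strength against the values listed above.

```python
from typing import Optional

# Strength values listed above, including "none" (no pause).
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}


def break_tag(time_ms: Optional[int] = None, strength: Optional[str] = None) -> str:
    """Return a <break/> element with either a duration in ms or a named strength."""
    if (time_ms is None) == (strength is None):
        raise ValueError("give exactly one of time_ms or strength")
    if time_ms is not None:
        if time_ms < 0:
            raise ValueError("time_ms must not be negative")
        return f'<break time="{time_ms}ms"/>'
    if strength not in STRENGTHS:
        raise ValueError(f"unknown strength: {strength}")
    return f'<break strength="{strength}"/>'


text = f"Step 1, take a deep breath. {break_tag(time_ms=200)} Step 2, exhale."
```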

Reading out numeric codes

Do you send access or confirmation codes via a telephone call? With SSML, you can control the speed at which the digits are read out or add additional pauses.

Adjust the speed in this example:

<voice name="en-us-female">
The confirmation code is:
<prosody rate="x-slow">
<say-as interpret-as="number_digit">967534</say-as>
</prosody>
</voice>

As you can hear in the example above, the digits are pronounced very slowly.
If you want pauses between the digits while keeping the normal speaking speed, you can, for example, place colons between the digits.

<voice name="en-us-female">
The confirmation code is:
9: 6: 7: 5: 3: 4
</voice>
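Both variants are easy to generate from the code itself, which avoids transposed digits when typing them by hand. This is a minimal Python sketch; the helper names code_with_pauses and code_slow are hypothetical.

```python
def _check_digits(code: str) -> None:
    """Reject anything that is not a plain ASCII digit string."""
    if not (code.isascii() and code.isdigit()):
        raise ValueError("code must contain digits only")


def code_with_pauses(code: str) -> str:
    """Separate the digits with colons so the voice pauses between them."""
    _check_digits(code)
    return ": ".join(code)


def code_slow(code: str, rate: str = "x-slow") -> str:
    """Wrap the code in the prosody and say-as markup from the first example."""
    _check_digits(code)
    return (
        f'<prosody rate="{rate}">'
        f'<say-as interpret-as="number_digit">{code}</say-as>'
        "</prosody>"
    )


spoken = code_with_pauses("967534")  # "9: 6: 7: 5: 3: 4"
```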

Date format

The SSML tag “date” is currently not available via our gateway. Nevertheless, you can still control how dates are read out.

Use the tag “say-as” with the attribute “interpret-as”. You can find more information in the Microsoft documentation.

The following example is spoken as “Today is February first two thousand and twenty-four”.

<voice name="en-us-male">
Today is <say-as interpret-as="date">2024-02-01</say-as>
</voice>

You do not necessarily need a say-as tag for simple dates.

Write out the month. Years can then be specified as a number.

<voice name="en-us-male">
Your next appointment is January 22.
My birthday is June 22, 1976.
</voice>
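If your dates come from a database or calendar, you can write out the month in code before inserting the text. This is a minimal Python sketch with a hypothetical helper name (spoken_date). It uses a fixed list of English month names so the result does not depend on the system locale.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def spoken_date(d: date, with_year: bool = True) -> str:
    """Format a date with the month written out, e.g. 'June 22, 1976'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


sentence = f"My birthday is {spoken_date(date(1976, 6, 22))}."
```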

Individual pronunciation – use the phonetic alphabet

If the standard pronunciation does not sound the way you want, you can specify how a word should be pronounced. To do this, use the phonetic alphabet and the corresponding tag.

In the following example, the name is pronounced as “Mike Jau” with a soft J, as in “jungle”.

The components of the phonetic alphabet can be found in the Microsoft documentation.

<voice name="en-US-JennyNeural"> His name is Mike <phoneme alphabet="ups" ph="JH AU"> Zhou </phoneme>
</voice>
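Phonetic strings can contain characters that need escaping inside an attribute, so it can help to generate the tag rather than type it. The following minimal Python sketch uses a hypothetical helper name (phoneme) and defaults to the “ups” alphabet from the example above.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text: str, ph: str, alphabet: str = "ups") -> str:
    """Return a <phoneme> element; quoteattr quotes and escapes the attribute values."""
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


name = "His name is Mike " + phoneme("Zhou", "JH AU")
```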

Sending audio files

Are the setting options not enough for you? Send your own audio file!

Please note the following:

The audio file must be a valid *.mp3, *.wav, *.opus, *.ogg, *.flac or *.wma file.
The total time for all text and audio files in a single response must not exceed 600 seconds.
The audio files must not contain any customer-specific or other sensitive information.
(Source: Microsoft documentation)
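The first two requirements can be pre-checked before you send a message. This is a minimal Python sketch with hypothetical helper names. Note that checking the file extension is only a rough proxy and does not prove that the file is actually valid, and that the durations have to be supplied by you.

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".opus", ".ogg", ".flac", ".wma"}
MAX_TOTAL_SECONDS = 600  # limit for all text and audio in a single response


def has_allowed_extension(filename: str) -> bool:
    """Check the file extension only; the file content is not inspected."""
    return PurePosixPath(filename).suffix.lower() in ALLOWED_EXTENSIONS


def fits_time_limit(durations_seconds: list[float]) -> bool:
    """Check that the summed durations stay within the 600-second limit."""
    return sum(durations_seconds) <= MAX_TOTAL_SECONDS
```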

For examples of use, please also refer to the Microsoft documentation or the Google documentation.

Best practice tips

You don’t have to use the speak tag; the Seven Voice API inserts it automatically.
“Keep it simple!” – The fewer tags, and therefore characters, you need, the better.
When transferring to the API, we recommend not using line breaks between the tags. The engine inserts pauses there, which can lead to undesirable results.
Remember that the text is converted into speech. A greeting such as “Dear Mr./Mrs. Name” will then also be read as “Dear Mr. Mrs.”. A general salutation is more suitable here, for example: “Hello name”.
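If you keep your SSML in a readable multi-line form, you can remove the line breaks just before sending. The following minimal Python sketch uses a hypothetical helper name (compact_ssml). It deletes line breaks that sit directly between two tags and turns all remaining line breaks into a single space.

```python
import re


def compact_ssml(ssml: str) -> str:
    """Remove line breaks so that none remain between tags or inside the text."""
    ssml = re.sub(r">\s*\n\s*<", "><", ssml)  # newline directly between two tags
    ssml = re.sub(r"\s*\n\s*", " ", ssml)     # any other newline becomes one space
    return ssml.strip()


raw = '<voice name="en-us-male">\nStep 1. <break time="200ms"/>\nStep 2.\n</voice>'
compact = compact_ssml(raw)
```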

Resources for using SSML

SSML offers extensive possibilities for customising voice messages. Because the possibilities are so varied, we would like to provide you with some resources to help you customise your SSML message.

Some useful resources for using SSML are:

Do you have any questions, suggestions or criticism? We look forward to your comment or message.

Please also visit our feedback page and take the opportunity to help make our service better.

All the best

Header picture by jacoblund via iStock