AsyncFlow API

Text to Speech with Word Timestamps

POST https://api.async.ai/text_to_speech/with_timestamps
Generates speech from the provided text in the voice of your choice, and returns the audio together with a timestamp for each word.
Request Example
curl --location --request POST 'https://api.async.ai/text_to_speech/with_timestamps' \
--header 'x-api-key: <api-key>' \
--header 'version: v1' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model_id": "asyncflow_v2.0",
    "transcript": "Welcome to Async",
    "voice": {
        "mode": "id",
        "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04"
    },
    "output_format": {
        "container": "raw",
        "encoding": "pcm_f32le",
        "sample_rate": 44100
    }
}'
Response Example
200 - Success
{
    "audio_base64": "...",
    "alignment": {
        "words": [
            "Welcome",
            "to",
            "Async"
        ],
        "word_start_times_milliseconds": [
            0,
            871,
            923
        ],
        "word_end_times_milliseconds": [
            871,
            900,
            1637
        ]
    }
}
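The request example passes the body inside single quotes, so a transcript containing an apostrophe (for example "Welcome to Async's API") would end the quoted string early. Below is a minimal bash sketch that avoids this by building the body with JSON escaping and sending it on stdin via curl's `--data-binary @-`. The helpers `json_escape`, `build_tts_body` and `synthesize`, and the `ASYNC_API_KEY` environment variable, are hypothetical names; the output_format is fixed to the values used in the example.

```bash
# Sketch only: build the request body with JSON escaping and send it on stdin,
# so quotes or apostrophes in the transcript cannot break shell quoting.
# json_escape, build_tts_body, synthesize and ASYNC_API_KEY are hypothetical names.

# Escape a string for a JSON string literal: backslash, double quote, newline,
# carriage return and tab. Other control characters are not handled.
json_escape() {
  local s=$1 bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$dq"/"$bs$dq"}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\r'/"${bs}r"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# Print the same body as the request example, with a custom transcript and voice id.
build_tts_body() {
  local transcript voice_id
  transcript=$(json_escape "$1")
  voice_id=$(json_escape "$2")
  printf '{"model_id":"asyncflow_v2.0","transcript":"%s","voice":{"mode":"id","id":"%s"},"output_format":{"container":"raw","encoding":"pcm_f32le","sample_rate":44100}}' \
    "$transcript" "$voice_id"
}

# Usage: synthesize TRANSCRIPT VOICE_ID OUTPUT_JSON
synthesize() {
  build_tts_body "$1" "$2" |
    curl --silent --show-error \
      --request POST 'https://api.async.ai/text_to_speech/with_timestamps' \
      --header "x-api-key: ${ASYNC_API_KEY:?ASYNC_API_KEY is not set}" \
      --header 'version: v1' \
      --header 'Content-Type: application/json' \
      --data-binary @- \
      --output "$3"
}
```

For example, `synthesize "Welcome to Async's API" e0f39dc4-f691-4e78-bba5-5c636692cc04 response.json` writes the JSON response to response.json. If jq is installed, `jq -r '.audio_base64' response.json | base64 --decode > speech.raw` then recovers the raw audio bytes.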

Request

Header Params
x-api-key (string, required): Your API key. Example: <api-key>
version (string, required): API version. Example: v1
Body Params (application/json)
model_id (enum<string>, required): Model ID. Allowed value: asyncflow_v2.0
transcript (string, required): Text to convert to speech.
voice (object, required): Voice to use when generating the speech.
  mode (enum<string>, required). Allowed value: id
  id (string, required): Voice ID.
output_format (object, required): Output configuration.
  container (enum<string>, required): Output audio format. Allowed values: raw, mp3, wav
  encoding (enum<string>, optional): Output audio encoding; ignored when container is mp3. Allowed values: pcm_f32le, pcm_s16le
  sample_rate (integer, required): Output audio sample rate in Hz. Range: 8000 to 48000
  bit_rate (integer, optional): Output audio bit rate in bits per second; use only when container is mp3. Range: 32000 to 320000. Default: 192000
string 
required
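The output_format fields interact (encoding is ignored for mp3, bit_rate applies only to mp3), so it can help to check them client-side before sending a request. Below is a minimal bash sketch that enforces the documented ranges and, if they pass, prints the output_format object. `validate_output_format` is a hypothetical helper, and rejecting bit_rate for non-mp3 containers is an assumption that may be stricter than the API itself.

```bash
# Sketch: check output_format values against the documented limits and, if they
# pass, print the output_format object. validate_output_format is hypothetical.
# Usage: validate_output_format CONTAINER SAMPLE_RATE [ENCODING] [BIT_RATE]
validate_output_format() {
  local container=$1 sample_rate=$2 encoding=${3:-} bit_rate=${4:-}
  local json

  case $container in
    raw|wav|mp3) ;;
    *) echo "container must be raw, mp3 or wav" >&2; return 1 ;;
  esac

  # sample_rate: integer from 8000 to 48000 Hz (10# forces base-10 parsing).
  if [[ ! $sample_rate =~ ^[0-9]+$ ]] || (( 10#$sample_rate < 8000 || 10#$sample_rate > 48000 )); then
    echo "sample_rate must be an integer from 8000 to 48000" >&2; return 1
  fi

  # encoding is ignored for mp3, so it is only checked and emitted otherwise.
  if [[ -n $encoding && $container != mp3 ]]; then
    case $encoding in
      pcm_f32le|pcm_s16le) ;;
      *) echo "encoding must be pcm_f32le or pcm_s16le" >&2; return 1 ;;
    esac
  fi

  # bit_rate: mp3 only (treated as an error otherwise, by assumption),
  # 32000 to 320000 b/s. When omitted, the API documents a default of 192000.
  if [[ -n $bit_rate ]]; then
    if [[ $container != mp3 ]]; then
      echo "bit_rate applies only to mp3" >&2; return 1
    fi
    if [[ ! $bit_rate =~ ^[0-9]+$ ]] || (( 10#$bit_rate < 32000 || 10#$bit_rate > 320000 )); then
      echo "bit_rate must be an integer from 32000 to 320000" >&2; return 1
    fi
  fi

  # Build the output_format object from the validated values.
  json="{\"container\":\"$container\""
  if [[ -n $encoding && $container != mp3 ]]; then
    json+=",\"encoding\":\"$encoding\""
  fi
  json+=",\"sample_rate\":$((10#$sample_rate))"
  if [[ -n $bit_rate ]]; then
    json+=",\"bit_rate\":$((10#$bit_rate))"
  fi
  printf '%s}\n' "$json"
}
```

For instance, `validate_output_format mp3 44100 pcm_f32le 128000` drops the encoding (it is ignored for mp3) and keeps the bit rate, while `validate_output_format wav 22050 "" 128000` is rejected.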

Responses

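200 Success (application/json)
Body
audio_base64 (string, required): Audio file content, base64-encoded.
alignment (object, required): Word-level timing for the generated audio.
  words (array[string], required): The words of the transcript.
  word_start_times_milliseconds (array[number], required): Start time of each word, in milliseconds.
  word_end_times_milliseconds (array[number], required): End time of each word, in milliseconds.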
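Because words, word_start_times_milliseconds and word_end_times_milliseconds line up index by index in the example response, the alignment maps directly onto subtitles. Below is a minimal bash (4.3 or later, for namerefs) sketch that emits one WebVTT cue per word, assuming integer millisecond values as in the example. `ms_to_timestamp` and `alignment_to_vtt` are hypothetical helpers, and the arrays are assumed to have already been extracted from the response.

```bash
# Sketch: convert the alignment arrays into WebVTT cues, one cue per word.
# ms_to_timestamp and alignment_to_vtt are hypothetical helpers; values are
# assumed to be integer milliseconds and the three arrays index-aligned.

# Format milliseconds as a WebVTT timestamp, hh:mm:ss.ttt
ms_to_timestamp() {
  local ms=$1
  printf '%02d:%02d:%02d.%03d' \
    $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) $((ms % 1000))
}

# Usage: alignment_to_vtt WORDS_ARRAY_NAME STARTS_ARRAY_NAME ENDS_ARRAY_NAME
alignment_to_vtt() {
  local -n _words=$1 _starts=$2 _ends=$3
  local i
  echo "WEBVTT"
  for i in "${!_words[@]}"; do
    printf '\n%s --> %s\n%s\n' \
      "$(ms_to_timestamp "${_starts[i]}")" \
      "$(ms_to_timestamp "${_ends[i]}")" \
      "${_words[i]}"
  done
}
```

With the example response (words=(Welcome to Async), starts=(0 871 923), ends=(871 900 1637)), this produces three cues, the last being 00:00:00.923 --> 00:00:01.637 for "Async".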
429 TOO_MANY_CONCURRENT_REQUESTS
429 RATE_LIMIT_EXCEEDED
429 USAGE_LIMIT_EXCEEDED
401 INVALID_API_KEY
404 VOICE_NOT_FOUND
404 VERSION_NOT_FOUND
400 INVALID_LANGUAGE
400 FORMAT_NOT_RECOGNIZED
500 Server Error
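Some of these errors, such as TOO_MANY_CONCURRENT_REQUESTS and RATE_LIMIT_EXCEEDED, look transient. Below is a minimal bash sketch of exponential backoff that retries on HTTP 429 and 500, under the assumption that retrying those statuses is acceptable. The status code alone cannot distinguish USAGE_LIMIT_EXCEEDED, which waiting is unlikely to clear, so the number of attempts is capped. `with_backoff` and `RETRY_BASE_DELAY` are hypothetical names.

```bash
# Sketch: retry a command on HTTP 429 or 500 with exponential backoff.
# Assumption: these statuses are worth retrying; USAGE_LIMIT_EXCEEDED (also 429)
# will probably not clear, so attempts are capped.
# with_backoff and RETRY_BASE_DELAY (seconds, default 1) are hypothetical names.
#
# Usage: with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# COMMAND must print only the HTTP status code on stdout, for example:
#   with_backoff 4 curl --silent --output response.json --write-out '%{http_code}' \
#     --request POST 'https://api.async.ai/text_to_speech/with_timestamps' \
#     --header "x-api-key: $ASYNC_API_KEY" --header 'version: v1' \
#     --header 'Content-Type: application/json' --data-binary @body.json
# Prints the final status; returns 0 only for a 2xx status.
with_backoff() {
  local max_attempts=$1 attempt=1 delay=${RETRY_BASE_DELAY:-1} status
  shift
  while :; do
    status=$("$@")
    case $status in
      429|500)
        if (( attempt >= max_attempts )); then
          echo "$status"
          return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
        ;;
      *)
        echo "$status"
        [[ $status == 2?? ]]
        return
        ;;
    esac
  done
}
```

Errors such as 401 INVALID_API_KEY or 404 VOICE_NOT_FOUND are returned immediately, since repeating the same request would not change the outcome.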
Modified at 2025-07-15 14:04:15