
Text to Speech (WebSocket)

wss://api.async.ai/text_to_speech/websocket/ws

Text-to-Speech WebSockets API#

The Text-to-Speech WebSockets API streams audio from partial or incrementally arriving text while preserving natural prosody and low latency.
Use it when your text arrives in real time (e.g., transcription, chat, or conversation scenarios).
It is less suited to cases where the full text is available upfront; there, the HTTP API is simpler and has slightly lower latency.

Handshake#

WSS wss://api.async.ai/text_to_speech/websocket/ws

Path Parameters#

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | string | Yes | Your Async API key. |
| version | string | Yes | API version (e.g., v1). |
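As a minimal sketch, the handshake URL can be assembled as below. Note that the placement of api_key and version (query string vs. path) is an assumption here; follow the handshake line above and your account's connection details.

```python
# Hypothetical helper: assemble the handshake URL. Passing api_key and
# version as query parameters is an assumption, not confirmed by this page.
BASE_WS_URL = "wss://api.async.ai/text_to_speech/websocket/ws"

def build_ws_url(api_key: str, version: str = "v1") -> str:
    return f"{BASE_WS_URL}?api_key={api_key}&version={version}"
```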

Send#

1. initializeConnection (required)#

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| model_id | string | Yes | Model ID (e.g., "asyncflow_multilingual_v1.0.0"). |
| voice | object | Yes | Dictionary with keys mode and id. Example: { "mode": "id", "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04" } |
| output_format | object | No | Audio output settings. Default: { "container": "raw", "encoding": "pcm_s16le", "sample_rate": 44100 }. |

💡 See Text-to-Speech (Stream) for detailed parameter descriptions.
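Putting the table together, the first frame sent after the handshake might look like the following. This is a sketch, not a full client; the IDs are the documentation's example values, not live credentials.

```python
import json

# initializeConnection payload built from the table above.
init_message = {
    "model_id": "asyncflow_multilingual_v1.0.0",
    "voice": {"mode": "id", "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04"},
    "output_format": {
        "container": "raw",
        "encoding": "pcm_s16le",
        "sample_rate": 44100,
    },
}
payload = json.dumps(init_message)  # send this as the first text frame
```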

2. initializeContext (optional, for Multi-Context mode)#

This message creates or re-initializes a specific audio generation context within the same WebSocket connection.
If context_id is not provided, the default context is used.
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | No | Unique UUID identifier for the context. If omitted, the default context is used. |
| transcript | string | Yes | The initial text input for this context. Must always end with a single space. |

3. sendText (required)#

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | No | Target context for this text chunk. If omitted, the default context is used. |
| transcript | string | Yes | The new text chunk. Must always end with a single space. |
| force | boolean | No | Force immediate synthesis even if the buffer is small (default: false). |

4. closeContext (optional)#

Use this message to close a single context and complete its audio generation while keeping the connection alive.
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | Yes | Context to close. |
| close_context | boolean | Yes | Always true. |
| transcript | string | Yes | Must be an empty string (""). |

5. closeConnection (optional)#

Closes all active contexts and terminates the entire WebSocket connection gracefully.
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Empty string to close the connection. |

Alternatively, you can send a terminate message to close the entire WebSocket connection gracefully.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| terminate | boolean | Yes | Always true. |

Receive#

audioOutput (streamed)#

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | Yes | ID of the context this audio chunk belongs to. |
| audio | string | Yes | Base64-encoded audio data. |
| final | boolean | Yes | true if this is the final chunk for this context; otherwise false. |
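Each audioOutput frame carries Base64 audio, so a receiver decodes it before buffering or playback. A sketch with a tiny synthetic frame ("AAAA" stands in for real audio and decodes to three zero bytes):

```python
import base64
import json

# Decode one audioOutput frame into raw bytes in the requested encoding.
frame = json.loads('{"context_id": "ctx-1", "audio": "AAAA", "final": false}')
pcm_bytes = base64.b64decode(frame["audio"])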

finalOutput#

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | Yes | ID of the completed context. |
| audio | string | Yes | Always an empty string (""). |
| final | boolean | Yes | Always true. Marks completion of synthesis for this context. |

error (optional)#

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| error_code | string | Yes | Error type identifier. |
| message | string | Yes | Human-readable explanation. |
| extra | object | No | Additional error details. |

Example Message Flow#

Handshake - GET /text_to_speech/websocket/ws
↑ (send) initializeConnection - {"model_id": "asyncflow_multilingual_v1.0",...
{
  "model_id": "asyncflow_multilingual_v1.0",
  "voice": {
    "mode": "id",
    "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04"
  },
  "output_format": {
    "container": "raw",
    "encoding": "pcm_f32le",
    "sample_rate": 44100
  }
}
↑ (send) initializeContext (multiple contexts) - {"context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9",...
{
  "context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9",
  "transcript": "Welcome to Async. "
}
{
  "context_id": "9f6bb19c-4616-4fbc-9d0f-f14a460eac01",
  "transcript": "This is a parallel narration. "
}
↑ (send) sendText - {"context_id":"e1bb2844-fed4-418b-832c-8126db5a21e9",...}
{ "context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9", "transcript": "Let's continue with our main topic. " }
↓ (receive) audioOutput - {"context_id":"e1bb2844-fed4-418b-832c-8126db5a21e9",...}
{
  "context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9",
  "audio": "Y3VyaW91cyBtaW5kcyB0aGluayBhbGlrZSA6KQ==",
  "final": false
}
↓ (receive) finalOutput - {"context_id":"e1bb2844-fed4-418b-832c-8126db5a21e9",...}
{
  "context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9",
  "audio": "",
  "final": true
}
↑ (send) closeContext - {"context_id":"9f6bb19c-4616-4fbc-9d0f-f14a460eac01",...}
{
  "context_id": "9f6bb19c-4616-4fbc-9d0f-f14a460eac01",
  "close_context": true,
  "transcript": ""
}
↑ (send) closeConnection - { "terminate": true }
{ "terminate": true }

Multi-Context Overview#

The Multi-Context WebSocket API is designed to manage multiple independent audio generation streams (contexts) over a single WebSocket connection.
This is especially useful for scenarios that require concurrent or interleaved audio generations, such as dynamic conversational AI applications.
Each context, identified by its own context_id, maintains an independent state.
You can send text to specific contexts, flush them, or close them independently.
A terminate message can be used to gracefully close the entire WebSocket connection.

Key advantages:#

Multiple parallel audio streams within a single socket
Lower connection overhead and faster interleaving
Independent buffering and finalization per context
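The per-context lifecycle above can be sketched as a simple message planner. The context IDs, helper name, and exact sequencing are illustrative assumptions; a real client would interleave sendText frames with the audio it receives rather than batch everything up front.

```python
import json

def multi_context_sequence(transcripts: dict[str, str]) -> list[str]:
    """Plan frames for several parallel contexts: one initializeContext per
    context, then one closeContext per context, then a final terminate."""
    frames = []
    for ctx_id, text in transcripts.items():
        if not text.endswith(" "):
            text += " "  # every transcript chunk must end with a single space
        frames.append(json.dumps({"context_id": ctx_id, "transcript": text}))
    for ctx_id in transcripts:
        frames.append(json.dumps(
            {"context_id": ctx_id, "close_context": True, "transcript": ""}))
    frames.append(json.dumps({"terminate": True}))
    return frames
```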

Modified at 2025-11-07 10:35:47