
Text to Speech (WebSocket)

wss://api.async.ai/text_to_speech/websocket/ws

Text-to-Speech WebSockets API#

The Text-to-Speech WebSockets API streams audio from partial or incrementally arriving text while preserving natural prosody and low latency.
Use it when your text arrives in real time (e.g., transcription, chat, or conversation scenarios).
It is less suited to cases where the full text is available upfront; there, the HTTP API is simpler and has slightly lower latency.

Handshake#

WSS wss://api.async.ai/text_to_speech/websocket/ws

Path Parameters#

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| api_key | string | Yes | Your Async API key. |
| version | string | Yes | API version (e.g., v1). |
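As a minimal sketch, the handshake URL can be assembled as below. Note that the placement of api_key and version (query string vs. path) is an assumption here; follow the handshake line above and your account's connection details.

```python
# Hypothetical helper: assemble the handshake URL. Passing api_key and
# version as query parameters is an assumption, not confirmed by this page.
BASE_WS_URL = "wss://api.async.ai/text_to_speech/websocket/ws"

def build_ws_url(api_key: str, version: str = "v1") -> str:
    return f"{BASE_WS_URL}?api_key={api_key}&version={version}"
```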

Send#

1. initializeConnection (required)#

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| model_id | string | Yes | Model ID (e.g., "asyncflow_multilingual_v1.0.0"). |
| voice | object | Yes | Dictionary with keys mode and id. Example: { "mode": "id", "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04" } |
| output_format | object | No | Audio output settings. Default: { "container": "raw", "encoding": "pcm_s16le", "sample_rate": 44100 }. |

💡 See Text-to-Speech (Stream) for detailed parameter descriptions.
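Putting the table together, the first frame sent after the handshake might look like the following. This is a sketch, not a full client; the IDs are the documentation's example values, not live credentials.

```python
import json

# initializeConnection payload built from the table above.
init_message = {
    "model_id": "asyncflow_multilingual_v1.0.0",
    "voice": {"mode": "id", "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04"},
    "output_format": {
        "container": "raw",
        "encoding": "pcm_s16le",
        "sample_rate": 44100,
    },
}
payload = json.dumps(init_message)  # send this as the first text frame
```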

2. initializeContext (optional, for Multi-Context mode)#

This message creates or re-initializes a specific audio generation context within the same WebSocket connection.
If context_id is not provided, the default context is used.
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | No | Unique UUID identifier for the context. If omitted, the default context is used. |
| transcript | string | Yes | The initial text input for this context. Must always end with a single space. |

3. sendText (required)#

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | No | Target context for this text chunk. If omitted, the default context is used. |
| transcript | string | Yes | The new text chunk. Must always end with a single space. |
| force | boolean | No | Force immediate synthesis even if the buffer is small (default: false). |

4. closeContext (optional)#

Use this message to close a single context and complete its audio generation while keeping the connection alive.
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | Yes | Context to close. |
| close_context | boolean | Yes | Always true. |
| transcript | string | Yes | Must be an empty string (""). |

5. closeConnection (optional)#

Closes all active contexts and terminates the entire WebSocket connection gracefully.
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Empty string to close the connection. |

Alternatively, you can send a terminate message to close the entire WebSocket connection gracefully.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| terminate | boolean | Yes | Always true. |

Receive#

audioOutput (streamed)#

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | Yes | ID of the context this audio chunk belongs to. |
| audio | string | Yes | Base64-encoded audio data. |
| final | boolean | Yes | true if this is the final chunk for this context; otherwise false. |
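Each audioOutput frame carries Base64 audio, so a receiver decodes it before buffering or playback. A sketch with a tiny synthetic frame ("AAAA" stands in for real audio and decodes to three zero bytes):

```python
import base64
import json

# Decode one audioOutput frame into raw bytes in the requested encoding.
frame = json.loads('{"context_id": "ctx-1", "audio": "AAAA", "final": false}')
pcm_bytes = base64.b64decode(frame["audio"])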

finalOutput#

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| context_id | string | Yes | ID of the completed context. |
| audio | string | Yes | Always an empty string (""). |
| final | boolean | Yes | Always true. Marks completion of synthesis for this context. |

error (optional)#

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| error_code | string | Yes | Error type identifier. |
| message | string | Yes | Human-readable explanation. |
| extra | object | No | Additional error details. |

Example Message Flow#

Handshake - GET /text_to_speech/websocket/ws
↑ (send) initializeConnection - {"model_id": "asyncflow_multilingual_v1.0",...
{
  "model_id": "asyncflow_multilingual_v1.0",
  "voice": {
    "mode": "id",
    "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04"
  },
  "output_format": {
    "container": "raw",
    "encoding": "pcm_f32le",
    "sample_rate": 44100
  }
}
↑ (send) initializeContext (multiple contexts) - {"context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9",...
{
  "context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9",
  "transcript": "Welcome to Async. "
}
{
  "context_id": "9f6bb19c-4616-4fbc-9d0f-f14a460eac01",
  "transcript": "This is a parallel narration. "
}
↑ (send) sendText - {"context_id":"e1bb2844-fed4-418b-832c-8126db5a21e9",...}
{ "context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9", "transcript": "Let's continue with our main topic. " }
↓ (receive) audioOutput - {"context_id":"e1bb2844-fed4-418b-832c-8126db5a21e9",...}
{
  "context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9",
  "audio": "Y3VyaW91cyBtaW5kcyB0aGluayBhbGlrZSA6KQ==",
  "final": false
}
↓ (receive) finalOutput - {"context_id":"e1bb2844-fed4-418b-832c-8126db5a21e9",...}
{
  "context_id": "e1bb2844-fed4-418b-832c-8126db5a21e9",
  "audio": "",
  "final": true
}
↑ (send) closeContext - {"context_id":"9f6bb19c-4616-4fbc-9d0f-f14a460eac01",...}
{
  "context_id": "9f6bb19c-4616-4fbc-9d0f-f14a460eac01",
  "close_context": true,
  "transcript": ""
}
↑ (send) closeConnection - { "terminate": true }
{ "terminate": true }

Multi-Context Overview#

The Multi-Context WebSocket API is designed to manage multiple independent audio generation streams (contexts) over a single WebSocket connection.
This is especially useful for scenarios that require concurrent or interleaved audio generations, such as dynamic conversational AI applications.
Each context, identified by its own context_id, maintains an independent state.
You can send text to specific contexts, flush them, or close them independently.
A terminate message can be used to gracefully close the entire WebSocket connection.

Key advantages:#

Multiple parallel audio streams within a single socket
Lower connection overhead and faster interleaving
Independent buffering and finalization per context
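The per-context lifecycle above can be sketched as a simple message planner. The context IDs, helper name, and exact sequencing are illustrative assumptions; a real client would interleave sendText frames with the audio it receives rather than batch everything up front.

```python
import json

def multi_context_sequence(transcripts: dict[str, str]) -> list[str]:
    """Plan frames for several parallel contexts: one initializeContext per
    context, then one closeContext per context, then a final terminate."""
    frames = []
    for ctx_id, text in transcripts.items():
        if not text.endswith(" "):
            text += " "  # every transcript chunk must end with a single space
        frames.append(json.dumps({"context_id": ctx_id, "transcript": text}))
    for ctx_id in transcripts:
        frames.append(json.dumps(
            {"context_id": ctx_id, "close_context": True, "transcript": ""}))
    frames.append(json.dumps({"terminate": True}))
    return frames
```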

Modified at 2025-11-07 10:35:47