AsyncFlow API

    Welcome to AsyncFlow API

    AsyncFlow TTS API

    Build natural-sounding voice applications with real-time performance.
    AsyncFlow provides real‑time, low‑latency text‑to‑speech for voice assistants, chatbots, and other audio experiences. The API streams ultra‑realistic speech with about 300 ms latency, which suits applications where responsiveness matters. It also supports voice cloning from a sample as short as 5 seconds, so you can create custom voices, and it offers multi‑language support (English, German, Spanish, French, Italian).

    🗣️ Text-to-Speech

    AsyncFlow provides four main ways to convert text into speech, depending on your use case:
    HTTP (POST /text_to_speech) – Ideal when you have the full transcript up front; it generates a single audio file from the text and chosen voice.
    Streaming (POST /text_to_speech/streaming) – Returns the audio as a continuous stream. Use this when you want to start playback before the entire audio is generated.
    Word timestamps (POST /text_to_speech/with_timestamps) – Produces speech and per‑word timing information, returning the audio alongside arrays of words and their start/end times. This is useful for aligning animations or captions.
    WebSocket (WSS /text_to_speech/websocket/ws) – Designed for text that arrives incrementally (e.g. live transcription). It streams audio chunks while maintaining natural prosody. You establish the connection with initializeConnection, send text chunks with sendText, and close the connection with closeConnection; the API replies with audioOutput messages and signals completion with a final empty frame.
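
    The WebSocket exchange described above can be sketched without a network connection. The snippet below is a minimal illustration, not the official client: only the message names (initializeConnection, sendText, closeConnection, audioOutput) come from this page. The JSON envelope with a "type" key, the "voice_id", "text", and "audio" fields, and the base64 audio encoding are all assumptions made for the example; check the API Reference for the real payload shape.

```python
import base64
import json
from typing import Iterable, Iterator


def client_messages(voice_id: str, text_chunks: Iterable[str]) -> Iterator[str]:
    """Yield the client-side frames in the order the protocol expects."""
    # ASSUMPTION: each frame is a JSON object with a "type" key; the field
    # names "voice_id" and "text" are hypothetical placeholders.
    yield json.dumps({"type": "initializeConnection", "voice_id": voice_id})
    for chunk in text_chunks:
        yield json.dumps({"type": "sendText", "text": chunk})
    yield json.dumps({"type": "closeConnection"})


def collect_audio(server_frames: Iterable[str]) -> bytes:
    """Concatenate audioOutput payloads until the final empty frame arrives."""
    audio = bytearray()
    for raw in server_frames:
        frame = json.loads(raw)
        if frame.get("type") != "audioOutput":
            continue  # ignore any frame that is not audio
        # ASSUMPTION: audio is carried base64-encoded in an "audio" field,
        # and the "final empty frame" is an audioOutput with no payload.
        payload = frame.get("audio", "")
        if not payload:
            break  # empty frame: the server has finished
        audio.extend(base64.b64decode(payload))
    return bytes(audio)


if __name__ == "__main__":
    # Show the order of outgoing message types for two text chunks.
    for message in client_messages("demo-voice", ["Hello, ", "world."]):
        print(json.loads(message)["type"])
```

    In a real client, each string yielded by client_messages would be sent over the open WebSocket, and the frames received from the server would be passed to collect_audio (or played as they arrive rather than buffered).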

    ⚡ Fast & High-Quality

    This API balances speed and voice naturalness, making it a great choice whether you're building a responsive interface or high-fidelity voice content.

    🛠️ What's Coming Soon

    🧼 Speech Enhancement (denoising, dereverberation)
    🔄 Speech-to-Speech (voice conversion)
    Check out the API Reference to explore endpoints and capabilities.
    Modified at 2025-08-27 07:17:05