AsyncFlow API

Insert Silent Pauses with <break>

1. Why insert breaks?

Even great prosody sometimes needs explicit timing. The <break> tag adds a precise, silent pause in speech without changing words or phonemes. Use it to improve clarity (e.g., after numbers or acronyms), add dramatic effect, or separate clauses where punctuation alone isn’t enough.

2. Tag syntax

Element/Attr | Required | Description
<break/> | ✓ | Self-closing tag that inserts a pause.
time | ✓ | Duration of the pause, e.g., 300ms or 1s.
Example
Your code is 9 4 2 1 <break time='400ms'/>Please repeat it back.
Notes
Self-closing form: <break time='…'/>
Place <break/> between tokens; do not put it inside <phonemes>…</phonemes>.
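If you assemble transcripts in code, a small helper keeps the tag syntax consistent. The sketch below uses a hypothetical Python function, break_tag, which is not part of the AsyncFlow API. Its regex accepts only a number followed by ms or s, which is our reading of the units listed in this guide rather than a documented server-side rule.

```python
import re

# Assumed duration format, based on the units this guide lists:
# a number followed by "ms" or "s" (e.g. "300ms", "0.5s", "1s").
_DURATION_RE = re.compile(r"\d+(?:\.\d+)?(?:ms|s)")


def break_tag(duration: str) -> str:
    """Return a self-closing <break> tag (hypothetical helper, not an AsyncFlow API)."""
    if not _DURATION_RE.fullmatch(duration):
        raise ValueError(f"unsupported duration {duration!r}; use e.g. '300ms' or '1s'")
    return f"<break time='{duration}'/>"


transcript = "Your code is 9 4 2 1 " + break_tag("400ms") + "Please repeat it back."
print(transcript)
```

This reproduces the example above. Rejecting malformed values locally is a design choice; the service may handle them differently.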

3. Duration format & guidance

Units supported: milliseconds (ms) and seconds (s)
Examples: 150ms, 250ms, 0.5s, 1s
Natural range: 80ms–1200ms is usually best
(Very small values may be rounded; very large values may be clamped.)
Heuristics
80–150 ms → subtle micro-pause (comma-like)
200–400 ms → short phrase break
500–800 ms → sentence-level pause
1.0–1.5 s → dramatic/section break
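To sanity-check durations before sending them, you can normalize them to milliseconds and compare them against the heuristic bands above. This is a hypothetical client-side sketch: to_ms and describe are illustrative names, and the bands are copied from the table above rather than read from the API.

```python
import re

_DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|s)")

# Heuristic bands from this guide, in milliseconds: (lower, upper, label).
_BANDS = [
    (80, 150, "micro-pause"),
    (200, 400, "phrase break"),
    (500, 800, "sentence-level pause"),
    (1000, 1500, "dramatic/section break"),
]


def to_ms(duration: str) -> int:
    """Convert '250ms' or '0.5s' to an integer number of milliseconds."""
    m = _DURATION_RE.fullmatch(duration)
    if m is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = float(m.group(1)), m.group(2)
    return round(value * 1000) if unit == "s" else round(value)


def describe(duration: str) -> str:
    """Name the heuristic band a duration falls into, if any."""
    ms = to_ms(duration)
    for lower, upper, label in _BANDS:
        if lower <= ms <= upper:
            return label
    if not 80 <= ms <= 1200:
        return "outside the usual 80-1200 ms range"
    return "between heuristic bands"


for d in ["120ms", "0.5s", "1s", "50ms"]:
    print(d, to_ms(d), describe(d))
```

The bands are checked before the natural-range test, so values such as 1300 ms still report as a dramatic break even though they exceed the usual 1200 ms ceiling.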

4. Request format

POST /text_to_speech/streaming HTTP/1.1
Host: api.async.ai
Content-Type: application/json
X-Api-Key: <YOUR_API_KEY>

{
  "model_id": "asyncflow_v2.0",
  "transcript": "Welcome to Async.<break time='400ms'/>Let's get started.",
  "voice": { "mode": "id", "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04" },
  "output_format": {
    "container": "raw",
    "encoding": "pcm_s16le",
    "sample_rate": 44100
  }
}
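The same request can be built from Python's standard library. This sketch assumes the endpoint is reached over HTTPS at https://api.async.ai/text_to_speech/streaming (the scheme is not stated above), and build_tts_request is a hypothetical helper. It only constructs the request; the commented lines show how it could be sent.

```python
import json
import urllib.request

# Assumed full URL: host and path from the request above, HTTPS scheme assumed.
API_URL = "https://api.async.ai/text_to_speech/streaming"


def build_tts_request(api_key: str, transcript: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST mirroring the JSON body shown above."""
    payload = {
        "model_id": "asyncflow_v2.0",
        "transcript": transcript,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {
            "container": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": 44100,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


req = build_tts_request(
    "YOUR_API_KEY",
    "Welcome to Async.<break time='400ms'/>Let's get started.",
    "e0f39dc4-f691-4e78-bba5-5c636692cc04",
)
# Sending requires network access and a valid key; the body is assumed
# to be raw 16-bit PCM, per output_format:
# with urllib.request.urlopen(req) as resp:
#     pcm_bytes = resp.read()
```

The <break> tag travels inside the ordinary transcript string; no extra field is needed.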

5. Interaction with other tags

With <digits>
OTPs/phone numbers benefit from a pause after the block:
Your OTP is <digits>6 1 4 9 2</digits><break time='300ms'/>It expires soon.
With <phonemes>
Use breaks around a phoneme block, not inside it:
Welcome to <phonemes>ˈeɪ.sɪŋk</phonemes>.<break time='350ms'/>Enjoy the demo.
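A simple pre-flight check can catch a <break> placed inside a <phonemes> block before the request is sent. This is a hypothetical client-side sketch based on regular expressions; it assumes tags are not nested and are written exactly as in this guide.

```python
import re

_PHONEME_BLOCK = re.compile(r"<phonemes>(.*?)</phonemes>", re.DOTALL)
_BREAK = re.compile(r"<break\b[^>]*/>")


def breaks_inside_phonemes(transcript: str) -> list[str]:
    """Return the contents of every <phonemes> block that contains a <break> tag."""
    return [
        m.group(1)
        for m in _PHONEME_BLOCK.finditer(transcript)
        if _BREAK.search(m.group(1))
    ]


ok = "Welcome to <phonemes>ˈeɪ.sɪŋk</phonemes>.<break time='350ms'/>Enjoy the demo."
bad = "Welcome to <phonemes>ˈeɪ<break time='100ms'/>.sɪŋk</phonemes>."
print(breaks_inside_phonemes(ok))
print(breaks_inside_phonemes(bad))
```

An empty list means every break sits between tokens, as the notes in section 2 require.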

6. Best practices

1. Prefer punctuation first; add <break> only when punctuation doesn’t yield the timing you want.
2. Use breaks sparingly to avoid choppy delivery.
3. Tune by ear: start at 250–400ms for phrase breaks and adjust.
4. Keep rhythm consistent across similar sections (e.g., list items).
5. Aid accessibility: short pauses after numeric blocks and acronyms improve intelligibility.
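One way to keep list rhythm consistent is to generate the pauses instead of typing them by hand. The sketch below uses a hypothetical helper, paced_list: it places the same gap after the intro and between items, with a slightly longer gap before the final "and". The 250ms and 400ms defaults are taken from the list-pacing example in section 7.

```python
def paced_list(intro: str, items: list[str], gap: str = "250ms", final_gap: str = "400ms") -> str:
    """Join list items with uniform <break> pauses (hypothetical helper)."""
    if len(items) < 2:
        raise ValueError("paced_list needs at least two items")
    parts = [f"{intro}<break time='{gap}'/>"]
    # Every item except the last two gets the regular gap.
    for item in items[:-2]:
        parts.append(f"{item},<break time='{gap}'/>")
    # The penultimate item gets the longer gap before the closing "and".
    parts.append(f"{items[-2]},<break time='{final_gap}'/>")
    parts.append(f"and {items[-1]}.")
    return "".join(parts)


print(paced_list("You will hear three tones:", ["low", "mid", "high"]))
```

Changing the gap in one place updates every item, which keeps similar sections in step.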

7. Examples

Sentence-level pause
This is Async.<break time='600ms'/>A better way to build voice.
List pacing
You will hear three tones:<break time='250ms'/>low,<break time='250ms'/>mid,<break time='400ms'/>and high.
Digits + pause
Confirm code:<digits>3 8 1 7</digits><break time='300ms'/>Resending now.
Brand name with phonemes + pause
Welcome to <phonemes>ˈeɪ.sɪŋk</phonemes>.<break time='350ms'/>Let’s begin.

8. FAQ

Q: Is time the only attribute?
A: Yes. Use time with ms or s. (Attributes like “strength” are not used in Async markup.)
Q: Can I chain multiple <break> tags?
A: You can, but prefer a single <break> with the total duration.
Q: Any hard limits?
A: Extremely small values may be rounded; very large values may be clamped. For naturalness, keep to 80–1200ms unless you intentionally need longer silence.
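Following the advice on chaining, consecutive tags can be collapsed client-side into one tag that carries the summed duration. This is a hypothetical sketch (merge_chained_breaks is not an AsyncFlow function). It recognizes only breaks written with single quotes as in this guide, and it always emits the total in milliseconds.

```python
import re

# One <break> tag with a numeric ms/s duration, as written in this guide.
_ONE = r"<break time='\d+(?:\.\d+)?(?:ms|s)'/>"
# Two or more such tags separated only by whitespace.
_CHAIN = re.compile(rf"{_ONE}(?:\s*{_ONE})+")
_TIME = re.compile(r"<break time='(\d+(?:\.\d+)?)(ms|s)'/>")


def merge_chained_breaks(transcript: str) -> str:
    """Replace each run of chained <break> tags with a single tag of the total duration."""

    def merge(match: re.Match) -> str:
        total_ms = sum(
            float(value) * (1000 if unit == "s" else 1)
            for value, unit in _TIME.findall(match.group(0))
        )
        return f"<break time='{round(total_ms)}ms'/>"

    return _CHAIN.sub(merge, transcript)


print(merge_chained_breaks("Wait.<break time='300ms'/><break time='0.2s'/>Go."))
```

Single breaks are left untouched, and whitespace after the last tag in a chain is preserved.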
Modified at 2025-10-15 14:09:37