<Stream/> media

Tool | Notes |
---|---|
Node.js 18+ | ES module syntax and modern WS APIs |
Twilio account | Copy your Account SID and Auth Token; buy/verify numbers |
Async account | Copy your API key and pick a Voice ID |
ngrok (free) | Exposes your local WS server to Twilio’s cloud |
.env file next to the script:
node async-twilio.js OUTBOUND_NUMBER=+1555…

Step | Flow |
---|---|
1 | Script connects to Async over WebSocket, sends an init frame (model, voice, codec). |
2 | A lightweight HTTP + WS server starts locally (ws://localhost:<port>). |
3 | ngrok publishes that port; you get a public wss:// URL. |
4 | Script tells Twilio to dial <OUTBOUND_NUMBER> and stream call audio to that URL. |
5 | On Twilio's start message, script streams text → Async. |
6 | Async replies with μ‑law PCM chunks; script forwards each chunk to Twilio as media frames. |
7 | After all chunks (or on timeout) script ends the call. |
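
The Twilio-facing half of steps 4 to 6 can be sketched as three small helpers. The message shapes follow Twilio's Media Streams protocol: a `<Connect><Stream>` TwiML verb opens a bidirectional stream, Twilio announces it with a `start` message carrying a `streamSid`, and audio goes back as `media` messages with a base64 payload. The function names are hypothetical and are not taken from async-twilio.js.

```javascript
// Escape characters that are unsafe inside an XML attribute value.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// TwiML asking Twilio to open a bidirectional media stream to our WS server.
function buildStreamTwiml(publicWssUrl) {
  return (
    "<Response><Connect>" +
    `<Stream url="${escapeXmlAttr(publicWssUrl)}"/>` +
    "</Connect></Response>"
  );
}

// Read the streamSid from Twilio's "start" message; returns null for other events.
function streamSidFromStart(rawMessage) {
  const msg = JSON.parse(rawMessage);
  return msg.event === "start" ? msg.start.streamSid : null;
}

// Wrap one chunk of mu-law audio (a Buffer) in a Twilio "media" message.
function toTwilioMediaFrame(streamSid, audioChunk) {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: Buffer.from(audioChunk).toString("base64") },
  });
}
```

In a script like this one, the TwiML string would typically be passed as the `twiml` option when creating the call, and each audio chunk received from Async would be sent on the Twilio socket as `twilioWs.send(toTwilioMediaFrame(streamSid, chunk))`, where `twilioWs` is an assumed name for that socket.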
Goal | Where to change |
---|---|
Different voice | CFG.ASYNC_VOICE_ID |
Different codec / rate | output_format in connectAsyncTTS() |
Stream arbitrary text | Replace CFG.TEST_SENTENCE, or feed user input into asyncWs.send() |
Keep the call open | Remove the chunksSeen guard and endCall() timer |
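
Changing the voice or codec comes down to what goes into the init frame. The sketch below shows one way to keep those settings overridable. The field names (`model_id`, `voice`, `output_format`, `encoding`, `sample_rate`), the `pcm_mulaw` value, and `ASYNC_MODEL_ID` are assumptions for illustration, so check them against the Async API reference. What is certain is that Twilio Media Streams carry 8 kHz μ-law audio, so any other output format would need transcoding before it is forwarded.

```javascript
// Assumed default: raw 8 kHz mu-law, the format Twilio Media Streams expect.
const DEFAULT_OUTPUT_FORMAT = {
  container: "raw",
  encoding: "pcm_mulaw", // assumed encoding name, not confirmed by the source
  sample_rate: 8000,
};

// Build the init frame; outputFormat overrides individual default fields.
function buildInitFrame(cfg, outputFormat = {}) {
  return JSON.stringify({
    model_id: cfg.ASYNC_MODEL_ID, // hypothetical config key
    voice: { mode: "id", id: cfg.ASYNC_VOICE_ID },
    output_format: { ...DEFAULT_OUTPUT_FORMAT, ...outputFormat },
  });
}

// True when the audio can be forwarded to Twilio without transcoding.
function isTwilioCompatible(outputFormat) {
  return /mulaw/i.test(outputFormat.encoding) && outputFormat.sample_rate === 8000;
}
```

A guard such as `isTwilioCompatible()` makes a codec change fail loudly at startup instead of producing noise on the call.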
Set force: true in the transcript frame to synthesize short text immediately.
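
A minimal sketch of a transcript frame that carries the flag is shown below. The `transcript` field name is an assumption inferred from the term "transcript frame"; only `force: true` comes from the text above. `buildTranscriptFrame` is a hypothetical helper.

```javascript
// Build a transcript frame; force is added only when requested.
function buildTranscriptFrame(text, { force = false } = {}) {
  const frame = { transcript: text }; // assumed field name
  if (force) frame.force = true;
  return JSON.stringify(frame);
}
```

With the script's Async socket, sending a short greeting without waiting for more text might look like `asyncWs.send(buildTranscriptFrame("Hi!", { force: true }))`.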