`<Stream/>` media

| Tool | Notes |
|---|---|
| Node.js 18+ | ES module syntax and modern WS APIs |
| Twilio account | Copy your Account SID and Auth Token; buy or verify phone numbers |
| Async account | Copy your API key and pick a Voice ID |
| ngrok (free) | Exposes your local WS server to Twilio’s cloud |
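
The credentials listed in the table above could be checked once at startup so a missing value fails fast rather than mid-call. The following is a minimal sketch, not the script's actual code: the variable names `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, and `ASYNC_API_KEY` are hypothetical, and treating `ASYNC_VOICE_ID` and `OUTBOUND_NUMBER` as environment variables is likewise an assumption.

```javascript
// Hypothetical key names: adjust them to whatever your script actually reads.
const REQUIRED_KEYS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "ASYNC_API_KEY",
  "ASYNC_VOICE_ID",
  "OUTBOUND_NUMBER",
];

// Return a trimmed copy of the required settings, or throw listing every missing key.
function loadConfig(env = process.env) {
  const missing = REQUIRED_KEYS.filter(
    (key) => typeof env[key] !== "string" || env[key].trim() === ""
  );
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(REQUIRED_KEYS.map((key) => [key, env[key].trim()]));
}
```

Calling `loadConfig()` before opening any socket keeps configuration errors separate from network errors.
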
`.env` file next to the script:

`node async-twilio.js`

`OUTBOUND_NUMBER=+1555…`

| Step | Flow |
|---|---|
| 1 | Script connects to Async over WebSocket, sends an init frame (model, voice, codec). |
| 2 | A lightweight HTTP + WS server starts locally (`ws://localhost:<port>`). |
| 3 | ngrok publishes that port; you get a public `wss://` URL. |
| 4 | Script tells Twilio to dial `<OUTBOUND_NUMBER>` and stream call audio to that URL. |
| 5 | On Twilio's `start` event, script streams text → Async. |
| 6 | Async replies with μ‑law PCM chunks; script forwards each chunk to Twilio as media frames. |
| 7 | After all chunks arrive (or on timeout), script ends the call. |
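
Steps 4 and 6 of the flow above can be sketched as two small helpers. This is an illustration under assumptions, not the script's actual code: the helper names are hypothetical, and it assumes the call uses Twilio's bidirectional `<Connect><Stream>` TwiML, which is what allows audio to be sent back into the call as `media` messages.

```javascript
// Escape characters that are unsafe inside an XML attribute value.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Step 4: TwiML asking Twilio to open a bidirectional media stream to wssUrl.
function buildStreamTwiml(wssUrl) {
  if (!/^wss:\/\//.test(wssUrl)) {
    throw new Error("Twilio media streams need a wss:// URL");
  }
  return `<Response><Connect><Stream url="${escapeXmlAttr(wssUrl)}" /></Connect></Response>`;
}

// Step 6: wrap one mu-law audio chunk as a Twilio "media" message.
// streamSid is the identifier Twilio sends in its "start" event.
function toTwilioMediaFrame(streamSid, audioChunk) {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: Buffer.from(audioChunk).toString("base64") },
  });
}
```

Each chunk received from Async would then be forwarded with something like `twilioWs.send(toTwilioMediaFrame(streamSid, chunk))`, where `twilioWs` is a hypothetical name for the socket Twilio opened to your server.
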

| Goal | Where to change |
|---|---|
| Different voice | `CFG.ASYNC_VOICE_ID` |
| Different codec / rate | `output_format` in `connectAsyncTTS()` |
| Stream arbitrary text | Replace `CFG.TEST_SENTENCE`, or feed user input into `asyncWs.send()` |
| Keep the call open | Remove the `chunksSeen` guard and `endCall()` timer |

Set `force: true` in the transcript frame to synthesize short text immediately.
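
To feed arbitrary text into `asyncWs.send()`, a small builder for the transcript frame may help. This is a sketch under assumptions: the helper name is hypothetical, and the `transcript` key is inferred from the phrase "transcript frame" rather than confirmed, so check the exact frame schema against the Async API reference.

```javascript
// Build a JSON transcript frame for the Async WebSocket.
// Assumption: the text travels under a "transcript" key; "force" is the flag described above.
function buildTranscriptFrame(text, { force = false } = {}) {
  const transcript = String(text).trim();
  if (transcript === "") {
    throw new Error("Nothing to synthesize");
  }
  const frame = { transcript };
  if (force) {
    frame.force = true; // request immediate synthesis of short text
  }
  return JSON.stringify(frame);
}
```

For a short prompt, the call might look like `asyncWs.send(buildTranscriptFrame("Hi there", { force: true }))`.
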