From a ringing phone to a resolved conversation
Follow a single call through the system — the real-time turn loop, the AI Brain and the workers that pick up after hang-up.
Six stages, mostly in parallel
A turn is everything from a caller finishing a sentence to the agent starting its reply. Here's what happens in those few hundred milliseconds.
LISTEN
Twilio Media Streams pushes 20ms μ-law frames over a WebSocket. VAD + streaming Deepgram STT produce interim transcripts as the caller speaks.
ENDPOINT
When VAD detects silence, a Deepgram utterance-end event — or a one-token "are they done?" classifier — confirms the caller has finished.
THINK
RAG retrieval, caller-profile lookup and intent classification run in parallel, then the smart-routed LLM streams its reply token by token.
SPEAK
Complete clauses are split off and streamed straight into TTS — audio flows back to Twilio before the model finishes the sentence.
ACT
If the model emits a tool call, the dispatcher invokes your API; sensitive actions pause for one-click human approval.
REMEMBER
The turn is broadcast to live viewers and folded into rolling memory. On hang-up, the caller profile is updated for next time.
RAG and intent overlap STT; TTS starts before the LLM finishes. That overlap is the whole trick.
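To make the THINK and SPEAK stages concrete, here is a minimal sketch using only Python's standard library. Every function name in it (`retrieve_context`, `load_profile`, `classify_intent`, `llm_tokens`, `run_turn`) is a hypothetical stand-in, not the product's actual code, and the canned tokens replace a real streaming LLM. It shows two ideas only: the three lookups run concurrently, and clauses are released as soon as they close rather than when the full reply is done. It does not model the overlap with STT.

```python
import asyncio
import re
from typing import AsyncIterator


# Hypothetical stand-ins for the retrieval, profile and intent services.
async def retrieve_context(text: str) -> str:
    await asyncio.sleep(0.01)
    return "kb: demo slots are 30 minutes"


async def load_profile(caller_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"caller_id": caller_id, "name": "Sam"}


async def classify_intent(text: str) -> str:
    await asyncio.sleep(0.01)
    return "book_demo"


async def llm_tokens(prompt: str) -> AsyncIterator[str]:
    # Stand-in for a streaming LLM: yields a canned reply token by token.
    reply = ["Sure", ",", " I", " can", " book", " that", ".",
             " What", " day", " works", "?"]
    for token in reply:
        await asyncio.sleep(0)
        yield token


# A clause is considered closed when the buffer ends in punctuation.
CLAUSE_END = re.compile(r"[,.;:!?]$")


async def clauses(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Group streamed tokens into clauses, emitting each as soon as it closes."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if CLAUSE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush a trailing clause with no punctuation


async def run_turn(transcript: str, caller_id: str) -> list[str]:
    # THINK: the three lookups run concurrently, not one after another.
    context, profile, intent = await asyncio.gather(
        retrieve_context(transcript),
        load_profile(caller_id),
        classify_intent(transcript),
    )
    prompt = f"{context}\n{profile}\n{intent}\n{transcript}"

    spoken = []
    # SPEAK: each clause would be handed to TTS here, before the reply is complete.
    async for clause in clauses(llm_tokens(prompt)):
        spoken.append(clause)
    return spoken
```

Calling `asyncio.run(run_turn("book a demo", "+15550100"))` returns the reply as three separate clauses. In a live system each one would go to TTS the moment it is yielded.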
One app, three API surfaces, durable workers
A FastAPI core handles real-time calls; Celery workers handle everything that can happen after hang-up.
External API
X-API-Key auth. Integrators place calls and read results over REST.
Dashboard API
JWT auth for your team — agents, knowledge, analytics, billing, audit.
Twilio webhooks
Signature-validated telephony events drive the live media stream.
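For readers curious what "signature-validated" involves, the sketch below reproduces Twilio's documented scheme for form-encoded webhooks: append the POST parameters, sorted by key, to the full request URL, compute an HMAC-SHA1 with the account auth token, and Base64-encode the result. The function names are illustrative and the sample values are made up. A production service would more likely rely on the `RequestValidator` class in Twilio's official helper library than on hand-rolled code.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the value Twilio sends in the X-Twilio-Signature header."""
    # Concatenate each key and value, in key order, with no separators.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict[str, str], header_signature: str
) -> bool:
    expected = twilio_signature(auth_token, url, params)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header_signature)
```

The URL must be the exact public URL Twilio called, including scheme and query string. Behind a proxy or load balancer, a mismatch here is a common cause of rejected requests.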
The work that happens once the call ends
Heavy lifting moves to durable workers so the live path stays fast and the call's aftermath is reliable.
- A summary, sentiment and outcome are generated immediately.
- call.completed fans out to your workflows.
- Signed webhooks fire, with retries on 5xx/timeout.
- Recordings are pulled and attached to the call record.
- Usage events roll up for accurate, real-time billing.
POST https://your-app.com/hook
X-AURA-Event: call.completed
X-AURA-Signature: sha256=…

{
  "event": "call.completed",
  "data": {
    "call_sid": "CA…",
    "outcome": "answered",
    "duration": 142,
    "summary": "Booked demo…"
  }
}
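On the receiving side, a handler would typically check the X-AURA-Signature header before trusting the payload. The sketch below assumes the value is a hex-encoded HMAC-SHA256 of the raw request body, keyed with a shared webhook secret. The page shows only the `sha256=…` prefix, so the exact scheme is an assumption, and `sign_body` and `verify_webhook` are hypothetical names.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    # ASSUMPTION: hex HMAC-SHA256 over the raw body, prefixed with "sha256=".
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_webhook(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True only if the header matches the signature we compute."""
    expected = sign_body(secret, body)
    # Constant-time comparison; a plain == can leak timing information.
    return hmac.compare_digest(expected, signature_header)
```

Verify against the raw bytes as received, not a re-serialized JSON object, because any change in whitespace or key order would change the digest.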
Resilient by design
The real-time path degrades gracefully, never abruptly — a provider hiccup is invisible to your caller.
Smart routing
Each turn picks the best LLM/STT/TTS by capability, cost and health — automatically.
Circuit breakers
A provider that errors or slows down is bypassed mid-call to a healthy fallback.
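A circuit breaker of this kind can be sketched in a few lines. The version below is illustrative only: the class, the thresholds and `call_with_fallback` are hypothetical, and it treats a slow provider as one that raises a timeout error instead of measuring latency directly.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None

    def available(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit


def call_with_fallback(providers, breakers, request):
    """Try providers in priority order, skipping any whose breaker is open.

    providers: list of (name, callable) pairs; breakers: dict keyed by name.
    """
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.available():
            continue
        try:
            result = call(request)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("no healthy provider available")
```

Once a provider's breaker opens, later turns skip it without paying for another failed attempt until the cool-down passes.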
Retried delivery
Webhooks and post-call jobs retry with backoff and survive worker restarts.
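As a rough illustration of retry with backoff, the sketch below computes a capped exponential delay schedule and applies the "5xx or timeout" retry rule mentioned earlier on this page. The function names, base delay and cap are assumptions chosen for the example, and surviving worker restarts (which depends on the task queue, not on this logic) is not modeled.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=60.0, jitter=False, rng=random.random):
    """Return the wait, in seconds, before each retry attempt."""
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)  # 1, 2, 4, ... capped at `cap`
        if jitter:
            delay *= rng()  # full jitter: spread retries across [0, delay)
        delays.append(delay)
    return delays


def should_retry(status_code, timed_out=False):
    """Retry on timeouts and 5xx responses; treat everything else as final."""
    if timed_out:
        return True
    return status_code is not None and 500 <= status_code <= 599
```

With the defaults, `backoff_delays()` gives `[1.0, 2.0, 4.0, 8.0, 16.0]`. Celery tasks expose comparable behavior through options such as `autoretry_for` and `retry_backoff`, though whether this product uses them is not stated here.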
Full observability
Prometheus metrics, OpenTelemetry traces and per-stage latency on every call.
Want the deep technical walkthrough?
Our engineers will take your team through the architecture and a live call.
Book a technical demo
