
API Reference

REST and WebSocket API documentation for the Real-Time Voice Agent backend, built on Python 3.11 and FastAPI.

Quick Start

The API integrates the following Azure services for voice-enabled applications:

Azure Communication Services - Call automation and bidirectional media streaming
Azure Speech Services - Neural text-to-speech and speech recognition
Azure OpenAI - Conversational AI and language processing

API Endpoints Overview

The V1 API provides REST and WebSocket endpoints organized by domain:

Health & Monitoring

Endpoint	Method	Description
`/api/v1/health`	GET	Basic liveness check for load balancers
`/api/v1/readiness`	GET	Comprehensive dependency health validation
`/api/v1/agents`	GET	List loaded agents with configuration
`/api/v1/agents/{name}`	GET	Get specific agent details
`/api/v1/agents/{name}`	PUT	Update agent runtime configuration
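Deployment scripts and probes typically poll the health and readiness endpoints before routing traffic. The following is a minimal, standard-library-only sketch of waiting for the service to become ready. It assumes that both endpoints answer HTTP 200 when healthy and a non-200 status otherwise; response bodies are not inspected because their schema is not described here. The base URL and the helper names `fetch_status` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical base URL; substitute your deployment's address.
DEFAULT_BASE_URL = "http://localhost:8000"


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request (0 on network errors)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses raise HTTPError; .code carries the status.
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_ready(
    base_url: str = DEFAULT_BASE_URL,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll /api/v1/health, then /api/v1/readiness, until each returns 200.

    Assumption: 200 means healthy/ready; any other status means "not yet".
    Returns False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    for path in ("/api/v1/health", "/api/v1/readiness"):
        while fetch(base_url + path) != 200:
            if time.monotonic() >= deadline:
                return False
            sleep(interval_s)
    return True
```

Injecting `fetch` and `sleep` keeps the polling logic testable without a running server.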

Call Management

Endpoint	Method	Description
`/api/v1/calls/initiate`	POST	Initiate outbound call via ACS
`/api/v1/calls/`	GET	List calls with pagination and filtering
`/api/v1/calls/terminate`	POST	Terminate active call by connection ID
`/api/v1/calls/answer`	POST	Handle inbound call/Event Grid validation
`/api/v1/calls/callbacks`	POST	Process ACS webhook callback events
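A sketch of calling the call-management endpoints with only the standard library. The JSON field `target_number` is a hypothetical placeholder, because the request schema for `/api/v1/calls/initiate` is not shown here; consult the interactive OpenAPI reference for the actual body and any authentication headers. The helper names are also illustrative.

```python
import json
import urllib.request


def build_initiate_call_request(base_url: str, target_number: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/calls/initiate.

    The field name "target_number" is hypothetical; check the OpenAPI
    schema for the real request body.
    """
    payload = {"target_number": target_number}  # hypothetical field name
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/calls/initiate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect or log the exact request before a real call is placed.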

Media Streaming

Endpoint	Type	Description
`/api/v1/media/status`	GET	Get media streaming configuration status
`/api/v1/media/stream`	WebSocket	ACS bidirectional audio streaming

Browser Conversations

Endpoint	Type	Description
`/api/v1/browser/status`	GET	Browser service status and connection counts
`/api/v1/browser/dashboard/relay`	WebSocket	Dashboard client real-time updates
`/api/v1/browser/conversation`	WebSocket	Browser-based voice conversations

Session Metrics

Endpoint	Method	Description
`/api/v1/metrics/sessions`	GET	List active sessions with basic metrics
`/api/v1/metrics/session/{id}`	GET	Detailed latency/telemetry for a session
`/api/v1/metrics/summary`	GET	Aggregated metrics across recent sessions

Agent Builder

Endpoint	Method	Description
`/api/v1/agent-builder/tools`	GET	List available tools for agents
`/api/v1/agent-builder/voices`	GET	List available TTS voices
`/api/v1/agent-builder/defaults`	GET	Get default agent configuration
`/api/v1/agent-builder/templates`	GET	List available agent templates
`/api/v1/agent-builder/templates/{id}`	GET	Get specific template details
`/api/v1/agent-builder/create`	POST	Create dynamic agent for session
`/api/v1/agent-builder/session/{id}`	GET	Get session agent configuration
`/api/v1/agent-builder/session/{id}`	PUT	Update session agent configuration
`/api/v1/agent-builder/session/{id}`	DELETE	Reset to default agent
`/api/v1/agent-builder/sessions`	GET	List all sessions with dynamic agents
`/api/v1/agent-builder/reload-agents`	POST	Reload agent templates from disk

Demo Environment

Endpoint	Method	Description
`/api/v1/demo-env/temporary-user`	POST	Create synthetic demo user profile
`/api/v1/demo-env/temporary-user`	GET	Lookup demo profile by email

TTS Health

Endpoint	Method	Description
`/api/v1/tts/dedicated/health`	GET	TTS pool health status
`/api/v1/tts/dedicated/metrics`	GET	TTS pool performance metrics
`/api/v1/tts/dedicated/status`	GET	Lightweight status check for load balancers

Interactive API Documentation

👉 Complete API Reference - Interactive OpenAPI documentation with all REST endpoints, WebSocket details, authentication, and configuration.

WebSocket Endpoints

ACS Media Streaming (`/api/v1/media/stream`)

Real-time bidirectional audio streaming for Azure Communication Services calls.

Query Parameters:

- `call_connection_id` (required): ACS call connection identifier
- `session_id` (optional): browser session ID for UI coordination

Streaming Modes:

- `MEDIA`: traditional STT/TTS pipeline (PCM 16kHz mono)
- `VOICE_LIVE`: Azure OpenAI Realtime API (PCM 24kHz mono)
- `TRANSCRIPTION`: real-time transcription only
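The query string for the media stream endpoint can be assembled as below. This minimal sketch covers only the documented parameters: `call_connection_id` is enforced as required, and `session_id` is appended only when supplied. The `wss://` base URL and the function name are illustrative assumptions; in a live call, the connection to this endpoint is typically opened by ACS media streaming rather than by application code.

```python
from typing import Optional
from urllib.parse import urlencode


def build_media_stream_url(
    base_ws_url: str,
    call_connection_id: str,
    session_id: Optional[str] = None,
) -> str:
    """Return the ACS media WebSocket URL with its query string.

    call_connection_id is required; session_id is optional and is only
    added when provided.
    """
    if not call_connection_id:
        raise ValueError("call_connection_id is required")
    params = {"call_connection_id": call_connection_id}
    if session_id:
        params["session_id"] = session_id
    return f"{base_ws_url.rstrip('/')}/api/v1/media/stream?{urlencode(params)}"
```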

Browser Conversation (`/api/v1/browser/conversation`)

Browser-based voice conversations with session persistence.

Query Parameters:

- `session_id` (optional): session identifier for restoration
- `streaming_mode` (optional): `VOICE_LIVE` or `REALTIME`
- `user_email` (optional): user email for context

Features:

- Real-time speech-to-text transcription
- TTS audio streaming for responses
- Barge-in detection and handling
- Session context persistence
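A similar hedged sketch for the browser conversation endpoint. It rejects `streaming_mode` values other than the two listed above and relies on `urllib.parse.urlencode` to percent-encode characters such as `@` in `user_email`. Rejecting unknown modes on the client is a design choice of this sketch, not documented server behavior, and the function name is illustrative.

```python
from typing import Optional
from urllib.parse import urlencode

# Values documented for the streaming_mode parameter of this endpoint.
BROWSER_STREAMING_MODES = {"VOICE_LIVE", "REALTIME"}


def build_browser_conversation_url(
    base_ws_url: str,
    session_id: Optional[str] = None,
    streaming_mode: Optional[str] = None,
    user_email: Optional[str] = None,
) -> str:
    """Build the browser conversation WebSocket URL; all parameters are optional."""
    if streaming_mode is not None and streaming_mode not in BROWSER_STREAMING_MODES:
        raise ValueError(f"unsupported streaming_mode: {streaming_mode!r}")
    # Keep only the parameters that were actually supplied, in a stable order.
    params = {
        key: value
        for key, value in (
            ("session_id", session_id),
            ("streaming_mode", streaming_mode),
            ("user_email", user_email),
        )
        if value is not None
    }
    url = f"{base_ws_url.rstrip('/')}/api/v1/browser/conversation"
    # urlencode percent-encodes reserved characters such as "@".
    return f"{url}?{urlencode(params)}" if params else url
```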

Dashboard Relay (`/api/v1/browser/dashboard/relay`)

Real-time updates for dashboard clients monitoring conversations.

Query Parameters:

- `session_id` (optional): filter updates to a specific session
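The `session_id` filter on the dashboard relay can be pictured as a publish/subscribe fan-out. The sketch below is a hypothetical in-process model (`RelayHub` and `Subscriber` are illustrative names, not the backend's classes): a dashboard subscribed without a session ID receives every update, while one subscribed with a session ID receives only that session's updates. In a real service each queue would presumably be drained by a WebSocket sender task.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subscriber:
    """One dashboard connection; session_id=None means 'all sessions'."""
    session_id: Optional[str] = None
    queue: asyncio.Queue = field(default_factory=asyncio.Queue)


class RelayHub:
    """Hypothetical in-process fan-out of session updates to dashboards."""

    def __init__(self) -> None:
        self._subscribers: list[Subscriber] = []

    def subscribe(self, session_id: Optional[str] = None) -> Subscriber:
        sub = Subscriber(session_id)
        self._subscribers.append(sub)
        return sub

    def unsubscribe(self, sub: Subscriber) -> None:
        self._subscribers.remove(sub)

    def publish(self, session_id: str, update: dict) -> None:
        # Deliver to unfiltered subscribers and to those watching this session.
        for sub in self._subscribers:
            if sub.session_id in (None, session_id):
                sub.queue.put_nowait({"session_id": session_id, **update})
```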

Observability¶

OpenTelemetry Tracing - Built-in distributed tracing for production monitoring with Azure Monitor integration:

Session-level spans for complete request lifecycle
Service dependency mapping (Speech, Communication Services, Redis, OpenAI)
Audio processing latency and error rate monitoring
Automatic context propagation via the `session_context` wrapper
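The `session_context` wrapper itself is not shown in this reference. As an illustration of the underlying idea only, the standard-library sketch below uses `contextvars` to bind a session ID once and read it deeper in the call stack, including inside a task spawned with `asyncio.create_task`, which copies the current context. `session_scope` and `current_session_id` are hypothetical names; OpenTelemetry's Python API propagates its own context in a comparable way, but this code does not use it.

```python
import asyncio
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical context variable holding the active session ID.
current_session_id: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_session_id", default=None
)


@contextmanager
def session_scope(session_id: str) -> Iterator[None]:
    """Bind session_id for the duration of the block, then restore the old value."""
    token = current_session_id.set(session_id)
    try:
        yield
    finally:
        current_session_id.reset(token)


async def handle_audio_chunk() -> Optional[str]:
    # Code deeper in the call stack reads the session ID without it being
    # passed as an argument; a tracer could attach it to spans the same way.
    await asyncio.sleep(0)
    return current_session_id.get()
```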

Streaming Modes

The API supports multiple streaming modes, selected via `ACS_STREAMING_MODE`:

Mode	Description	Audio Format	Use Case
`MEDIA`	Traditional STT/TTS with Speech Cascade	PCM 16kHz mono	Phone calls with orchestrator
`VOICE_LIVE`	Azure OpenAI Realtime API	PCM 24kHz mono	Low-latency conversational AI
`TRANSCRIPTION`	Real-time transcription only	PCM 16kHz mono	Call recording and analysis
`REALTIME`	Browser-based Speech Cascade	PCM 16kHz mono	Browser voice conversations
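A small worked example of what the audio formats in the table imply for buffer sizes. It assumes 16-bit signed samples (the table gives only sample rate and channel count), 20 ms frames, and that `ACS_STREAMING_MODE` is read from an environment variable; `StreamingMode`, `frame_bytes`, and the other names are illustrative. Under those assumptions a 20 ms frame is 16000 × 0.020 × 2 = 640 bytes at 16 kHz and 24000 × 0.020 × 2 = 960 bytes at 24 kHz.

```python
import os
from enum import Enum
from typing import Mapping


class StreamingMode(str, Enum):
    """Modes listed in the streaming-mode table."""
    MEDIA = "MEDIA"
    VOICE_LIVE = "VOICE_LIVE"
    TRANSCRIPTION = "TRANSCRIPTION"
    REALTIME = "REALTIME"


# Sample rates from the table; all modes are mono.
SAMPLE_RATE_HZ = {
    StreamingMode.MEDIA: 16_000,
    StreamingMode.VOICE_LIVE: 24_000,
    StreamingMode.TRANSCRIPTION: 16_000,
    StreamingMode.REALTIME: 16_000,
}
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM
CHANNELS = 1


def streaming_mode_from_env(env: Mapping[str, str] = os.environ) -> StreamingMode:
    """Parse ACS_STREAMING_MODE (assumed to be an environment variable)."""
    raw = env.get("ACS_STREAMING_MODE")
    if raw is None:
        # No default is assumed here; the backend's own default is not documented.
        raise KeyError("ACS_STREAMING_MODE is not set")
    return StreamingMode(raw.strip())  # raises ValueError for unknown modes


def bytes_per_second(mode: StreamingMode) -> int:
    return SAMPLE_RATE_HZ[mode] * BYTES_PER_SAMPLE * CHANNELS


def frame_bytes(mode: StreamingMode, frame_ms: int = 20) -> int:
    """Size in bytes of one audio frame lasting frame_ms milliseconds."""
    return bytes_per_second(mode) * frame_ms // 1000
```

Because `VOICE_LIVE` frames are 1.5 times larger than those of the 16 kHz modes, buffers sized for one mode should not be reused unchanged for another.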

📖 Streaming Mode Details - Complete streaming mode documentation

Architecture

Three-Thread Design - Optimized for real-time conversational AI with sub-10ms barge-in detection:

Speech SDK Thread - Audio processing and recognition
Route Turn Thread - LLM orchestration and tool execution
Main Event Loop - WebSocket I/O and TTS streaming
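The cross-thread hand-off that makes barge-in possible can be sketched with the standard library. This is a simplified, hypothetical model, not the backend's implementation: a worker thread stands in for the Speech SDK thread and signals the event loop with `loop.call_soon_threadsafe`, because `asyncio.Event` is not thread-safe, and the main-loop task stops streaming TTS chunks as soon as the event is set. The Route Turn thread is omitted and the timings are arbitrary, so this does not demonstrate the sub-10ms figure.

```python
import asyncio
import threading
import time


async def stream_tts(chunks: list[bytes], barge_in: asyncio.Event, sent: list[bytes]) -> bool:
    """Main event loop: send TTS chunks until finished or barge-in is signalled.

    Returns True if playback was interrupted.
    """
    for chunk in chunks:
        if barge_in.is_set():
            return True
        sent.append(chunk)  # stand-in for sending the chunk over the WebSocket
        await asyncio.sleep(0.01)  # simulated pacing between audio frames
    return False


def speech_thread(loop: asyncio.AbstractEventLoop, barge_in: asyncio.Event, delay_s: float) -> None:
    """Stand-in for the Speech SDK thread: 'detects' user speech after delay_s."""
    time.sleep(delay_s)
    # asyncio.Event is not thread-safe, so schedule set() on the loop's thread.
    loop.call_soon_threadsafe(barge_in.set)


async def run_turn(num_chunks: int, speech_after_s: float) -> tuple[bool, int]:
    """Play num_chunks of silence while a worker thread simulates user speech."""
    loop = asyncio.get_running_loop()
    barge_in = asyncio.Event()
    sent: list[bytes] = []
    worker = threading.Thread(
        target=speech_thread, args=(loop, barge_in, speech_after_s), daemon=True
    )
    worker.start()
    interrupted = await stream_tts([b"\x00" * 640] * num_chunks, barge_in, sent)
    # Wait for the worker without blocking the event loop.
    await asyncio.to_thread(worker.join)
    return interrupted, len(sent)
```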

📖 Architecture Details - Complete speech architecture documentation

Reliability

Graceful Degradation - Following Azure Communication Services reliability patterns:

Connection pooling and retry logic with exponential backoff
Headless environment support with memory-only audio synthesis
Managed identity authentication with automatic token refresh
Session-aware resource management via OnDemandResourcePool
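Retry with exponential backoff can be sketched as follows. The attempt count, base delay, cap, retryable exception types, and the "full jitter" variant (a random delay between zero and the capped exponential value) are assumptions chosen for illustration, not the backend's configured values, and `retry_with_backoff` is a hypothetical helper. Exceptions outside `retry_on` propagate immediately.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call operation(), retrying transient failures with capped exponential backoff.

    Before retry n (n = 1, 2, ...) the delay is drawn uniformly from
    [0, min(max_delay_s, base_delay_s * 2 ** (n - 1))] ("full jitter").
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

Jitter spreads out retries from many sessions that fail at the same moment, which avoids synchronized bursts against a recovering dependency.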

Related Documentation

API Reference - Complete OpenAPI specification with interactive testing
Speech Architecture - STT, TTS, and cascade orchestration
Agent Architecture - Multi-agent system and handoffs
Data Architecture - State management and persistence
Architecture Overview - System architecture and deployment patterns