API Reference

Interactive API documentation generated from the OpenAPI schema. It is the definitive reference for all REST endpoints, WebSocket connections, authentication, and configuration.

Interactive Documentation

Real-Time Voice Agent API 1.0.0

Contact: Real-Time Voice Agent Team support@example.com

License: MIT License

health

GET /api/v1/health

Basic Health Check

Description Basic health check endpoint that returns 200 if the server is running. Used by load balancers for liveness checks.

Response 200 OK

application/json

{
    "status": "healthy",
    "version": "1.0.0",
    "timestamp": 1691668800.0,
    "message": "Real-Time Audio Agent API v1 is running",
    "details": {
        "api_version": "v1",
        "service": "rtagent-backend"
    }
}

Schema of the response body

{
    "properties": {
        "status": {
            "type": "string",
            "title": "Status",
            "description": "Overall health status",
            "example": "healthy"
        },
        "version": {
            "type": "string",
            "title": "Version",
            "description": "API version",
            "default": "1.0.0",
            "example": "1.0.0"
        },
        "timestamp": {
            "type": "number",
            "title": "Timestamp",
            "description": "Timestamp when check was performed",
            "example": 1691668800.0
        },
        "message": {
            "type": "string",
            "title": "Message",
            "description": "Human-readable status message",
            "example": "Real-Time Audio Agent API v1 is running"
        },
        "details": {
            "additionalProperties": true,
            "type": "object",
            "title": "Details",
            "description": "Additional health details",
            "example": {
                "api_version": "v1",
                "service": "rtagent-backend"
            }
        },
        "active_sessions": {
            "anyOf": [
                {
                    "type": "integer"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Active Sessions",
            "description": "Current number of active realtime conversation sessions (None if unavailable)",
            "example": 3
        },
        "session_metrics": {
            "anyOf": [
                {
                    "additionalProperties": true,
                    "type": "object"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Session Metrics",
            "description": "Optional granular session metrics (connected/disconnected, etc.)",
            "example": {
                "active": 3,
                "connected": 5,
                "disconnected": 2
            }
        }
    },
    "type": "object",
    "required": [
        "status",
        "timestamp",
        "message"
    ],
    "title": "HealthResponse",
    "description": "Health check response model.",
    "example": {
        "active_sessions": 3,
        "details": {
            "api_version": "v1",
            "service": "rtagent-backend"
        },
        "message": "Real-Time Audio Agent API v1 is running",
        "session_metrics": {
            "active": 3,
            "connected": 5,
            "disconnected": 2
        },
        "status": "healthy",
        "timestamp": 1691668800.0,
        "version": "1.0.0"
    }
}
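
A client can sanity-check a health payload against the required fields listed in the schema above. The following is a minimal sketch, not part of the API: the helper name `parse_health` is hypothetical, and treating absent optional fields as `None` is an assumption.

```python
import json

# Required fields, taken from the HealthResponse schema above.
REQUIRED_FIELDS = ("status", "timestamp", "message")


def parse_health(body: str) -> dict:
    """Parse a /api/v1/health response body and verify required fields.

    Hypothetical helper: raises ValueError if a required field is missing.
    """
    payload = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"health response missing fields: {missing}")
    # Optional fields may be absent or null; normalise absent ones to None.
    payload.setdefault("active_sessions", None)
    payload.setdefault("session_metrics", None)
    return payload


sample = (
    '{"status": "healthy", "timestamp": 1691668800.0, '
    '"message": "Real-Time Audio Agent API v1 is running"}'
)
health = parse_health(sample)
print(health["status"], health["active_sessions"])
```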

GET /api/v1/readiness

Comprehensive Readiness Check

Description Comprehensive readiness probe that checks all critical dependencies with timeouts.

This endpoint verifies:
- Redis connectivity and performance
- Azure OpenAI client health
- Speech services (TTS/STT) availability
- ACS caller configuration and connectivity
- RT Agents initialization
- Authentication configuration (when ENABLE_AUTH_VALIDATION=True)
- Event system health

When authentication validation is enabled, checks:
- BACKEND_AUTH_CLIENT_ID is set and is a valid GUID
- AZURE_TENANT_ID is set and is a valid GUID
- ALLOWED_CLIENT_IDS contains at least one valid GUID

Returns 503 if any critical services are unhealthy, 200 if all systems are ready.
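
The authentication checks listed above can be approximated locally before deploying. This is a sketch under stated assumptions, not the server's implementation: `auth_config_errors` is a hypothetical helper, `ALLOWED_CLIENT_IDS` is assumed to be a comma-separated string, and `uuid.UUID` is slightly more permissive than a strict GUID check (it also accepts braces and undashed hex).

```python
import uuid


def is_guid(value) -> bool:
    """Return True if value parses as a GUID/UUID."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def auth_config_errors(env: dict) -> list:
    """Collect configuration problems for the three auth settings."""
    errors = []
    for name in ("BACKEND_AUTH_CLIENT_ID", "AZURE_TENANT_ID"):
        value = env.get(name, "")
        if not value:
            errors.append(f"{name} is not set")
        elif not is_guid(value):
            errors.append(f"{name} is not a valid GUID")
    # Assumption: the allowed client IDs are comma-separated.
    raw = env.get("ALLOWED_CLIENT_IDS", "")
    allowed = [item.strip() for item in raw.split(",") if item.strip()]
    if not any(is_guid(item) for item in allowed):
        errors.append("ALLOWED_CLIENT_IDS contains no valid GUID")
    return errors


# Placeholder values for illustration only.
env = {
    "BACKEND_AUTH_CLIENT_ID": "not-a-guid",
    "AZURE_TENANT_ID": "11111111-2222-3333-4444-555555555555",
    "ALLOWED_CLIENT_IDS": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee, bogus",
}
errors = auth_config_errors(env)
print(errors)
```

In practice `env` would be `os.environ`; a plain dict keeps the sketch testable.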

Response 200 OK

application/json

{
    "status": "ready",
    "timestamp": 1691668800.0,
    "response_time_ms": 45.2,
    "checks": [
        {
            "component": "redis",
            "status": "healthy",
            "check_time_ms": 12.5,
            "details": "Connected to Redis successfully"
        },
        {
            "component": "auth_configuration",
            "status": "healthy",
            "check_time_ms": 1.2,
            "details": "Auth validation enabled with 2 allowed client(s)"
        }
    ],
    "event_system": {
        "is_healthy": true,
        "handlers_count": 7,
        "domains_count": 2
    }
}

Schema of the response body

{
    "properties": {
        "status": {
            "type": "string",
            "enum": [
                "ready",
                "not_ready",
                "degraded"
            ],
            "title": "Status",
            "description": "Overall readiness status",
            "example": "ready"
        },
        "timestamp": {
            "type": "number",
            "title": "Timestamp",
            "description": "Timestamp when check was performed",
            "example": 1691668800.0
        },
        "response_time_ms": {
            "type": "number",
            "title": "Response Time Ms",
            "description": "Total time taken for all checks in milliseconds",
            "example": 45.2
        },
        "checks": {
            "items": {
                "$ref": "#/components/schemas/ServiceCheck"
            },
            "type": "array",
            "title": "Checks",
            "description": "Individual component health checks"
        },
        "event_system": {
            "anyOf": [
                {
                    "additionalProperties": true,
                    "type": "object"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Event System",
            "description": "Event system status information",
            "example": {
                "domains_count": 2,
                "handlers_count": 7,
                "is_healthy": true
            }
        }
    },
    "type": "object",
    "required": [
        "status",
        "timestamp",
        "response_time_ms",
        "checks"
    ],
    "title": "ReadinessResponse",
    "description": "Comprehensive readiness check response model.",
    "example": {
        "checks": [
            {
                "check_time_ms": 12.5,
                "component": "redis",
                "details": "Connected to Redis successfully",
                "status": "healthy"
            },
            {
                "check_time_ms": 8.3,
                "component": "azure_openai",
                "details": "Client initialized",
                "status": "healthy"
            }
        ],
        "event_system": {
            "domains_count": 2,
            "handlers_count": 7,
            "is_healthy": true
        },
        "response_time_ms": 45.2,
        "status": "ready",
        "timestamp": 1691668800.0
    }
}

Response 503 Service Unavailable

application/json

{
    "status": "not_ready",
    "timestamp": 1691668800.0,
    "response_time_ms": 1250.0,
    "checks": [
        {
            "component": "redis",
            "status": "unhealthy",
            "check_time_ms": 1000.0,
            "error": "Connection timeout"
        },
        {
            "component": "auth_configuration",
            "status": "unhealthy",
            "check_time_ms": 2.1,
            "error": "BACKEND_AUTH_CLIENT_ID is not a valid GUID"
        }
    ]
}
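
When a probe receives a 503, the `checks` array identifies which dependency failed. A minimal sketch of extracting the failing components from a readiness body follows; the helper name `unhealthy_components` is hypothetical, and the sample body is an abbreviated copy of the 503 example above.

```python
import json


def unhealthy_components(body: str) -> list:
    """Return (component, reason) pairs for checks that are not healthy."""
    payload = json.loads(body)
    failures = []
    for check in payload.get("checks", []):
        if check.get("status") != "healthy":
            # Failed checks carry "error"; healthy ones carry "details".
            reason = check.get("error") or check.get("details", "")
            failures.append((check["component"], reason))
    return failures


sample_503 = """
{
    "status": "not_ready",
    "timestamp": 1691668800.0,
    "response_time_ms": 1250.0,
    "checks": [
        {"component": "redis", "status": "unhealthy",
         "check_time_ms": 1000.0, "error": "Connection timeout"},
        {"component": "auth_configuration", "status": "unhealthy",
         "check_time_ms": 2.1,
         "error": "BACKEND_AUTH_CLIENT_ID is not a valid GUID"}
    ]
}
"""
failures = unhealthy_components(sample_503)
for component, reason in failures:
    print(f"{component}: {reason}")
```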

Schema of the response body


GET /api/v1/agents

Get Agents Info

Description Get information about loaded RT agents including their configuration, model settings, and voice settings that can be modified.

Response 200 OK

application/json

Schema of the response body


PUT /api/v1/agents/{agent_name}

Update Agent Config

Description Update configuration for a specific agent (model settings, voice, etc.). Changes are applied to the runtime instance but not persisted to YAML files.

Input parameters

Parameter   In    Type    Default  Nullable  Description
agent_name  path  string           No

Request body

application/json

{
    "model": null,
    "voice": null
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body

{
    "properties": {
        "model": {
            "anyOf": [
                {
                    "$ref": "#/components/schemas/AgentModelUpdate"
                },
                {
                    "type": "null"
                }
            ]
        },
        "voice": {
            "anyOf": [
                {
                    "$ref": "#/components/schemas/AgentVoiceUpdate"
                },
                {
                    "type": "null"
                }
            ]
        }
    },
    "type": "object",
    "title": "AgentConfigUpdate"
}
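
As an illustration of the request shape, the sketch below builds (but does not send) a PUT request with the standard library. The base URL, agent name, and field values are placeholders; the field names `temperature`, `max_tokens`, and `voice_name` come from the AgentModelUpdate and AgentVoiceUpdate entries in the Schemas section, whose value types are not shown there, so the values used here are assumptions.

```python
import json
import urllib.request
from urllib.parse import quote


def build_agent_update(base_url, agent_name, model=None, voice=None):
    """Build a PUT request for /api/v1/agents/{agent_name} without sending it."""
    body = {"model": model, "voice": voice}
    # Percent-encode the path parameter so names with spaces stay valid.
    url = f"{base_url}/api/v1/agents/{quote(agent_name, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host, agent name, and settings.
req = build_agent_update(
    "http://localhost:8000",
    "support agent",
    model={"temperature": 0.2, "max_tokens": 512},
    voice={"voice_name": "en-US-JennyNeural"},
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it. Because changes are applied only to the runtime instance, they would need to be reapplied after a restart.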

Response 200 OK

application/json

Schema of the response body


Response 422 Unprocessable Entity

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
        }
    },
    "type": "object",
    "title": "HTTPValidationError"
}

Call Management

POST /api/v1/calls/initiate

Initiate Outbound Call

Description Initiate a new outbound call to the specified phone number.

This endpoint:
- Validates the phone number format
- Generates a unique call ID
- Emits a call initiation event through the V1 event system
- Returns immediately with call status

The actual call establishment is handled asynchronously through Azure Communication Services.

Request body

application/json

{
    "caller_id": "+1987654321",
    "context": {
        "customer_id": "cust_12345",
        "department": "support"
    },
    "target_number": "+1234567890"
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body

{
    "properties": {
        "target_number": {
            "type": "string",
            "pattern": "^\\+[1-9]\\d{1,14}$",
            "title": "Target Number",
            "description": "Phone number to call in E.164 format (e.g., +1234567890)",
            "example": "+1234567890"
        },
        "caller_id": {
            "anyOf": [
                {
                    "type": "string"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Caller Id",
            "description": "Caller ID to display (optional, uses system default if not provided)",
            "example": "+1987654321"
        },
        "context": {
            "anyOf": [
                {
                    "additionalProperties": true,
                    "type": "object"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Context",
            "description": "Additional call context metadata",
            "example": {
                "customer_id": "cust_12345",
                "department": "support",
                "priority": "high",
                "source": "web_portal"
            }
        }
    },
    "type": "object",
    "required": [
        "target_number"
    ],
    "title": "CallInitiateRequest",
    "description": "Request model for initiating a call.",
    "example": {
        "caller_id": "+1987654321",
        "context": {
            "customer_id": "cust_12345",
            "department": "support"
        },
        "target_number": "+1234567890"
    }
}
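
Validating the number client-side avoids a round trip that would end in a 400. The sketch below applies the `pattern` from the schema above and assembles the request body; the helper name `build_call_request` is hypothetical, and omitting unset optional fields (rather than sending `null`) is a design choice of this sketch.

```python
import json
import re

# Pattern copied from the target_number schema; re.ASCII restricts \d to 0-9.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$", re.ASCII)


def build_call_request(target_number, caller_id=None, context=None) -> str:
    """Return a JSON body for POST /api/v1/calls/initiate."""
    if not E164_PATTERN.fullmatch(target_number):
        raise ValueError(
            "target_number must be in E.164 format (e.g., +1234567890)"
        )
    body = {"target_number": target_number}
    # Optional fields are left out entirely when not provided.
    if caller_id is not None:
        body["caller_id"] = caller_id
    if context is not None:
        body["context"] = context
    return json.dumps(body)


request_body = build_call_request(
    "+1234567890",
    context={"customer_id": "cust_12345", "department": "support"},
)
print(request_body)
```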

Response 200 OK

application/json

{
    "call_id": "call_abc12345",
    "status": "initiating",
    "target_number": "+1234567890",
    "message": "Call initiation requested for +1234567890"
}

Schema of the response body

{
    "properties": {
        "call_id": {
            "type": "string",
            "title": "Call Id",
            "description": "Unique call identifier",
            "example": "call_abc12345"
        },
        "status": {
            "type": "string",
            "title": "Status",
            "description": "Current call status",
            "example": "initiating"
        },
        "target_number": {
            "type": "string",
            "title": "Target Number",
            "description": "Target phone number",
            "example": "+1234567890"
        },
        "message": {
            "type": "string",
            "title": "Message",
            "description": "Human-readable status message",
            "example": "Call initiation requested"
        }
    },
    "type": "object",
    "required": [
        "call_id",
        "status",
        "target_number",
        "message"
    ],
    "title": "CallInitiateResponse",
    "description": "Response model for call initiation.",
    "example": {
        "call_id": "call_abc12345",
        "message": "Call initiation requested for +1234567890",
        "status": "initiating",
        "target_number": "+1234567890"
    }
}

Response 400 Bad Request

application/json

{
    "detail": "Invalid phone number format. Must be in E.164 format (e.g., +1234567890)"
}

Schema of the response body


Response 500 Internal Server Error

application/json

{
    "detail": "Failed to initiate call: Azure Communication Service unavailable"
}

Schema of the response body


Response 422 Unprocessable Entity

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
        }
    },
    "type": "object",
    "title": "HTTPValidationError"
}

GET /api/v1/calls/

List Calls

Description Retrieve a paginated list of calls with optional filtering.

Supports:
- Pagination with page and limit parameters
- Filtering by call status
- Sorting by creation time (newest first)

Input parameters

Parameter      In     Type     Default  Nullable  Description
limit          query  integer  10       No        Number of items per page (1-100)
page           query  integer  1        No        Page number (1-based)
status_filter  query           None     No        Filter calls by status

Response 200 OK

application/json

{
    "calls": [
        {
            "call_id": "call_abc12345",
            "status": "connected",
            "duration": 120,
            "participants": [],
            "events": []
        }
    ],
    "total": 25,
    "page": 1,
    "limit": 10
}

Schema of the response body

{
    "properties": {
        "calls": {
            "items": {
                "$ref": "#/components/schemas/CallStatusResponse"
            },
            "type": "array",
            "title": "Calls",
            "description": "List of calls"
        },
        "total": {
            "type": "integer",
            "title": "Total",
            "description": "Total number of calls matching criteria",
            "example": 25
        },
        "page": {
            "type": "integer",
            "title": "Page",
            "description": "Current page number (1-based)",
            "default": 1,
            "example": 1
        },
        "limit": {
            "type": "integer",
            "title": "Limit",
            "description": "Number of items per page",
            "default": 10,
            "example": 10
        }
    },
    "type": "object",
    "required": [
        "calls",
        "total"
    ],
    "title": "CallListResponse",
    "description": "Response model for listing calls.",
    "example": {
        "calls": [
            {
                "call_id": "call_abc12345",
                "duration": 120,
                "events": [],
                "participants": [],
                "status": "connected"
            }
        ],
        "limit": 10,
        "page": 1,
        "total": 25
    }
}
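
The `total` and `limit` fields are enough to work out how many pages a client needs to fetch. A minimal sketch follows; the helper names and base URL are placeholders, and the range checks mirror the documented bounds (page is 1-based, limit is 1-100).

```python
import math
from urllib.parse import urlencode


def calls_page_url(base_url, page=1, limit=10, status_filter=None) -> str:
    """Build the URL for one page of GET /api/v1/calls/."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"page": page, "limit": limit}
    if status_filter is not None:
        params["status_filter"] = status_filter
    return f"{base_url}/api/v1/calls/?{urlencode(params)}"


def page_count(total: int, limit: int) -> int:
    """Number of pages needed to list `total` calls, `limit` per page."""
    return math.ceil(total / limit)


url = calls_page_url("http://localhost:8000", page=2, status_filter="connected")
pages = page_count(25, 10)  # the example response above: total=25, limit=10
print(url, pages)
```

With the example response (`total` 25, `limit` 10), three requests cover the full list, the last page holding five calls.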

Response 400 Bad Request

application/json

{
    "detail": "Page number must be positive"
}

Schema of the response body


Response 422 Unprocessable Entity

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
        }
    },
    "type": "object",
    "title": "HTTPValidationError"
}

POST /api/v1/calls/answer

Answer Inbound Call

Description Handle inbound call events and Event Grid subscription validation.

This endpoint:
- Validates Event Grid subscription requests
- Answers incoming calls automatically with orchestrator selection
- Initializes conversation state with features
- Supports pluggable conversation orchestrators
- Provides advanced tracing and monitoring

Enhanced V1 features:
- Pluggable orchestrator injection for conversation handling
- Enhanced state management with orchestrator metadata
- Advanced observability and correlation
- Production-ready error handling
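
For orientation, the sketch below shows how an Event Grid batch can be split into a subscription-validation reply and incoming-call events. It is not this service's handler. It assumes the Event Grid event schema (`eventType` and `data` keys), where a validation event carries `data.validationCode` and expects `{"validationResponse": <code>}` in reply; the function name and sample values are hypothetical.

```python
SUBSCRIPTION_VALIDATION = "Microsoft.EventGrid.SubscriptionValidationEvent"
INCOMING_CALL = "Microsoft.Communication.IncomingCall"


def classify_events(events: list):
    """Return (validation_reply_or_None, list_of_incoming_call_payloads)."""
    validation_reply = None
    incoming_calls = []
    for event in events:
        event_type = event.get("eventType")
        data = event.get("data", {})
        if event_type == SUBSCRIPTION_VALIDATION:
            # Echo the code back to complete the subscription handshake.
            validation_reply = {"validationResponse": data["validationCode"]}
        elif event_type == INCOMING_CALL:
            incoming_calls.append(data)
    return validation_reply, incoming_calls


# Placeholder payloads for illustration.
handshake = [
    {"eventType": SUBSCRIPTION_VALIDATION, "data": {"validationCode": "abc-123"}}
]
call_batch = [
    {"eventType": INCOMING_CALL, "data": {"incomingCallContext": "opaque-token"}}
]

reply, _ = classify_events(handshake)
_, calls = classify_events(call_batch)
print(reply, calls)
```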

Response 200 OK

application/json

{
    "status": "call answered",
    "orchestrator": "gpt_flow",
    "acs_features": {
        "orchestrator_support": true,
        "advanced_tracing": true,
        "api_version": "v1"
    }
}

Schema of the response body


Response 400 Bad Request

application/json

{
    "detail": "Invalid Event Grid request format"
}

Schema of the response body


Response 503 Service Unavailable

application/json

{
    "detail": "ACS not initialised"
}

Schema of the response body


POST /api/v1/calls/callbacks

Handle ACS Callback Events

Description Handle Azure Communication Services callback events.

This endpoint receives webhooks from ACS when call events occur:
- Call connected/disconnected
- Participant joined/left
- Media events (DTMF tones, play completed, etc.)
- Transfer events

The endpoint validates authentication, processes events through the
V1 CallEventProcessor system, and returns processing results.
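
The general idea of routing callback events to handlers by type can be sketched as follows. This is a hypothetical dispatcher, not the project's CallEventProcessor: the class and method names are invented for illustration, and the event shape (a `type` string plus a `data` object containing `callConnectionId`) is an assumption based on the CloudEvents format that ACS callbacks use.

```python
from collections import defaultdict


class CallbackDispatcher:
    """Hypothetical type-based dispatcher for callback events."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        """Register a handler for one event type."""
        self._handlers[event_type].append(handler)

    def process(self, events):
        """Run matching handlers and return a summary dict."""
        processed = 0
        connection_id = None
        for event in events:
            data = event.get("data", {})
            connection_id = data.get("callConnectionId", connection_id)
            handlers = self._handlers.get(event.get("type"), [])
            for handler in handlers:
                handler(data)
            if handlers:
                processed += 1  # count only events that had a handler
        return {
            "status": "success",
            "processed_events": processed,
            "call_connection_id": connection_id,
        }


seen = []
dispatcher = CallbackDispatcher()
dispatcher.on("Microsoft.Communication.CallConnected",
              lambda data: seen.append("connected"))
dispatcher.on("Microsoft.Communication.CallDisconnected",
              lambda data: seen.append("disconnected"))

result = dispatcher.process([
    {"type": "Microsoft.Communication.CallConnected",
     "data": {"callConnectionId": "abc123"}},
])
print(result)
```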

Response 200 OK

application/json

{
    "status": "success",
    "processed_events": 1,
    "call_connection_id": "abc123"
}

Schema of the response body


Response 500 Internal Server Error

application/json

{
    "error": "Failed to process callback events"
}

Schema of the response body


Response 503 Service Unavailable

application/json

{
    "error": "ACS not initialised"
}

Schema of the response body


ACS Media Session

GET /api/v1/media/status

Get Media Streaming Status

Description Get the current status of media streaming configuration.

Returns: dict: Current media streaming configuration and status.

Response 200 OK

application/json

Schema of the response body

{
    "additionalProperties": true,
    "type": "object",
    "title": "Response Get Media Status Api V1 Media Status Get"
}

POST /api/v1/media/sessions

Create Media Session

Description Create a new media streaming session for Azure Communication Services.

Initializes a media session with specified audio configuration and returns WebSocket connection details for real-time audio streaming. This endpoint prepares the infrastructure for bidirectional media communication with configurable audio parameters.

Args: request: Media session configuration including call connection ID, audio format, sample rate, and streaming options.

Returns: MediaSessionResponse: Session details containing unique session ID, WebSocket URL for streaming, status, and audio configuration.

Raises: HTTPException: When session creation fails due to invalid configuration or system resource constraints.

Example:
    >>> request = MediaSessionRequest(call_connection_id="call_123")
    >>> response = await create_media_session(request)
    >>> print(response.websocket_url)

Request body

application/json

{
    "audio_format": "pcm_16",
    "call_connection_id": "call_12345",
    "channels": 1,
    "chunk_size": 1024,
    "enable_transcription": true,
    "enable_vad": true,
    "sample_rate": 16000
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body

{
    "properties": {
        "call_connection_id": {
            "type": "string",
            "title": "Call Connection Id",
            "description": "ACS call connection identifier",
            "example": "call_12345"
        },
        "sample_rate": {
            "anyOf": [
                {
                    "type": "integer"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Sample Rate",
            "description": "Audio sample rate in Hz",
            "default": 16000,
            "example": 16000
        },
        "channels": {
            "anyOf": [
                {
                    "type": "integer"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Channels",
            "description": "Number of audio channels",
            "default": 1,
            "example": 1
        },
        "audio_format": {
            "anyOf": [
                {
                    "type": "string"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Audio Format",
            "description": "Audio format (pcm_16, pcm_24, opus, etc.)",
            "default": "pcm_16",
            "example": "pcm_16"
        },
        "chunk_size": {
            "anyOf": [
                {
                    "type": "integer"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Chunk Size",
            "description": "Audio chunk size in bytes",
            "default": 1024,
            "example": 1024
        },
        "enable_transcription": {
            "anyOf": [
                {
                    "type": "boolean"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Enable Transcription",
            "description": "Enable real-time transcription",
            "default": true,
            "example": true
        },
        "enable_vad": {
            "anyOf": [
                {
                    "type": "boolean"
                },
                {
                    "type": "null"
                }
            ],
            "title": "Enable Vad",
            "description": "Enable voice activity detection",
            "default": true,
            "example": true
        }
    },
    "type": "object",
    "required": [
        "call_connection_id"
    ],
    "title": "MediaSessionRequest",
    "description": "Request schema for starting a media session.",
    "example": {
        "audio_format": "pcm_16",
        "call_connection_id": "call_12345",
        "channels": 1,
        "chunk_size": 1024,
        "enable_transcription": true,
        "enable_vad": true,
        "sample_rate": 16000
    }
}
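
The default audio parameters determine how much audio each chunk carries, which matters when sizing buffers or estimating latency. The sketch below works through the arithmetic, assuming `pcm_16` means uncompressed 16-bit (2-byte) samples; that interpretation and the helper name are assumptions, not statements from the schema.

```python
def chunk_duration_ms(chunk_size, sample_rate=16000, channels=1,
                      bytes_per_sample=2):
    """Milliseconds of audio in one chunk of uncompressed PCM.

    Defaults mirror the MediaSessionRequest defaults; bytes_per_sample=2
    assumes pcm_16 is 16-bit linear PCM.
    """
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return 1000.0 * chunk_size / bytes_per_second


# 16000 Hz * 1 channel * 2 bytes = 32000 bytes/s, so 1024 bytes is 32 ms.
duration = chunk_duration_ms(1024)
print(duration)
```

Under these assumptions a 1024-byte chunk holds 512 samples, or 32 ms of audio. The calculation does not apply to compressed formats such as opus.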

Response 200 OK

application/json

{
    "configuration": {
        "channels": 1,
        "chunk_size": 1024,
        "format": "pcm_16",
        "sample_rate": 16000
    },
    "created_at": "2025-08-10T13:45:00Z",
    "session_id": "media_session_123456",
    "status": "active",
    "websocket_url": "wss://api.example.com/v1/media/stream/media_session_123456"
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "session_id": {
            "type": "string",
            "title": "Session Id",
            "description": "Unique media session identifier",
            "example": "media_session_123456"
        },
        "websocket_url": {
            "type": "string",
            "title": "Websocket Url",
            "description": "WebSocket URL for audio streaming",
            "example": "wss://api.example.com/v1/media/stream/media_session_123456"
        },
        "status": {
            "type": "string",
            "title": "Status",
            "description": "Session status",
            "example": "active"
        },
        "created_at": {
            "type": "string",
            "title": "Created At",
            "description": "Session creation timestamp",
            "example": "2025-08-10T13:45:00Z"
        },
        "configuration": {
            "additionalProperties": true,
            "type": "object",
            "title": "Configuration",
            "description": "Session configuration settings",
            "example": {
                "channels": 1,
                "chunk_size": 1024,
                "format": "pcm_16",
                "sample_rate": 16000
            }
        }
    },
    "type": "object",
    "required": [
        "session_id",
        "websocket_url",
        "status",
        "created_at",
        "configuration"
    ],
    "title": "MediaSessionResponse",
    "description": "Response schema for media session creation.",
    "example": {
        "configuration": {
            "channels": 1,
            "chunk_size": 1024,
            "format": "pcm_16",
            "sample_rate": 16000
        },
        "created_at": "2025-08-10T13:45:00Z",
        "session_id": "media_session_123456",
        "status": "active",
        "websocket_url": "wss://api.example.com/v1/media/stream/media_session_123456"
    }
}
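
A client will typically parse this response before opening the WebSocket. The sketch below is one way to do that with the standard library; the `MediaSession` dataclass and `parse_media_session` helper are hypothetical, and it assumes `created_at` is an ISO 8601 string as in the example (the schema only declares it as a string).

```python
import json
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlparse


@dataclass
class MediaSession:
    session_id: str
    websocket_url: str
    status: str
    created_at: datetime
    configuration: dict


def parse_media_session(body: str) -> MediaSession:
    """Parse a MediaSessionResponse body into a typed object."""
    payload = json.loads(body)
    # Older Python versions reject a trailing "Z" in fromisoformat().
    created_at = datetime.fromisoformat(
        payload["created_at"].replace("Z", "+00:00")
    )
    if urlparse(payload["websocket_url"]).scheme not in ("ws", "wss"):
        raise ValueError("websocket_url must use the ws or wss scheme")
    return MediaSession(
        session_id=payload["session_id"],
        websocket_url=payload["websocket_url"],
        status=payload["status"],
        created_at=created_at,
        configuration=payload["configuration"],
    )


sample = """
{
    "configuration": {"channels": 1, "chunk_size": 1024,
                      "format": "pcm_16", "sample_rate": 16000},
    "created_at": "2025-08-10T13:45:00Z",
    "session_id": "media_session_123456",
    "status": "active",
    "websocket_url": "wss://api.example.com/v1/media/stream/media_session_123456"
}
"""
session = parse_media_session(sample)
print(session.session_id, session.created_at.isoformat())
```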

Response 422 Unprocessable Entity

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
        }
    },
    "type": "object",
    "title": "HTTPValidationError"
}

GET /api/v1/media/sessions/{session_id}

Get Media Session Status

Description Retrieve status and metadata for a specific media session.

Queries the current state of an active media session including connection status, WebSocket state, and session configuration details. Used for monitoring and debugging media streaming sessions.

Args: session_id: Unique identifier for the media session to query.

Returns: dict: Session information including status, connection state, creation timestamp, and API version details.

Example:
    >>> session_info = await get_media_session("media_session_123")
    >>> print(session_info["status"])

Input parameters

Parameter   In    Type    Default  Nullable  Description
session_id  path  string           No

Response 200 OK

application/json

Schema of the response body

{
    "type": "object",
    "additionalProperties": true,
    "title": "Response Get Media Session Api V1 Media Sessions  Session Id  Get"
}

Response 422 Unprocessable Entity

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
        }
    },
    "type": "object",
    "title": "HTTPValidationError"
}

Real-time Communication

GET /api/v1/realtime/status

Get Realtime Service Status

Description Get the current status of the realtime communication service.

Returns information about:
- Service availability and health
- Supported protocols and features
- Active connection counts
- WebSocket endpoint configurations

Response 200 OK

application/json

{
    "status": "available",
    "websocket_endpoints": {
        "dashboard_relay": "/api/v1/realtime/dashboard/relay",
        "conversation": "/api/v1/realtime/conversation"
    },
    "features": {
        "dashboard_broadcasting": true,
        "conversation_streaming": true,
        "orchestrator_support": true,
        "session_management": true
    },
    "active_connections": {
        "dashboard_clients": 0,
        "conversation_sessions": 0
    },
    "version": "v1"
}

Schema of the response body

{
    "properties": {
        "status": {
            "type": "string",
            "enum": [
                "available",
                "degraded",
                "unavailable"
            ],
            "title": "Status",
            "description": "Current service status",
            "example": "available"
        },
        "websocket_endpoints": {
            "additionalProperties": {
                "type": "string"
            },
            "type": "object",
            "title": "Websocket Endpoints",
            "description": "Available WebSocket endpoints",
            "example": {
                "conversation": "/api/v1/realtime/conversation",
                "dashboard_relay": "/api/v1/realtime/dashboard/relay"
            }
        },
        "features": {
            "additionalProperties": {
                "type": "boolean"
            },
            "type": "object",
            "title": "Features",
            "description": "Supported features and capabilities",
            "example": {
                "conversation_streaming": true,
                "dashboard_broadcasting": true,
                "orchestrator_support": true,
                "session_management": true
            }
        },
        "active_connections": {
            "additionalProperties": {
                "type": "integer"
            },
            "type": "object",
            "title": "Active Connections",
            "description": "Current active connection counts",
            "example": {
                "conversation_sessions": 0,
                "dashboard_clients": 0
            }
        },
        "protocols_supported": {
            "items": {
                "type": "string"
            },
            "type": "array",
            "title": "Protocols Supported",
            "description": "Supported communication protocols",
            "default": [
                "WebSocket"
            ],
            "example": [
                "WebSocket"
            ]
        },
        "version": {
            "type": "string",
            "title": "Version",
            "description": "API version",
            "default": "v1",
            "example": "v1"
        }
    },
    "type": "object",
    "required": [
        "status",
        "websocket_endpoints",
        "features",
        "active_connections"
    ],
    "title": "RealtimeStatusResponse",
    "description": "Response schema for realtime service status endpoint.\n\nProvides comprehensive information about the realtime communication\nservice including availability, features, and active connections."
}
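A client can check a status payload against the `required` list in the schema above before relying on it. The following is a minimal sketch, not the service's own validation code; the helper names `missing_required_fields` and `apply_defaults` are hypothetical, and the sample payload reuses the example values from the schema.

```python
# Field names are taken from the "required" list and the "default" entries
# in the RealtimeStatusResponse schema. The helper names are hypothetical.
REQUIRED_FIELDS = ("status", "websocket_endpoints", "features", "active_connections")
SCHEMA_DEFAULTS = {"protocols_supported": ["WebSocket"], "version": "v1"}


def missing_required_fields(payload):
    """Return the required field names that are absent from the payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]


def apply_defaults(payload):
    """Return a copy of the payload with schema defaults filled in."""
    result = dict(payload)
    for name, default in SCHEMA_DEFAULTS.items():
        result.setdefault(name, default)
    return result


sample = {
    "status": "available",
    "websocket_endpoints": {
        "conversation": "/api/v1/realtime/conversation",
        "dashboard_relay": "/api/v1/realtime/dashboard/relay",
    },
    "features": {"conversation_streaming": True, "session_management": True},
    "active_connections": {"conversation_sessions": 0, "dashboard_clients": 0},
}

print(missing_required_fields(sample))
print(apply_defaults(sample)["version"])
```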

Schemas

AgentConfigUpdate

Name Type
model
voice

AgentModelUpdate

Name Type
deployment_id
max_tokens
temperature
top_p

AgentVoiceUpdate

Name Type
voice_name
voice_style

CallInitiateRequest

Name Type
caller_id
context
target_number string

CallInitiateResponse

Name Type
call_id string
message string
status string
target_number string

CallListResponse

Name Type
calls Array<CallStatusResponse>
limit integer
page integer
total integer

CallStatusResponse

Name Type
call_id string
duration
events Array<>
participants Array<>
status string

HealthResponse

Name Type
active_sessions
details Example: {'api_version': 'v1', 'service': 'rtagent-backend'}
message string
session_metrics
status string
timestamp number
version string

HTTPValidationError

Name Type
detail Array<ValidationError>

MediaSessionRequest

Name Type
audio_format
call_connection_id string
channels
chunk_size
enable_transcription
enable_vad
sample_rate

MediaSessionResponse

Name Type
configuration Example: {'channels': 1, 'chunk_size': 1024, 'format': 'pcm_16', 'sample_rate': 16000}
created_at string
session_id string
status string
websocket_url string

ReadinessResponse

Name Type
checks Array<ServiceCheck>
event_system
response_time_ms number
status string
timestamp number

RealtimeStatusResponse

Name Type
active_connections Example: {'conversation_sessions': 0, 'dashboard_clients': 0}
features Example: {'conversation_streaming': True, 'dashboard_broadcasting': True, 'orchestrator_support': True, 'session_management': True}
protocols_supported Array<string>
status string
version string
websocket_endpoints Example: {'conversation': '/api/v1/realtime/conversation', 'dashboard_relay': '/api/v1/realtime/dashboard/relay'}

ServiceCheck

Name Type
check_time_ms number
component string
details
error
status string

ValidationError

Name Type
loc Array<>
msg string
type string

WebSocket Endpoints

The following WebSocket endpoints provide real-time communication capabilities:

Media Streaming WebSocket

URL: wss://api.domain.com/api/v1/media/stream

Real-time bidirectional audio streaming for Azure Communication Services (ACS) calls, following the ACS WebSocket protocol.

Query Parameters:

  • call_connection_id (required): ACS call connection identifier
  • session_id (optional): Browser session ID for UI coordination

Audio Formats:

  • MEDIA/TRANSCRIPTION Mode: PCM 16kHz mono (16-bit)
  • VOICE_LIVE Mode: PCM 24kHz mono (16-bit) for Azure OpenAI Realtime API
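The sample rates above determine how many bytes each audio frame carries. As a rough worked example, the sketch below computes the frame size, assuming 16-bit samples (2 bytes each) and a 20 ms frame duration; neither value is fixed by this reference, and the function name `pcm_frame_bytes` is hypothetical.

```python
def pcm_frame_bytes(sample_rate_hz, frame_ms, bytes_per_sample=2, channels=1):
    """Size in bytes of one uncompressed PCM frame.

    bytes_per_sample=2 assumes 16-bit samples; frame_ms is whatever frame
    duration the caller chooses to use.
    """
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample * channels


# 16 kHz mono (MEDIA / TRANSCRIPTION), 20 ms frames
print(pcm_frame_bytes(16000, 20))  # 640
# 24 kHz mono (VOICE_LIVE), 20 ms frames
print(pcm_frame_bytes(24000, 20))  # 960
```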

Message Types:

// Incoming audio data
{
  "kind": "AudioData",
  "audioData": {
    "timestamp": "2025-09-28T12:00:00Z",
    "participantRawID": "8:acs:...",
    "data": "base64EncodedPCMAudio",
    "silent": false
  }
}

// Outgoing audio data (bidirectional streaming)
{
  "Kind": "AudioData",
  "AudioData": {
    "Data": "base64EncodedPCMAudio"
  }
}
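The two message shapes above can be handled with the standard library alone. This is a minimal sketch of decoding an incoming frame and building an outgoing one, not the service's actual handler; the function names are hypothetical, and the key casing follows the examples above.

```python
import base64
import json
from typing import Optional


def decode_incoming_audio(raw_message: str) -> Optional[bytes]:
    """Return the PCM bytes of an incoming AudioData message, else None."""
    message = json.loads(raw_message)
    if message.get("kind") != "AudioData":
        return None  # some other message kind; nothing to decode
    return base64.b64decode(message["audioData"]["data"])


def encode_outgoing_audio(pcm: bytes) -> str:
    """Wrap PCM bytes in the outgoing AudioData envelope shown above."""
    return json.dumps(
        {
            "Kind": "AudioData",
            "AudioData": {"Data": base64.b64encode(pcm).decode("ascii")},
        }
    )


incoming = json.dumps(
    {
        "kind": "AudioData",
        "audioData": {
            "timestamp": "2025-09-28T12:00:00Z",
            "participantRawID": "8:acs:...",
            "data": base64.b64encode(b"\x00\x01\x02\x03").decode("ascii"),
            "silent": False,
        },
    }
)
pcm = decode_incoming_audio(incoming)
print(len(pcm))
print(encode_outgoing_audio(pcm))
```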

Realtime Conversation WebSocket

URL: wss://api.domain.com/api/v1/realtime/conversation

Browser-based voice conversations with session persistence and real-time transcription.

Query Parameters:

  • session_id (optional): Conversation session identifier for session restoration

Features:

  • Real-time speech-to-text transcription
  • TTS audio streaming for responses
  • Conversation context persistence
  • Multi-language support

Dashboard Relay WebSocket

URL: wss://api.domain.com/api/v1/realtime/dashboard/relay

Real-time updates for dashboard clients monitoring ongoing conversations.

Query Parameters:

  • session_id (optional): Filter updates for specific conversation sessions

Use Cases:

  • Live call monitoring and analytics
  • Real-time transcript viewing
  • Agent performance dashboards
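All three WebSocket endpoints take their options as query parameters, so a client mostly needs to assemble the URL correctly. The sketch below shows one way to do that with the standard library; `build_ws_url` is a hypothetical helper, `api.domain.com` is the placeholder host used in this reference, and the identifier values are made up for illustration.

```python
from urllib.parse import urlencode, urlunsplit


def build_ws_url(host, path, **params):
    """Build a wss:// URL, omitting any query parameter whose value is None."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return urlunsplit(("wss", host, path, query, ""))


HOST = "api.domain.com"  # placeholder host from this reference

# Dashboard relay filtered to one conversation session
print(build_ws_url(HOST, "/api/v1/realtime/dashboard/relay", session_id="abc123"))

# Conversation endpoint without session restoration
print(build_ws_url(HOST, "/api/v1/realtime/conversation", session_id=None))

# Media stream with the required call_connection_id
print(build_ws_url(HOST, "/api/v1/media/stream", call_connection_id="call-1"))
```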

Authentication & Security

All endpoints support Microsoft Entra ID (formerly Azure Active Directory) authentication using DefaultAzureCredential, following Azure best practices.

Authentication Methods

Environment Variables (Recommended for production):

# Service Principal Authentication
export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"
export AZURE_TENANT_ID="your-tenant-id"

Azure CLI (Development):

az login

Managed Identity (Azure deployment):

  • System-assigned or user-assigned managed identity
  • No credential management required
  • Automatic token refresh

Required RBAC Roles

Grant these Azure roles to your service principal or managed identity:

| Service | Required Role | Purpose |
| --- | --- | --- |
| Azure Speech Services | Cognitive Services User | STT/TTS operations |
| Azure Cache for Redis | Redis Cache Contributor | Session state management |
| Azure Communication Services | Communication Services Contributor | Call automation and media streaming |
| Azure Storage | Storage Blob Data Contributor | Call recordings and artifacts |
| Azure OpenAI | Cognitive Services OpenAI User | AI model inference |

Security Features

  • Credential-less authentication with managed identity
  • Connection pooling with automatic token refresh
  • TLS encryption for all HTTP/WebSocket connections
  • Input validation and request sanitization
  • Rate limiting per Azure service quotas

Configuration

Required Environment Variables

Azure Services Configuration:

# Azure Speech Services
AZURE_SPEECH_REGION=eastus
AZURE_SPEECH_RESOURCE_ID=/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{name}

# Azure Cache for Redis
AZURE_REDIS_HOSTNAME=your-redis.redis.cache.windows.net
AZURE_REDIS_USERNAME=default

# Azure Communication Services
ACS_ENDPOINT=https://your-acs.communication.azure.com

Application Configuration:

# Streaming Mode (affects audio processing pipeline)
ACS_STREAMING_MODE=MEDIA  # MEDIA | VOICE_LIVE | TRANSCRIPTION

# Optional Settings
AZURE_OPENAI_ENDPOINT=https://your-openai.openai.azure.com  # For AI features
AZURE_STORAGE_CONNECTION_STRING=...  # For call recordings
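A startup check can catch missing configuration before the first call arrives. The following is a minimal sketch rather than the service's own validation; it treats the five variables listed under "Azure Services Configuration" as required, and `check_environment` is a hypothetical helper name.

```python
import os

# Variables listed under "Azure Services Configuration" above.
REQUIRED_VARS = (
    "AZURE_SPEECH_REGION",
    "AZURE_SPEECH_RESOURCE_ID",
    "AZURE_REDIS_HOSTNAME",
    "AZURE_REDIS_USERNAME",
    "ACS_ENDPOINT",
)
VALID_STREAMING_MODES = {"MEDIA", "VOICE_LIVE", "TRANSCRIPTION"}


def check_environment(env):
    """Return a list of human-readable configuration problems (empty if none)."""
    problems = [f"missing {name}" for name in REQUIRED_VARS if not env.get(name)]
    mode = env.get("ACS_STREAMING_MODE")
    if mode is not None and mode not in VALID_STREAMING_MODES:
        problems.append(f"invalid ACS_STREAMING_MODE: {mode}")
    return problems


# Report problems found in the current process environment.
for problem in check_environment(os.environ):
    print(problem)
```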

Streaming Mode Configuration

Controls the audio processing pipeline and determines handler selection:

| Mode | Description | Audio Format | Use Case |
| --- | --- | --- | --- |
| MEDIA | Default STT/TTS pipeline | PCM 16kHz mono | Traditional phone calls with AI orchestration |
| VOICE_LIVE | Azure OpenAI Realtime API | PCM 24kHz mono | Advanced conversational AI |
| TRANSCRIPTION | Real-time transcription only | PCM 16kHz mono | Call recording and analysis |
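Client code that prepares audio needs to know which sample rate goes with the configured mode. The sketch below encodes the table above as a lookup; the `StreamingMode` enum and `sample_rate_for` helper are hypothetical names, not part of the service.

```python
from enum import Enum


class StreamingMode(str, Enum):
    MEDIA = "MEDIA"
    VOICE_LIVE = "VOICE_LIVE"
    TRANSCRIPTION = "TRANSCRIPTION"


# Sample rates in Hz, copied from the "Audio Format" column above.
SAMPLE_RATE_HZ = {
    StreamingMode.MEDIA: 16000,
    StreamingMode.VOICE_LIVE: 24000,
    StreamingMode.TRANSCRIPTION: 16000,
}


def sample_rate_for(mode_name):
    """Look up the sample rate for a mode name; raises ValueError if unknown."""
    return SAMPLE_RATE_HZ[StreamingMode(mode_name.strip().upper())]


print(sample_rate_for("VOICE_LIVE"))
print(sample_rate_for("media"))
```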

📖 Reference: Complete streaming modes documentation

Performance Tuning

Connection Pools (optional):

# Speech service connection limits
MAX_STT_POOL_SIZE=4
MAX_TTS_POOL_SIZE=4

# Redis connection pool
REDIS_MAX_CONNECTIONS=20
REDIS_CONNECTION_TIMEOUT=5

Audio Processing:

# Voice Activity Detection (VAD) settings
VAD_TIMEOUT_MS=2000  # Silence timeout
VAD_SENSITIVITY=medium  # low | medium | high

# Barge-in detection
BARGE_IN_ENABLED=true
BARGE_IN_THRESHOLD_MS=10  # Response time for interruption

Error Handling

Standard Error Response Format

All endpoints return consistent error responses modeled on RFC 7807 (Problem Details for HTTP APIs):

{
  "detail": "Human-readable error description",
  "status_code": 400,
  "timestamp": "2025-09-28T12:00:00Z",
  "type": "validation_error",
  "instance": "/api/v1/calls/initiate",
  "errors": [
    {
      "field": "target_number",
      "message": "Invalid phone number format",
      "code": "format_invalid"
    }
  ]
}
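On the client side, an error body in this shape can be parsed into a small structure for logging or display. This is a minimal sketch under the assumption that every error body carries the fields shown in the example; the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class FieldError:
    field_name: str
    message: str
    code: str


@dataclass
class ApiError:
    status_code: int
    error_type: str
    detail: str
    instance: str
    field_errors: list


def parse_error(body: str) -> ApiError:
    """Parse an error response body into an ApiError."""
    data = json.loads(body)
    return ApiError(
        status_code=data["status_code"],
        error_type=data.get("type", ""),
        detail=data.get("detail", ""),
        instance=data.get("instance", ""),
        field_errors=[
            FieldError(e.get("field", ""), e.get("message", ""), e.get("code", ""))
            for e in data.get("errors", [])
        ],
    )


def summarize(error: ApiError) -> str:
    """One line for the error itself, then one indented line per field error."""
    lines = [f"{error.status_code} {error.error_type}: {error.detail}"]
    lines += [f"  {fe.field_name}: {fe.message} ({fe.code})" for fe in error.field_errors]
    return "\n".join(lines)


body = json.dumps(
    {
        "detail": "Human-readable error description",
        "status_code": 400,
        "type": "validation_error",
        "instance": "/api/v1/calls/initiate",
        "errors": [
            {"field": "target_number", "message": "Invalid phone number format", "code": "format_invalid"}
        ],
    }
)
print(summarize(parse_error(body)))
```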

HTTP Status Codes

| Status | Description | Common Causes |
| --- | --- | --- |
| 200 | Success | Request completed successfully |
| 202 | Accepted | Async operation initiated |
| 400 | Bad Request | Invalid request format or parameters |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Insufficient permissions or RBAC roles |
| 404 | Not Found | Resource not found |
| 422 | Validation Error | Request body schema validation failed |
| 429 | Rate Limited | Azure service quota exceeded |
| 500 | Internal Server Error | Unexpected server error |
| 502 | Bad Gateway | Azure service unavailable |
| 503 | Service Unavailable | Dependencies not ready |
| 504 | Gateway Timeout | Azure service timeout |

Service-Specific Errors

Azure Speech Services:

  • speech_quota_exceeded - API rate limit reached
  • speech_region_unavailable - Speech service region down
  • audio_format_unsupported - Invalid audio format specified

Azure Communication Services:

  • call_not_found - Call connection ID invalid
  • media_streaming_failed - WebSocket streaming error
  • pstn_number_invalid - Phone number format error

Azure Cache for Redis:

  • redis_connection_failed - Redis cluster unavailable
  • session_expired - Session data TTL exceeded

Retry Strategy

The API implements exponential backoff for transient errors:

# Retry configuration
RETRY_MAX_ATTEMPTS=3
RETRY_BACKOFF_FACTOR=2.0
RETRY_JITTER=true

# Service-specific timeouts
SPEECH_REQUEST_TIMEOUT=30
ACS_CALL_TIMEOUT=60
REDIS_OPERATION_TIMEOUT=5
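To illustrate what these retry settings imply, here is a minimal sketch of exponential backoff with jitter. It is not the service's actual retry code: the 1-second `base_delay`, the jitter range, and the function names are assumptions made for the example; only the attempt count and backoff factor come from the configuration above.

```python
import random
import time


def backoff_delays(max_attempts=3, backoff_factor=2.0, base_delay=1.0, jitter=True):
    """Delays (in seconds) to wait between attempts.

    base_delay is an assumed starting delay; it is not defined in this reference.
    """
    delays = []
    for attempt in range(max_attempts - 1):
        delay = base_delay * backoff_factor ** attempt
        if jitter:
            # Scale by a random factor in [0.5, 1.0) to spread out retries.
            delay *= 0.5 + random.random() / 2
        delays.append(delay)
    return delays


def call_with_retry(operation, is_transient, max_attempts=3, backoff_factor=2.0,
                    base_delay=1.0, jitter=True, sleep=time.sleep):
    """Call operation(), retrying on errors that is_transient() accepts."""
    delays = backoff_delays(max_attempts, backoff_factor, base_delay, jitter)
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on a non-transient error.
            if attempt == max_attempts - 1 or not is_transient(exc):
                raise
            sleep(delays[attempt])


print(backoff_delays(jitter=False))  # [1.0, 2.0]
```

With `RETRY_MAX_ATTEMPTS=3` there are at most two waits, so under these assumptions the worst-case added latency without jitter is 3 seconds plus the time spent in the failed attempts themselves.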

📖 Reference: Azure Service reliability patterns

Getting Started

Quick Setup

  1. Configure Authentication:

    export AZURE_TENANT_ID="your-tenant-id"
    export AZURE_CLIENT_ID="your-client-id"
    export AZURE_CLIENT_SECRET="your-client-secret"
    

  2. Set Required Environment Variables:

    export AZURE_SPEECH_REGION="eastus"
    export ACS_ENDPOINT="https://your-acs.communication.azure.com"
    export AZURE_REDIS_HOSTNAME="your-redis.redis.cache.windows.net"
    

  3. Test Health Endpoint:

    curl -X GET https://api.domain.com/api/v1/health
    

  4. Initiate a Test Call:

    curl -X POST https://api.domain.com/api/v1/calls/initiate \
      -H "Content-Type: application/json" \
      -d '{"target_number": "+1234567890"}'
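
The same two requests can be prepared from Python with the standard library. This is a sketch under a few assumptions: the body uses the `target_number`, `caller_id`, and `context` fields from the CallInitiateRequest schema, `api.domain.com` is the placeholder host used in this reference, and the function names are hypothetical. The requests are only constructed here; passing one to `urllib.request.urlopen` would send it.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.domain.com"  # placeholder host from this reference


def health_request():
    """Build (but do not send) the health check request."""
    return Request(f"{BASE_URL}/api/v1/health", method="GET")


def initiate_call_request(target_number, caller_id=None, context=None):
    """Build (but do not send) a call initiation request."""
    body = {"target_number": target_number}
    if caller_id is not None:
        body["caller_id"] = caller_id
    if context is not None:
        body["context"] = context
    return Request(
        f"{BASE_URL}/api/v1/calls/initiate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = initiate_call_request("+1234567890")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```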
    

Production Considerations

  • Use managed identity authentication in Azure deployments
  • Configure connection pooling for high-throughput scenarios
  • Enable distributed tracing with Azure Monitor integration
  • Implement health checks for all dependent services
  • Set up monitoring and alerting for service reliability

📖 Reference: Production deployment guide