Load Testing¶
A comprehensive WebSocket load-testing framework for real-time voice agents, built on Locust with realistic conversation simulation.
Related Documentation
For unit tests, integration tests, and code quality validation, see the Testing Framework.
Orchestration Modes
Both ACS Media and Browser Conversation tests use the Custom Cascade Orchestrator by default (Azure Speech STT → Azure OpenAI → Azure Speech TTS). Set PIPELINE=voicelive to test the VoiceLive SDK pipeline instead (Azure OpenAI Realtime API with native audio I/O).
Overview¶
The load testing framework validates WebSocket performance under realistic conversation scenarios:
| Feature | Description |
|---|---|
| Locust-based | Python WebSocket simulation with real audio streaming |
| Audio generation | Production TTS-generated conversation audio (PCM) |
| Azure integration | Seamless deployment to Azure Load Testing service |
| Realistic scenarios | Multi-turn conversation patterns with barge-in |
| Rate limit tracking | Automatic detection of Azure OpenAI and Speech throttling |
Quick Start¶
Prerequisites¶
Ensure your environment is configured with Azure Speech credentials (`AZURE_SPEECH_REGION` and `AZURE_SPEECH_KEY`); these are required to generate test audio.
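A quick sanity check for the required variables can be scripted (a minimal sketch; `AZURE_SPEECH_REGION` and `AZURE_SPEECH_KEY` are the credentials used by the audio generator described below):

```python
import os

# Credentials consumed by the TTS audio generator (see Audio Generation)
REQUIRED_VARS = ["AZURE_SPEECH_REGION", "AZURE_SPEECH_KEY"]

missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Environment OK")
```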
Generate Test Audio¶
Audio files are generated automatically if not present, but you can pre-generate them by running `make generate_audio`.
Run Load Tests¶
```bash
# Default: 15 users, 90 seconds, localhost:8010 (Custom Cascade)
make run_load_test_browser_conversation

# Use VoiceLive SDK pipeline instead
make run_load_test_browser_conversation PIPELINE=voicelive

# Custom configuration
make run_load_test_browser_conversation \
  HOST=your-app.azurecontainerapps.io \
  USERS=50 \
  SPAWN_RATE=5 \
  TIME=5m \
  PIPELINE=voicelive
```
Test Targets¶
Two Locust test files target different WebSocket endpoints:
| Test | Endpoint | Use Case |
|---|---|---|
| `locustfile.acs_media.py` | `/api/v1/media/stream` | ACS Call Automation (phone calls via Azure Communication Services) |
| `locustfile.browser_conversation.py` | `/api/v1/realtime/conversation` | Browser-based voice conversations |
Pipeline Selection¶
Both test files support two orchestration pipelines, selected via the `PIPELINE` environment variable:
| Pipeline | Value | Description |
|---|---|---|
| Custom Cascade | `cascade` (default) | Azure Speech STT → Azure OpenAI Chat → Azure Speech TTS |
| VoiceLive SDK | `voicelive` | Azure OpenAI Realtime API with native audio input/output |
Same Orchestrator, Different Transports
Both ACS and Browser routes use the same Custom Cascade Orchestrator for speech processing. The difference is the transport layer: ACS uses Azure Communication Services media streaming, while Browser uses direct WebSocket from the client.
Audio Generation¶
Production TTS Integration¶
The audio generator uses Azure Speech Services to create realistic conversation audio:
```python
import os

# SpeechSynthesizer is the project's Azure Speech TTS wrapper
synthesizer = SpeechSynthesizer(
    region=os.getenv("AZURE_SPEECH_REGION"),
    key=os.getenv("AZURE_SPEECH_KEY"),
    language="en-US",
    voice="en-US-JennyMultilingualNeural",
    playback="never",
    enable_tracing=False,
)
```
Generated Structure¶
```text
tests/load/audio_cache/
├── turn_001.pcm
├── turn_002.pcm
├── turn_003.pcm
├── turn_004.pcm
├── turn_005.pcm
└── manifest.jsonl
```
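The manifest is JSON Lines: one JSON object per line. A loader can stay agnostic to the exact fields, which depend on the generator (a sketch, not the project's actual loader):

```python
import json
from pathlib import Path

def load_manifest(path):
    """Parse a JSON Lines manifest: one JSON object per non-empty line."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            entries.append(json.loads(line))
    return entries
```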
Audio Format¶
| Property | Value | Description |
|---|---|---|
| Format | 16-bit PCM | Raw audio compatible with WebSocket streaming |
| Sample Rate | 16 kHz | Optimized for speech recognition |
| Channels | Mono | Single channel for conversation |
| Chunk Size | 640 bytes | 20ms frames at 16kHz/16-bit |
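The 640-byte chunk size follows directly from the format above:

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz sample rate
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono
CHUNK_MS = 20             # one frame every 20 ms

chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * CHUNK_MS // 1000
print(chunk_bytes)  # 640
```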
Metric Naming Convention¶
The load tests use consistent, short metric names for clean table output:
| Metric | Description |
|---|---|
| `ttfb/{N}` | Time-to-first-byte for turn N (zero-padded) |
| `barge/{N}` | Barge-in latency for turn N |
| `turn/{N}` | Total turn completion time (realtime only) |
| `ws/close` | WebSocket closed (benign if `WS_IGNORE_CLOSE_EXCEPTIONS=true`) |
| `ws/error` | WebSocket connection error |
| `turn/error` | Turn processing error |
| `rate/openai` | Azure OpenAI rate limit detected (429) |
| `rate/speech` | Azure Speech rate limit detected |
| `rate/unknown` | Unknown rate limit source |
| `err/{code}` | Specific error code from server |
Sample Output¶
```text
Name          # reqs    Avg    Min    Max  Fails
────────────────────────────────────────────────
ttfb/01           15   1234    500   3000      2
ttfb/02           15   1156    450   2800      1
barge/01          15   8765   2000  15000      3
barge/02          15   7890   1800  14000      2
rate/openai        5      0      0      0      5
────────────────────────────────────────────────
```
Rate Limit Summary¶
At the end of each test run, a summary is printed:
```text
============================================================
RATE LIMIT SUMMARY
============================================================
openai  : 15 occurrences
speech  :  3 occurrences
============================================================
```
Configuration Reference¶
Makefile Parameters¶
| Parameter | Default | Description |
|---|---|---|
| `HOST` | `localhost:8010` | Target host (without protocol) |
| `USERS` | `15` | Number of concurrent simulated users |
| `SPAWN_RATE` | `2` | Users spawned per second |
| `TIME` | `90s` | Test duration |
| `PIPELINE` | `cascade` | Orchestration pipeline (`cascade` or `voicelive`) |
| `EXTRA_ARGS` | (empty) | Additional Locust arguments |
Environment Variables¶
Configure test behavior via environment variables:
```bash
# Pipeline selection (cascade or voicelive)
export PIPELINE="cascade"            # Default: Custom Cascade (STT→LLM→TTS)
# export PIPELINE="voicelive"        # VoiceLive SDK (OpenAI Realtime API)

# Connection
export WS_URL="ws://localhost:8010/api/v1/media/stream"
export PCM_DIR="tests/load/audio_cache"

# Turn configuration
export TURNS_PER_USER=60             # Turns per user session
export CHUNKS_PER_TURN=100           # Audio chunks per turn (~2s at 20ms)
export CHUNK_MS=20                   # Milliseconds per audio chunk

# Timing thresholds
export FIRST_BYTE_TIMEOUT_SEC=5.0    # Max wait for first server response
export BARGE_QUIET_MS=400            # Silence gap to detect response end
export TURN_TIMEOUT_SEC=15.0         # Max wait for turn completion
export PAUSE_BETWEEN_TURNS_SEC=1.5

# Error handling
export WS_IGNORE_CLOSE_EXCEPTIONS=true
export WS_MAX_SSL_FAILS=4
export WS_RECONNECT_BACKOFF_BASE_SEC=0.2
```
Detection Tokens¶
These token lists control which server messages count as the start of a response and which mark its completion:
```bash
export RESPONSE_TOKENS="recognizer,greeting,response,transcript,result"
export END_TOKENS="final,end,completed,stopped,barge"
```
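Matching is simple case-insensitive substring search against these comma-separated lists; a sketch mirroring the defaults above (assumed behavior, not the exact test code):

```python
import os

def _tokens(var: str, default: str) -> list[str]:
    """Read a comma-separated token list from the environment."""
    return [t.strip().lower() for t in os.getenv(var, default).split(",") if t.strip()]

RESPONSE_TOKENS = _tokens("RESPONSE_TOKENS", "recognizer,greeting,response,transcript,result")
END_TOKENS = _tokens("END_TOKENS", "final,end,completed,stopped,barge")

def is_response(message: str) -> bool:
    """True if the server message indicates the agent has started responding."""
    text = message.lower()
    return any(tok in text for tok in RESPONSE_TOKENS)

def is_end(message: str) -> bool:
    """True if the server message indicates the response is complete."""
    text = message.lower()
    return any(tok in text for tok in END_TOKENS)
```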
Test Architecture¶
WebSocket Simulation Flow¶
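In each turn, a simulated user streams 20 ms audio chunks, records time-to-first-byte when the first server bytes arrive, and treats a gap of `BARGE_QUIET_MS` with no server audio as the end of the response. The timing logic can be sketched transport-agnostically (an illustration, not the actual test code):

```python
def turn_metrics(events, quiet_ms=400):
    """Derive TTFB and turn duration from chronological (timestamp_ms, kind) events.

    kind is "sent" (client audio chunk) or "recv" (server response bytes).
    Returns None if the turn never completed.
    """
    first_sent = next((t for t, kind in events if kind == "sent"), None)
    recv = [t for t, kind in events if kind == "recv"]
    if first_sent is None or not recv:
        return None
    ttfb = recv[0] - first_sent
    end = recv[-1]
    for a, b in zip(recv, recv[1:]):
        if b - a >= quiet_ms:  # quiet gap: the response ended at `a`
            end = a
            break
    return {"ttfb_ms": ttfb, "turn_ms": end - first_sent}
```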
Rate Limit Detection¶
The tests automatically detect and track Azure service rate limits:
Detected Azure OpenAI patterns:

- `RateLimitExceeded`
- `429` status codes
- `too many requests`
- `tokens per minute`
- `requests per minute`

Detected Azure Speech patterns:

- `quota exceeded`
- `throttl` (matches throttle/throttled/throttling)
- `speech` combined with `limit`
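A classifier over those patterns might look like this (a sketch; the actual tests may differ in detail):

```python
def classify_rate_limit(message: str):
    """Map a server error message to 'openai', 'speech', 'unknown', or None."""
    text = message.lower()
    openai_markers = ("ratelimitexceeded", "429", "too many requests",
                      "tokens per minute", "requests per minute")
    if any(m in text for m in openai_markers):
        return "openai"
    if "quota exceeded" in text or "throttl" in text or ("speech" in text and "limit" in text):
        return "speech"
    if "rate limit" in text:
        return "unknown"
    return None
```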
Performance Targets¶
Latency Benchmarks¶
| Metric | Target | Acceptable | Alert |
|---|---|---|---|
| TTFB P95 | < 2000ms | < 3000ms | > 4000ms |
| Barge-in P95 | < 500ms | < 1000ms | > 2000ms |
| Connection Success | > 98% | > 95% | < 90% |
| Turn Success | > 95% | > 90% | < 85% |
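To compare a run against these targets, P95 can be computed from the recorded latencies with the nearest-rank method (a sketch):

```python
import math

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty latency list."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest-rank index
    return ordered[rank - 1]

ttfb_samples = [500, 800, 1200, 1500, 1900, 2100, 2500, 2600, 2800, 3100]
print(p95(ttfb_samples))  # 3100
```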
Capacity Targets by Environment¶
| Environment | Users | Duration | Success Rate |
|---|---|---|---|
| Development | 10 | 5 min | > 95% |
| Staging | 100 | 30 min | > 95% |
| Production | 500+ | 60 min | > 98% |
Azure Load Testing Integration¶
Overview¶
Azure Load Testing provides a fully managed service for running Locust tests at scale.
Setup Steps¶
1. Create Azure Load Testing Resource¶
```bash
az load create \
  --name "voice-agent-loadtest" \
  --resource-group "rg-voice-agent" \
  --location "eastus"
```
2. Upload Test Files¶
Upload to Azure Load Testing:
- `tests/load/locustfile.acs_media.py`
- All PCM files from `tests/load/audio_cache/`
3. Configure Environment¶
Set environment variables in Azure Load Testing:
```bash
WS_URL=wss://your-backend.azurecontainerapps.io/api/v1/media/stream
PCM_DIR=./
TURNS_PER_USER=3
WS_IGNORE_CLOSE_EXCEPTIONS=true
```
4. Scale Configuration¶
| Environment | Virtual Users | Spawn Rate | Duration | Engines |
|---|---|---|---|---|
| Development | 5-10 | 1-2/sec | 5-10 min | 1-2 |
| Staging | 50-100 | 5-10/sec | 15-30 min | 3-5 |
| Production | 200-500 | 20-50/sec | 30-60 min | 5-10 |
Troubleshooting¶
Common Issues¶
Check the rate limit summary at the end of the test. If you see many rate limit events:
- Azure OpenAI: Increase TPM quota or add retry logic
- Azure Speech: Check concurrent connection limits
- Unknown: Review server logs for throttling source
Debug Mode¶
```bash
# Run a single user with verbose output
locust -f tests/load/locustfile.acs_media.py \
  --host=ws://localhost:8010/api/v1/media/stream \
  --users 1 \
  --spawn-rate 1 \
  --run-time 60s \
  --headless \
  --loglevel DEBUG
```
CI/CD Integration¶
GitHub Actions Example¶
```yaml
name: Load Test

on:
  workflow_dispatch:
    inputs:
      users:
        description: 'Number of concurrent users'
        default: '20'
      duration:
        description: 'Test duration'
        default: '5m'

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install locust websocket-client certifi

      - name: Generate audio files
        run: make generate_audio
        env:
          AZURE_SPEECH_REGION: ${{ secrets.AZURE_SPEECH_REGION }}
          AZURE_SPEECH_KEY: ${{ secrets.AZURE_SPEECH_KEY }}

      - name: Run load test
        run: |
          make run_load_test_acs_media \
            HOST=${{ secrets.STAGING_HOST }} \
            USERS=${{ inputs.users }} \
            TIME=${{ inputs.duration }}

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: load-test-results
          path: locust_report_*.csv
```