Backend voice agents architecture

Backend Voice & Agents Architecture¶

This document describes the architecture of the apps/artagent/backend/ modules, specifically the separation of concerns between voice/ (transport & orchestration) and agents/ (configuration & business logic).

High-Level Overview¶

                                    External Calls (ACS/WebSocket)
                                              │
                                              ▼
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                    backend/                                          │
├──────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                      │
│  ┌──────────────────────────────────────────┐    ┌─────────────────────────────────┐ │
│  │              voice/                      │    │            agents/              │ │
│  │     (Transport & Orchestration)          │    │     (Config & Business Logic)   │ │
│  │                                          │    │                                 │ │
│  │  ┌────────────────────────────────────┐  │    │  ┌───────────────────────────┐  │ │
│  │  │ voicelive/                         │  │    │  │ Agent Definitions         │  │ │
│  │  │   ├─ handler.py     (SDK bridge)   │  │    │  │   ├─ concierge/           │  │ │
│  │  │   ├─ agent_adapter.py              │  │    │  │   │   ├─ agent.yaml       │  │ │
│  │  │   ├─ session_loader.py             │  │    │  │   │   └─ prompt.jinja     │  │ │
│  │  │   └─ metrics.py                    │  │    │  │   ├─ fraud_agent/         │  │ │
│  │  └────────────────────────────────────┘  │    │  │   ├─ investment_advisor/  │  │ │
│  │                 │                        │    │  │   ├─ compliance_desk/     │  │ │
│  │                 ▼                        │    │  │   └─ ...                  │  │ │
│  │  ┌────────────────────────────────────┐  │    │  └───────────────────────────┘  │ │
│  │  │ speech_cascade/                    │  │    │                                 │ │
│  │  │   ├─ handler.py  (STT→LLM→TTS)     │  │    │  ┌───────────────────────────┐  │ │
│  │  │   └─ metrics.py                    │  │    │  │ tools/                    │  │ │
│  │  └────────────────────────────────────┘  │    │  │   ├─ registry.py          │  │ │
│  │                 │                        │    │  │   │   ├─ register_tool()  │  │ │
│  │                 ▼                        │    │  │   │   ├─ execute_tool()   │  │ │
│  │  ┌────────────────────────────────────┐  │    │  │   │   └─ is_handoff_tool()│  │ │
│  │  │ orchestrators/                     │  │    │  │   ├─ handoffs.py          │  │ │
│  │  │   ├─ live_orchestrator.py ─────────┼──┼────┼──┤   ├─ banking.py           │  │ │
│  │  │   │     (VoiceLive multi-agent)    │  │    │  │   ├─ fraud.py             │  │ │
│  │  │   ├─ cascade_adapter.py ───────────┼──┼────┼──┤   └─ ...                  │  │ │
│  │  │   │     (Cascade multi-agent)      │  │    │  └───────────────────────────┘  │ │
│  │  │   └─ config_resolver.py            │  │    │                                 │ │
│  │  │         (scenario-aware config)    │  │    │  ┌───────────────────────────┐  │ │
│  │  └────────────────────────────────────┘  │    │  │ Core Modules              │  │ │
│  │                 │                        │    │  │   ├─ base.py              │  │ │
│  │                 ▼                        │    │  │   │   └─ UnifiedAgent     │  │ │
│  │  ┌────────────────────────────────────┐  │    │  │   ├─ loader.py            │  │ │
│  │  │ handoffs/                          │  │    │  │   │   ├─ discover_agents()│  │ │
│  │  │   └─ context.py                    │  │    │  │   │   └─ build_handoff_   │  │ │
│  │  │       ├─ HandoffContext            │  │    │  │   │         map()         │  │ │
│  │  │       ├─ HandoffResult             │  │    │  │   └─ session_manager.py   │  │ │
│  │  │       ├─ build_handoff_system_vars │  │    │  │       ├─ SessionAgentMgr  │  │ │
│  │  │       └─ sanitize_handoff_context  │  │    │  │       └─ HandoffProvider  │  │ │
│  │  └────────────────────────────────────┘  │    │  └───────────────────────────┘  │ │
│  │                                          │    │                                 │ │
│  └──────────────────────────────────────────┘    └─────────────────────────────────┘ │
│                                                                                      │
│                               Data Flow                                              │
│  ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│  │  discover_agents() ──► UnifiedAgent dict ──► build_handoff_map() ──► handoff_map│ │
│  │                              │                                           │      │ │
│  │                              ▼                                           ▼      │ │
│  │                    Orchestrators use agents          HandoffProvider lookups    │ │
│  └─────────────────────────────────────────────────────────────────────────────────┘ │
│                                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────┘

Module Responsibilities¶

`backend/agents/` — Configuration & Business Logic¶

Purpose: Define WHAT agents do (prompts, tools, handoffs) without knowing HOW they'll be invoked.

Component	Responsibility
`base.py`	`UnifiedAgent` dataclass — orchestrator-agnostic agent definition
`loader.py`	Auto-discovers agents from folder structure, builds `handoff_map`
`session_manager.py`	Session-scoped agent overrides, `HandoffProvider` protocol
`tools/registry.py`	Central tool registry with schemas and executors
`tools/*.py`	Tool implementations (banking, handoffs, fraud, etc.)
`{agent}/agent.yaml`	Per-agent configuration (prompt, tools, voice, handoff trigger)
`{agent}/prompt.jinja`	Agent's system prompt template
`scenarios/`	Scenario-based agent overrides for demo configurations

Key Abstractions: - UnifiedAgent: Orchestrator-agnostic agent configuration - HandoffConfig: Defines how to reach an agent (trigger tool name) - HandoffProvider protocol: Session-aware handoff target resolution - Tool Registry: register_tool(), execute_tool(), is_handoff_tool()

`backend/voice/` — Transport & Orchestration¶

Purpose: Define HOW agents are invoked (WebSocket handling, audio streaming, multi-agent switching).

Component	Responsibility
`voicelive/handler.py`	VoiceLive SDK WebSocket handler, audio streaming
`voicelive/agent_adapter.py`	Adapts `UnifiedAgent` to VoiceLive session format
`speech_cascade/handler.py`	Three-thread architecture for STT→LLM→TTS pipeline
`orchestrators/live_orchestrator.py`	Multi-agent switching for VoiceLive (real-time)
`orchestrators/cascade_adapter.py`	Multi-agent switching for SpeechCascade (turn-based)
`orchestrators/config_resolver.py`	Scenario-aware agent/handoff resolution
`handoffs/context.py`	Shared handoff context builders and dataclasses

Key Abstractions: - LiveOrchestrator: Event-driven orchestration for VoiceLive - CascadeOrchestratorAdapter: Turn-based orchestration for SpeechCascade - OrchestratorContext / OrchestratorResult: Shared data structures - HandoffContext / HandoffResult: Handoff execution data

Separation of Concerns¶

✅ Clean Boundaries¶

Concern	Owned By	NOT Owned By
Agent prompts & tools	`agents/`	`voice/`
Handoff tool definitions	`agents/tools/handoffs.py`	`voice/`
Handoff target resolution	`agents/session_manager.py`	-
Tool execution	`agents/tools/registry.py`	`voice/`
Audio streaming	`voice/voicelive/`	`agents/`
Turn processing	`voice/speech_cascade/`	`agents/`
Multi-agent switching	`voice/orchestrators/`	`agents/`
Handoff context building	`voice/handoffs/context.py`	`agents/`

Key Design Decisions¶

Agents are orchestrator-agnostic: UnifiedAgent works with both VoiceLive and SpeechCascade
Tools are centralized: All tools registered in agents/tools/registry.py
Handoff routing via handoff_map: Built from agent YAML declarations
is_handoff_tool() from registry: Single source for "is this a handoff tool type?"
HandoffProvider for live lookups: Session-aware handoff target resolution

Data Flow¶

VoiceLive Path (Real-Time)¶

External Call (ACS)
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│  VoiceLiveSDKHandler (voice/voicelive/handler.py)           │
│    - WebSocket connection to VoiceLive SDK                  │
│    - Audio streaming (PCM16)                                │
│    - Session management                                     │
└─────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│  LiveOrchestrator (voice/orchestrators/live_orchestrator.py)│
│    - Event handling (tool calls, transcripts)               │
│    - Multi-agent switching via handoff tools                │
│    - Tool execution via registry                            │
└─────────────────────────────────────────────────────────────┘
       │
       ├──────────────────────────────────────────────────────┐
       ▼                                                      │
┌─────────────────────────────────┐    ┌─────────────────────────────────┐
│  VoiceLiveAgentAdapter          │    │  Tool Registry                  │
│  (voice/voicelive/agent_adapter)│    │  (agents/tools/registry.py)     │
│    - UnifiedAgent → session     │    │    - execute_tool()             │
│    - apply_session() to SDK     │    │    - is_handoff_tool()          │
└─────────────────────────────────┘    └─────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────┐
│  UnifiedAgent                   │
│  (agents/base.py)               │
│    - Prompt template            │
│    - Tool list                  │
│    - Handoff config             │
└─────────────────────────────────┘

SpeechCascade Path (Turn-Based)¶

External Call (ACS)
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│  SpeechCascadeHandler (voice/speech_cascade/handler.py)     │
│    - Three-thread architecture                              │
│    - STT → LLM → TTS pipeline                               │
│    - Barge-in handling                                      │
└─────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│  CascadeOrchestratorAdapter (voice/orchestrators/cascade_adapter.py) │
│    - Turn-based processing                                  │
│    - Multi-agent switching                                  │
│    - Tool execution via registry                            │
└─────────────────────────────────────────────────────────────┘
       │
       ├──────────────────────────────────────────────────────┐
       ▼                                                      │
┌─────────────────────────────────┐    ┌─────────────────────────────────┐
│  UnifiedAgent                   │    │  Tool Registry                  │
│  (agents/base.py)               │    │  (agents/tools/registry.py)     │
│    - render_prompt()            │    │    - execute_tool()             │
│    - get_tools()                │    │    - is_handoff_tool()          │
└─────────────────────────────────┘    └─────────────────────────────────┘

Handoff Flow¶

1. LLM calls handoff tool (e.g., "handoff_fraud_agent")
       │
       ▼
2. Orchestrator detects handoff
   ┌─────────────────────────────────────────────────────────────────┐
   │  if is_handoff_tool(tool_name):  # from agents/tools/registry   │
   │      target = get_handoff_target(tool_name)  # from handoff_map │
   └─────────────────────────────────────────────────────────────────┘
       │
       ▼
3. Build handoff context (voice/handoffs/context.py)
   ┌─────────────────────────────────────────────────────────────────┐
   │  system_vars = build_handoff_system_vars(                       │
   │      source_agent="Concierge",                                  │
   │      target_agent="FraudAgent",                                 │
   │      tool_result={...},                                         │
   │      tool_args={...},                                           │
   │      current_system_vars={...},                                 │
   │  )                                                              │
   └─────────────────────────────────────────────────────────────────┘
       │
       ▼
4. Switch to target agent
   ┌─────────────────────────────────────────────────────────────────┐
   │  # VoiceLive: agent.apply_session(conn, system_vars)            │
   │  # Cascade: set _active_agent, render new prompt                │
   └─────────────────────────────────────────────────────────────────┘
       │
       ▼
5. Target agent responds with greeting

Key Protocols¶

`HandoffProvider` (agents/session_manager.py)¶

class HandoffProvider(Protocol):
    """Protocol for session-aware handoff resolution."""

    def get_handoff_target(self, tool_name: str) -> Optional[str]:
        """Get target agent for a handoff tool."""
        ...

    @property
    def handoff_map(self) -> Dict[str, str]:
        """Get current handoff mappings."""
        ...

    def is_handoff_tool(self, tool_name: str) -> bool:
        """Check if a tool triggers a handoff (session-aware)."""
        ...

Implementations: - SessionAgentManager: Per-session handoff configuration with runtime updates

Consumers: - LiveOrchestrator: Uses HandoffProvider for live lookups - CascadeOrchestratorAdapter: Uses HandoffProvider or static handoff_map

File Inventory¶

`backend/voice/`¶

voice/
├── __init__.py
├── handoffs/
│   ├── __init__.py          # Exports HandoffContext, HandoffResult, helpers
│   └── context.py           # Dataclasses + build_handoff_system_vars()
├── messaging/               # WebSocket message helpers
├── orchestrators/
│   ├── __init__.py          # Exports LiveOrchestrator, CascadeOrchestratorAdapter
│   ├── base.py              # OrchestratorContext, OrchestratorResult
│   ├── cascade_adapter.py   # SpeechCascade orchestration
│   ├── config_resolver.py   # Scenario-aware config resolution
│   └── live_orchestrator.py # VoiceLive orchestration
├── speech_cascade/
│   ├── handler.py           # Three-thread STT→LLM→TTS handler
│   ├── metrics.py           # Latency metrics
│   └── orchestrator.py      # Legacy orchestrator (deprecated)
└── voicelive/
    ├── agent_adapter.py     # UnifiedAgent → VoiceLive adapter
    ├── handler.py           # VoiceLive SDK WebSocket handler
    ├── metrics.py           # Latency metrics
    ├── session_loader.py    # User profile loading
    ├── settings.py          # VoiceLive settings
    └── tool_helpers.py      # Tool notification helpers

`backend/agents/`¶

agents/
├── __init__.py
├── _defaults.yaml           # Default agent configuration
├── base.py                  # UnifiedAgent, HandoffConfig, VoiceConfig, ModelConfig
├── loader.py                # discover_agents(), build_handoff_map()
├── session_manager.py       # SessionAgentManager, HandoffProvider protocol
├── scenarios/               # Scenario-based overrides
│   └── loader.py
├── tools/
│   ├── __init__.py          # Tool initialization and exports
│   ├── registry.py          # Tool registration and execution
│   ├── handoffs.py          # Handoff tool implementations
│   ├── banking.py           # Banking tools
│   ├── fraud.py             # Fraud detection tools
│   └── ...                  # Other tool modules
└── {agent_name}/
    ├── agent.yaml           # Agent configuration
    └── prompt.jinja         # System prompt template

Summary¶

The architecture cleanly separates:

Business Logic (agents/): What agents do, their prompts, tools, and handoff triggers
Transport (voice/): How agents are invoked via VoiceLive or SpeechCascade
Orchestration (voice/orchestrators/): Multi-agent switching and tool execution

Both orchestrators (LiveOrchestrator, CascadeOrchestratorAdapter) use: - is_handoff_tool(name) from the tool registry for handoff detection - HandoffProvider.get_handoff_target(name) or handoff_map for target resolution - build_handoff_system_vars() from voice/handoffs/context.py for context building

This enables orchestrator-agnostic agents that work with any voice transport layer.

Recent Cleanup (Phases 1-6)¶

The handoff system was simplified through six phases documented in handoff-inventory.md:

Phase	Change	Impact
1	Removed unused strategy patterns	~600 lines deleted
2	Added `HandoffProvider` protocol	Live handoff lookups
3	Extracted shared handoff context builder	~25 lines reduced
4	Removed unused `HandoffStrategy` enum	~60 lines deleted
5	Consolidated `is_handoff_tool()` to registry	Single source of truth
6	Added `HandoffProvider` to CascadeAdapter	Session-aware handoffs

Total lines removed: ~690

Backend voice agents architecture