Responsible AI — content filter, groundedness, HITL, red-team¶
What ships in v1 and where it lives. Customisation within these rails is fine; divergence needs a documented rationale in the partner's engagement repo.
Principle 1 — Safety at the model boundary¶
- Content filter locked at strict default on every deployment.
infra/modules/foundry.bicepsets theaccelerator-default-policyRAI policy (Hate / Sexual / Violence / Self-harm at Medium).scripts/accelerator-lint.py::content_filter_iac_onlyblocks any portal-only relaxation. - Jailbreak-resistant system instructions authored in
docs/agent-specs/<foundry_name>.md.src/bootstrap.pysyncs each spec verbatim to the Foundry portal at FastAPI startup on every deploy. Source code references agents by Foundry agent name (e.g.accel-sales-research-supervisor); instructions never appear inline in Python. - Prompt-injection surface scoped at retrieval.
src/retrieval/ai_search.pyonly returns fields declared in the scenario retrieval schema; the red-team suite probes data exfiltration via response shaping and tool abuse.
Principle 2 — Grounded responses with attribution¶
- Every retrieval source declared in
accelerator.yamlundersolution.grounding_sourcesand in the scenario retrieval module (src/scenarios/<scenario>/retrieval.py). - Grounding metadata surfaced to the user — workers emit
citationsarrays in their structured output; the quality eval runner enforcesmust_citeon cases that declare it. - Groundedness threshold is a CI gate.
accelerator.yaml.acceptance.groundedness_threshold(default 0.85) feeds.github/workflows/evals.yml; regressions below threshold block merge. - Retrieval runs under the app's managed identity.
src/retrieval/ai_search.pyusesDefaultAzureCredential(), not caller identity. Partners requiring caller-identity retrieval (customer-Entra ACL pass-through) must extend the retrieval glue in their engagement repo and document the change — the accelerator does not ship this pattern in v1.
Principle 3 — Human in the loop for side-effects¶
- Every side-effect tool declares a HITL checkpoint. Each tool module in
src/tools/declares anHITL_POLICYconstant at module scope and callshitl.checkpoint(...)before executing the side effect. The lint rulescripts/accelerator-lint.py::side_effect_tools_call_hitlwalkssrc/tools/*.pyand blocks any module that has the side-effect shape but is missing either piece. Theaccelerator.yaml.solution.hitlvalue is engagement-level documentation (e.g.none,pre-exec,post-exec) — the lint reads it but the runtime gate is per-tool. - HITL pattern is partner-scope. The flagship scenario does not ship a HITL queue UI — partners wire approval flow into their engagement app (Logic Apps, Teams card, ticketing system) and the tool blocks until approval returns.
- Default checkpoint is pre-execution. Reversible / telemetry-first variants require explicit partner + customer sign-off and heightened telemetry.
Principle 4 — Red-team evals in CI¶
- Default red-team suite:
evals/redteam/cases.jsonlcovers direct + indirect prompt injection, jailbreak attempts, tool abuse, data exfiltration via response shaping. redteam_must_pass: trueinaccelerator.yaml.acceptanceis zero-tolerance — any failure blocks CI.- Runner is scenario-agnostic.
evals/redteam/run.pyreadsaccelerator.yaml, forwards the partner-defined payload, and injects into fields declared in each case'sinject_intolist. - Partners extend with domain-specific probes in
evals/redteam/cases.jsonl. - Re-run triggers: scenario change, model change, tool-catalog change, Foundry instruction change.
Principle 5 — Data classification discipline¶
- Partner declares data classification per grounding source in
accelerator.yaml.solution.grounding_sources. - Tool catalog (
src/tools/) should restrict destinations by classification — partners enforce this in the tool body; the accelerator ships example tools (crm_write_contact,send_email,web_search) as scaffolding. - Customer signs off on classification. Partners must not decide unilaterally.
Principle 6 — RAI Impact Assessment¶
- Partner-scope. The customer engagement repo (not this template) holds the RAI IA under
rai/impact-assessment.mdwith use case, stakeholders, harms inventory, mitigations, data flows, classification map, and residual-risk acknowledgement signed by a named customer role. - Expires in 180 days. Material changes (model swap, tool add, grounding-source add) expire it sooner.
- Accelerator surfaces the hook via
accelerator.yaml.acceptance— partners layer the IA check into their PR gates.
Principle 7 — Transparency¶
- Users know they're interacting with AI. Partner front-end discloses this; the accelerator ships the API surface, not a first-party UI.
- Uncertainty is communicated. Worker outputs express confidence where applicable; eval suites probe for confident-wrong.
- Source attribution visible on grounded responses (see Principle 2).
Anti-patterns¶
| Anti-pattern | Why it's bad |
|---|---|
| "We disabled the content filter to reduce false positives" | Exits support scope and makes the system unsafe. Use allow-lists at the app layer instead. |
| "We skipped HITL for a low-risk side-effect" | "Low-risk" is a design claim until evals + production data prove it. HITL stays. |
| "Groundedness threshold = 0.5 because eval is too strict" | Lower threshold ships more hallucination. Fix the eval; do not mask the risk. |
| "Red-team evals are too flaky; we skip them in CI" | Flakiness signals real vulnerabilities. Fix the flake. |
| Agent instructions containing customer PII or secrets | Foundry portal snapshot + log trail exposes them. Instructions reference, do not embed. |
Where this lives in the repo¶
- Content filter + model deployment:
infra/modules/foundry.bicep - Retrieval:
src/retrieval/ai_search.py - HITL lint contract:
scripts/accelerator-lint.py::tool_registers_hitl - Red-team suite + runner:
evals/redteam/ - Acceptance gates:
accelerator.yaml.acceptance+src/accelerator_baseline/evals.py