3. Rehearse in a sandbox¶
Step 3 of 10 · Get ready
Step at a glance
- Goal: Clone the template into your own dev subscription, run the full deploy + eval loop end-to-end, and feel one HITL approval before you face a customer.
- Prerequisite: 2. Set up your machine is complete.
- Where you'll work: VS Code + Azure portal (your sandbox subscription) + Foundry portal (ai.azure.com).
- Done when: You ran the reference frontend at http://localhost:5173, clicked Run research, and saw a streamed briefing render with citations from the deployed /research/stream API; you read the matching App Insights trace; the quality + redteam evals passed; and you approved one HITL prompt.
What success looks like
Three signals, in the order you'll hit them:
1. Backend smoke test (Lab 1). Proves the Container App booted and bootstrap completed. Not a workflow validation.
2. Primary success signal: the browser path (Lab 2). A partner engineer's first proof the accelerator works:
   - Reference frontend running at http://localhost:5173 (from patterns/sales-research-frontend/).
   - Click Run research with the pre-filled form.
   - Streamed status → partial → final events render in the viewer; the result panel shows a usable briefing with citations.
3. Eval gate (Lab 4). python evals/quality/run.py --api-url <api-url> ends with something like:
quality: 18/20 passed (0.90) ≥ threshold 0.85 ✓
groundedness: 19/20 passed (0.95) ≥ threshold 0.90 ✓
python scripts/enforce-acceptance.py finishes green as well.
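Lab 4 asks you to keep this output as your sandbox baseline. A minimal capture sketch, assuming you run it from the repo root; the file names here are just suggestions, not paths the repo defines:

```bash
# Capture the eval gate output as a sandbox baseline (file names are arbitrary placeholders)
python evals/quality/run.py --api-url <api-url> | tee sandbox-baseline-quality.txt
python scripts/enforce-acceptance.py | tee sandbox-baseline-acceptance.txt
```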
This step is the sandbox rehearsal: done once per partner engineer, before your first customer-facing engagement. Returning engineers skip straight to 4. Clone for the customer on subsequent engagements.
It is not customer training. It is partner-engineer training, with check-your-work gates so you catch misunderstandings in your own sub instead of in front of a customer.
Lab objectives¶
After finishing the sandbox rehearsal you can:
- Deploy the flagship scenario to your own sandbox subscription with azd up and confirm it works end-to-end.
- Open the reference front-end locally and drive the workflow from a browser.
- Read App Insights telemetry emitted by real browser traffic, and know which dashboard panels require partner-wired emitters to light up.
- Run the quality and redteam evals against your deployment and read scripts/enforce-acceptance.py output.
- Edit an agent's instructions the supported way (spec file + azd provision, sketched below), not by portal drift.
- Swap the model via accelerator.yaml → models[].
- Scaffold a new side-effect tool via /add-tool with HITL baked in, and know why the redteam case is not optional.
- Scaffold a new scenario with /scaffold-from-brief and know what it actually does vs what you still author by hand.
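The fifth bullet is worth seeing as commands. A minimal sketch of Lab 5's supported edit loop, assuming the spec you want to change lives under docs/agent-specs/ (the placeholder file name below is not a real path in the repo):

```bash
# Supported way to change an agent's instructions: edit the spec file, then re-provision
code docs/agent-specs/<spec-file>   # placeholder; open the actual spec for the agent you're editing
azd provision                       # pushes spec changes to Foundry; portal-only edits get overwritten
```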
Where you'll work in the sandbox¶
| Where | What you do there |
|---|---|
| VS Code | Run repo-local commands in the integrated terminal (Ctrl+`); edit files; talk to GitHub Copilot Chat in the right sidebar (chatmodes via /) |
| GitHub web | Watch Actions runs (optional in the lab; required in real engagements) |
| Azure portal | Resource group, App Insights logs and dashboards, Foundry quota |
| Foundry portal (ai.azure.com) | Visually confirm agents (Lab 5 demonstrates that portal edits get overwritten by spec files on next azd provision) |
Sandbox smoke-test (start here)¶
```bash
# 1. Clone the template into a sandbox repo (NOT a customer repo; that's step 4)
gh repo create <your-handle>-accel-sandbox --template Azure-Samples/agentic-ai-solution-accelerator --private --clone
cd <your-handle>-accel-sandbox
code .

# 2. Authenticate to your SANDBOX subscription
az login --tenant <your-sandbox-tenant-id>
azd auth login

# 3. Provision + deploy
azd env new sandbox-dev
azd up
```
azd up returns the API URL. Hit /healthz to confirm the Container App booted and bootstrap completed; that's the backend smoke test, not a workflow validation. Lab 2 is where you exercise /research/stream end-to-end through the reference frontend and see the accelerator actually work.
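A quick way to run that check from the terminal; paste the URL that azd up printed (or look it up with azd env get-values):

```bash
# Backend smoke test only: paste the API URL printed by `azd up`
curl -s <api-url>/healthz    # expect {"status":"ok"}; this is not a workflow validation
```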
Cleanup when done: azd down --purge.
The labs (sequential)¶
The 8 labs walk the same surface with check-yourself prompts so you can verify each result before moving on. Each one-line goal below is enough for the most common path; click Full lab if you want the verbose walkthrough.
| # | One-line goal | Check yourself | Full lab |
|---|---|---|---|
| 1 | Deploy the flagship backend to your sandbox with azd up. | Backend smoke test only: curl <api>/healthz returns {"status":"ok"} and the resource group has AIServices + Container App + AI Search + App Insights. This proves the container booted; Lab 2 is the first user-facing validation. | Lab 1 |
| 2 | Run the reference frontend locally and stream a research request from the browser. | Primary success signal: http://localhost:5173 renders a streamed briefing with citations after you click Run research. This is the traffic Lab 3 inspects in App Insights. | Lab 2 |
| 3 | Read the App Insights trace for the Lab 2 call; find the supervisor decision and worker spans. | App Insights shows a single end-to-end trace; you can name (a) which workers ran, (b) which tools fired, (c) where HITL would have been called if it were a write. | Lab 3 |
| 4 | Run quality + redteam evals against your sandbox; capture the baseline. | python scripts/enforce-acceptance.py reports green; you saved the output as your sandbox baseline. | Lab 4 |
| 5 | Edit an agent spec in docs/agent-specs/, run azd provision, watch the change land in Foundry. | Foundry portal shows the new instructions; portal-only edits get reverted on the next provision. | Lab 5 |
| 6 | Swap the model via accelerator.yaml -> models[] and re-deploy. | The chosen agent now runs on the new model; lint passes; eval scores haven't regressed. | Lab 6 |
| 7 | Use /add-tool to scaffold a side-effect tool, then read the auto-generated HITL + redteam case. | Tool calls fail closed without HITL approval; the redteam case fails the suite if you remove the HITL guard. | Lab 7 |
| 8 | Use /scaffold-from-brief to scaffold a new scenario sibling to sales_research. | New src/scenarios/<id>/ exists, lint passes, supervisor + workers wired in WORKERS. | Lab 8 |
→ Or open the full lab guide for all 8 labs in one page.
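For Lab 2, the reference frontend lives in patterns/sales-research-frontend/. How to start it is documented in that folder; a sketch assuming a standard npm dev-server setup (the exact commands, and how you point it at your deployed API URL, may differ in your clone):

```bash
# Lab 2 sketch: assumes an npm-based dev server; check patterns/sales-research-frontend/ for the real steps
cd patterns/sales-research-frontend
npm install
npm run dev   # should serve http://localhost:5173; click "Run research" in the browser
```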
What's intentionally out of scope in the sandbox¶
These all become real in step 7 once you have a customer:
- GitHub Environment-scoped OIDC secrets: sandbox azd up runs locally with azd auth login only.
- Multi-environment deploy/environments.yaml: a single sandbox-dev env is fine here.
- Private endpoints, AVM, or ALZ overlay: Tier 1 standalone for the sandbox.
- Production HITL approver webhook: HITL_DEV_MODE=1 auto-approves in the sandbox (see the sketch after this list).
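If you need to confirm or flip that flag yourself, a sketch assuming HITL_DEV_MODE is surfaced as an azd environment value (check your clone's infra for the actual wiring before relying on this):

```bash
# Assumption: the backend reads HITL_DEV_MODE from the azd environment; verify in your clone
azd env set HITL_DEV_MODE 1   # auto-approve HITL prompts in the sandbox
azd up                        # redeploy so the setting takes effect
```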
Continue: when you have a real engagement, go to 4. Clone for the customer. Otherwise stop here; Track 1 (Get ready) is complete.