Hands-on lab — first deployment walkthrough¶
Walkthrough version: Get ready → 3. Rehearse in a sandbox frames this lab as the third step of the partner onboarding flow. This page is the full lab content with check-your-work gates.
Sandbox rehearsal for a partner engineer new to the agentic AI
solution accelerator. Recommended before your first customer-facing engagement; not required per customer (returning engineers skip straight to QUICKSTART.md). After finishing, you'll be comfortable enough
with the template to run a real customer engagement against it.
This is not training for the customer's end users — that's partner-owned and delivered
separately. Nor is it a substitute for reading docs/getting-started/setup-and-prereqs.md
(the authoritative prereqs + troubleshooting) or QUICKSTART.md (the
eight-step partner motion). The lab walks you through the same surface area
with check-your-work gates so you catch misunderstandings in a sandbox,
not in front of a customer.
Objectives¶
After the lab you can:
- Deploy the flagship scenario to your own sandbox subscription with azd up and confirm it works end-to-end.
- Open the reference front-end locally and drive the workflow from a browser.
- Read App Insights telemetry emitted by real browser traffic, and know which dashboard panels require partner-wired emitters to light up.
- Run the quality and redteam evals against your deployment and read scripts/enforce-acceptance.py output.
- Edit an agent's instructions the supported way (spec file + azd provision), not by portal drift.
- Swap the model via accelerator.yaml → models[].
- Scaffold a new side-effect tool via /add-tool with HITL baked in, and know why the redteam case is not optional.
- Scaffold a new scenario with /scaffold-from-brief and know what it actually does vs what you still author by hand.
Prerequisites¶
- An Azure sandbox subscription where you have Contributor — do not use a customer subscription for the lab. Cleanup at the end is azd down --purge.
- Regional Foundry quota for gpt-5-mini on GlobalStandard (the shipped default is 30k TPM — see infra/main.parameters.json). Confirm in the Azure portal → Foundry → Quotas before starting.
- The tools listed in the "Prerequisites" section of docs/getting-started/setup-and-prereqs.md (Azure CLI, azd, gh, git, PowerShell 7 on Windows, Python 3.11+). Docker/Podman is optional — azd up builds the container image remotely in Azure Container Registry by default.
- A GitHub org/account where you can push a private template clone.
- VS Code with GitHub Copilot Chat enabled (required for the chatmodes under .github/chatmodes/).
If any prereq is missing, fix it before continuing — this lab does
not work around a broken local environment. The troubleshooting
matrix in docs/getting-started/setup-and-prereqs.md ("Troubleshooting — top 5") is
the first stop when something goes wrong.
Lab-only scope. Other sections of docs/getting-started/setup-and-prereqs.md — GitHub Environment-scoped secrets (AZURE_CLIENT_ID / TENANT_ID / SUBSCRIPTION_ID / AZURE_LOCATION), repo-level EVALS_API_URL, HITL_APPROVER_ENDPOINT, multi-environment deploy/environments.yaml, and private-network (enablePrivateLink) setup — apply to the production / customer motion in QUICKSTART.md, not this lab. The lab runs azd up locally against a sandbox subscription (azd auth login covers auth), runs evals locally against the deployed API URL, and uses a single lab-dev environment on Tier 1 standalone. You'll meet those sections during your first real customer deploy.
Where you'll work¶
You'll move between four places as you go through the lab. Every lab below opens with a Where line so you know which one to be in.
| Where | What you do there | How to open it |
|---|---|---|
| VS Code | Run all repo-local commands in the integrated terminal (Ctrl+`), edit files (accelerator.yaml, agent specs, evals, prompts), and talk to GitHub Copilot Chat in the right sidebar (💬 icon or Ctrl+Alt+I; type / to see chatmodes like /discover-scenario and /add-tool) | After cloning, code . from any shell opens the repo in VS Code |
| GitHub web (github.com) | Watch Actions runs (optional in the lab; required for the real partner motion) | Your browser, on the cloned repo |
| Azure portal (portal.azure.com) | Inspect the resource group, App Insights logs and dashboards, Foundry quota | Your browser, signed into the same tenant azd deployed to |
| Foundry portal (ai.azure.com) | Visually confirm agents (Lab 5 demonstrates that portal edits get overwritten by spec files) | Your browser → https://ai.azure.com → sign in with the same tenant → select the project named in azd env get-values (look for AZURE_AI_FOUNDRY_PROJECT_NAME) → Agents in the left nav |
Lab 2 also has you open a local browser tab at http://localhost:5173 for the running dev frontend — Vite opens it after npm run dev.
Lab 1 — First deploy¶
Where: VS Code (integrated terminal for the gh / az / azd commands; editor to confirm the repo loaded with .github/copilot-instructions.md); Azure portal (portal.azure.com → your resource group) for the Check your work verification.
Goal: deploy the unmodified template to a sandbox and confirm the backend bootstrapped successfully. This is a backend smoke test — Lab 2 is where you exercise the workflow through the reference frontend.
- Clone the template into your own private repo:
# Replace <your-handle> with any short name (e.g., <your-handle>=contoso gives contoso-lab-accel)
gh repo create <your-handle>-lab-accel --template Azure-Samples/agentic-ai-solution-accelerator --private --clone
cd <your-handle>-lab-accel
Then load the folder into your current VS Code window via File → Open Folder (Ctrl+K Ctrl+O on Windows/Linux, Cmd+K Cmd+O on macOS) and pick <your-handle>-lab-accel. If you're running these commands from a standalone shell instead, code <your-handle>-lab-accel opens it in a fresh window.
- In VS Code, confirm Copilot Chat loads the repo's .github/copilot-instructions.md (you'll see it referenced in the chat sidebar). If it doesn't, Copilot is not going to enforce the partner guardrails — stop and fix before continuing.
- Authenticate + provision:
About preflight: the partner motion in QUICKSTART.md Step 4 has you run
/configure-landing-zone and /deploy-to-env before azd up. The lab skips
both: it deploys Tier 1 (standalone) into a sandbox where evals run locally,
so a GitHub Environment isn't required yet. You'll meet both chatmodes during
your first real customer deploy. Cross-reference QUICKSTART.md Step 4 for
the production motion.
az login
# If your account spans multiple tenants/subscriptions, pin them explicitly
# so azd inherits the right context:
# az login --tenant <sandbox-tenant-id>
# az account set --subscription <sandbox-subscription-id>
azd auth login
azd env new lab-dev
azd up
azd up provisions Foundry, AI Search, Key Vault, Container
Apps, App Insights, and the user-assigned MI. The Container App
then runs its in-app bootstrap (src/bootstrap.py) at FastAPI
startup to create/verify Foundry agents and seed the AI Search
accounts index before /healthz returns 200. Expect ~10–15
minutes total.
azd up will prompt you for an Azure region — pick one with gpt-5-mini GlobalStandard quota (the quota item you verified under Prerequisites).
If /healthz returns 503 or the startup probe fails immediately after Bicep finishes, that's typically RBAC propagation lag — the role assignments Bicep just created haven't fully propagated. The startup probe budget is 10 minutes (60 × 10s); in normal conditions this absorbs the lag without intervention. If the probe still fails after the budget, see Troubleshooting #5 in docs/getting-started/setup-and-prereqs.md.
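To make the "in-app bootstrap" description above concrete, here is a minimal sketch of that shape, assuming a FastAPI lifespan hook that finishes bootstrap work before /healthz reports healthy. Everything except the existence of src/bootstrap.py and the /healthz response shape is an illustrative assumption, not the accelerator's actual code.
# Illustrative sketch only — not the accelerator's bootstrap.py.
from contextlib import asynccontextmanager
from fastapi import FastAPI, Response

bootstrap_state = {"complete": False}

async def run_bootstrap() -> None:
    # Hypothetical steps: create/verify Foundry agents, seed the AI Search index.
    bootstrap_state["complete"] = True

@asynccontextmanager
async def lifespan(app: FastAPI):
    await run_bootstrap()   # runs once at startup, before traffic is served
    yield                   # the app serves requests here

app = FastAPI(lifespan=lifespan)

@app.get("/healthz")
def healthz(response: Response):
    if not bootstrap_state["complete"]:
        response.status_code = 503   # the startup probe keeps retrying
        return {"status": "starting"}
    return {"status": "ok", "bootstrap": "complete"}
The point of the sketch: a 503 before bootstrap completes is expected behavior, which is why the probe budget exists.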
Check your work:
This lab is a backend smoke test, not a workflow validation. Lab 2 is the first user-facing success signal.
- The final line of azd up prints an API URL. Hit /healthz — expect 200 with {"status": "ok", "bootstrap": "complete"}. This only proves the Container App booted and bootstrap completed; it does not prove /research/stream produces a usable briefing. That's Lab 2. (A scripted version of this check is sketched after this list.)
- In the Azure portal, open the resource group and confirm you have a Foundry AIServices account, a model deployment (gpt-5-mini by default) bound to the accelerator-default-policy content filter, an AI Search service with an accounts index, Key Vault, Container App, App Insights, and a user-assigned MI.
- If anything is missing, docs/getting-started/setup-and-prereqs.md "Troubleshooting — top 5" covers the common failure modes.
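If you'd rather check from a script than a browser, a minimal sketch (assuming the requests package is installed and the response shape described above):
# Minimal sketch: verify the /healthz contract. Replace API_URL with the azd up output.
import requests

API_URL = "https://<your-api-url>"
resp = requests.get(f"{API_URL}/healthz", timeout=30)
assert resp.status_code == 200, f"unexpected status: {resp.status_code}"
body = resp.json()
assert body.get("status") == "ok" and body.get("bootstrap") == "complete", body
print("backend bootstrap looks healthy:", body)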
Lab 2 — See it work in a browser¶
Where: VS Code (integrated terminal for npm install / npm run dev; editor for .env), then your browser at http://localhost:5173 for the running frontend.
Goal: exercise the API the way a customer will — through a browser — and confirm the streaming pipeline produces a usable briefing end-to-end.
The accelerator ships a reference frontend at
patterns/sales-research-frontend/ — a minimal React + Vite + TypeScript
starter that consumes POST /research/stream directly. It is intentionally
plain: no auth, no state persistence, no UI framework. The customer's real
UX is the partner's value-add; this lab just proves the wiring.
Steps:
- Grab the deployed API URL from azd up's final output (or azd env get-values | Select-String AZURE_CONTAINER_APP_URL). You want the base URL — the pattern appends /research/stream itself.
- From the repo root:
cd patterns/sales-research-frontend
cp .env.example .env
# edit .env: VITE_API_BASE_URL=<deployed-api-url>
npm install
npm run dev
- Open http://localhost:5173. The form is pre-filled with sensible defaults — click Run research and watch the streaming viewer light up with status, partial, and final events as the supervisor DAG executes. The result panel renders the aggregated briefing; toggle Show raw JSON to inspect the structured output.
Check your work:
Primary success signal — this is the first time you're seeing the accelerator actually work end-to-end:
- The form at http://localhost:5173 loads with default values.
- Clicking Run research streams status → partial → final events into the live viewer (no errors in the browser console, no CORS rejection).
- The result panel renders a usable research briefing with citations. Toggle Show raw JSON to confirm the structured output matches the briefing.
If the stream stalls, errors, or returns an empty briefing, the workflow
has a problem that Lab 1's /healthz smoke test could not detect — most
common causes are model quota exhaustion, AI Search index seeding failure,
or a regression in src/scenarios/sales_research/workflow.py. Capture the
App Insights trace (Lab 3 walks this) before debugging.
Deeper check (for partners customising the briefing shape):
- Every event in the live stream maps to one yielded dict from SalesResearchWorkflow.stream (see src/scenarios/sales_research/workflow.py) or from the underlying SupervisorDAG (see src/workflow/supervisor.py). If a new event type appears in the stream that the UI doesn't recognise, add it to the StreamEvent union in src/types/research.ts and to the describe() switch in StreamingViewer.tsx. (A rough sketch of the event sequence follows this list.)
- The final panel renders fields from the supervisor's transform_response (src/scenarios/sales_research/agents/supervisor/transform.py). If you customise the briefing shape for the customer, update ResearchBriefing and ResultPanel.tsx together.
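For orientation, a self-contained sketch of the status → partial → final sequence the viewer expects. Only the three event type names come from the workflow described above; every other field name, and the worker names, are illustrative assumptions, not the accelerator's actual schema.
# Illustrative, fake event stream — field names beyond "type" are assumptions.
from typing import Iterator

def fake_stream() -> Iterator[dict]:
    yield {"type": "status", "stage": "supervisor.routed"}
    for worker in ("news", "financials"):   # hypothetical worker names
        yield {"type": "partial", "worker": worker, "content": f"{worker} summary..."}
    yield {"type": "final", "briefing": {"company": "Contoso", "sections": []}}

for event in fake_stream():
    print(event["type"], event)
An event type your UI doesn't handle should fall through visibly rather than silently, which is why the deeper check asks you to extend both StreamEvent and describe().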
Going further: see patterns/sales-research-frontend/README.md for the
SWA deploy flow (swa deploy ./dist) and the customisation map. For a real
customer engagement, plan auth (Entra ID via easy-auth on Container Apps or
App Gateway), state persistence, and an actual HITL approval surface before
the UI is customer-facing.
Lab 3 — Read the telemetry¶
Where: Azure portal — sign in at https://portal.azure.com, navigate to your resource group (named rg-<azd-env-name>), then open the Application Insights resource inside it. Logs is in the left nav under "Monitoring"; Workbooks is in the same group.
Goal: correlate real browser traffic to App Insights events and understand which dashboard panels require partner-wired emitters.
After clicking Run research in Lab 2, you have real traffic. Open App Insights and trace it.
- In App Insights → Logs, run:
traces
| where timestamp > ago(15m)
| where message in ("request.received","supervisor.routed","worker.completed",
"retrieval.returned","response.returned","tool.executed",
"tool.hitl_approved","tool.hitl_rejected","aggregator.composed")
| where isnotempty(customDimensions.event_name)
| project timestamp, message, operation_Id, operation_ParentId, customDimensions
| order by timestamp asc
You should see request.received → supervisor.routed →
one or more worker.completed → retrieval.returned →
response.returned. If the request actually routed through a
HITL-gated side-effect tool (crm_write_contact, send_email)
you'll also see tool.executed + a tool.hitl_* event — but
many flagship requests don't touch those tools, so don't treat
those two events as guaranteed per request. Which tool.hitl_*
variant fires depends on whether you set HITL_APPROVER_ENDPOINT
(prod) or HITL_DEV_MODE=1 (dev-only).
Why traces and not customEvents? Events are emitted by src/accelerator_baseline/telemetry.py::emit_event, which routes through the accelerator Python logger that configure_azure_monitor(logger_name="accelerator") pipes into App Insights. Log records land in traces (one row per event) with message == event.name and the event payload flattened into customDimensions.<attr>. The isnotempty(customDimensions.event_name) filter pins the result set to accelerator events specifically — it also screens out duplicate rows from older deployments that emitted both a log record and an OTel span event for each call.
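To make that routing concrete, a hedged sketch of the mechanics: a named logger whose records, once configure_azure_monitor attaches its handler, land in traces with extra attributes surfacing as customDimensions. The emit_event signature shown here is an assumption; only the "accelerator" logger name and configure_azure_monitor(logger_name=...) come from the accelerator.
# Sketch of the emit_event mechanics; the real telemetry.py may differ in shape.
import logging
# from azure.monitor.opentelemetry import configure_azure_monitor
# configure_azure_monitor(logger_name="accelerator")   # done once at startup

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("accelerator")

def emit_event(name: str, **attrs) -> None:
    # message == event name; the extra attributes flatten into customDimensions.<attr>
    logger.info(name, extra={"event_name": name, **attrs})

emit_event("supervisor.routed", workers=2, scenario="sales_research")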
The operation_Id column lets you correlate one stream call to its
parent requests row and any nested dependencies. Pivot from a
single event:
let opId = "<paste-an-operation_Id-from-above>";
union requests, dependencies, traces
| where operation_Id == opId
| project timestamp, itemType, name=coalesce(name, message), duration, success, customDimensions
| order by timestamp asc
If you want to drive synthetic traffic without the UI, here's the curl form:
curl -N -X POST "$API_URL/research/stream" \
-H "Content-Type: application/json" \
-d '{"company_name":"Contoso","seller_intent":"Discovery call","persona":"VP of Operations"}'
- Open infra/dashboards/roi-kpis.json in the repo, copy the JSON, then in Application Insights → Workbooks → New → Advanced editor paste it and save. Refresh — the "Successful responses per day" and "P95 request latency" panels should show data from your traffic. The "HITL approval rate" panel only lights up once you've exercised a HITL-gated tool (see the "HITL approver" section of docs/customer-runbook.md).
Check your work:
- Answer for yourself: why are both the "$ per call" and "Groundedness eval score trend" panels empty in the shipped flagship? (The workbook keys those panels off cost.call and eval.result custom events; neither is emitted by the shipped workflow — cost.py::record_call_cost() exists but isn't wired into the hot path. See docs/customer-runbook.md "What you inherited" and Section 3 (Operational dials) for the full answer.)
Lab 4 — Run evals + acceptance (baseline)¶
Where: VS Code's integrated terminal (repo root). All three commands run locally against the deployed API URL.
Goal: understand the two-step eval flow.
This is your baseline — every mutation lab from here on ends by re-running this same chain.
- From the repo root:
# Replace <your-api-url> with the deployed endpoint (e.g., https://my-app.azurecontainerapps.io)
python evals/quality/run.py --api-url <your-api-url>
python evals/redteam/run.py --api-url <your-api-url>
python scripts/enforce-acceptance.py
- Read the output of enforce-acceptance.py. It reports which thresholds from accelerator.yaml.acceptance passed or failed.
- Lower the quality_threshold in accelerator.yaml by 0.2 and re-run enforce-acceptance.py. Notice: the quality gate now passes trivially. Revert — do not commit a loosened gate.
Check your work:
- Look at the cost_per_call_usd line in the enforce-acceptance.py output. In the flagship scenario, src/accelerator_baseline/cost.py::record_call_cost() is not called from the workflow — but evals/quality/run.py still records a best-effort cost_usd per case (token-pricing if the workflow surfaces usage, latency-based fallback otherwise; see evals/quality/run.py:107-134). That means the gate produces a number from day one; it's just not a true workflow cost until a partner wires record_call_cost into the hot path. (A hedged sketch of that wiring follows this list.)
- src/accelerator_baseline/evals.py:66-75 is the branch that does fail hard — only if the runner is modified to omit cost_usd entirely. That's the "inert-is-a-failure" safety net.
- docs/customer-runbook.md Section 3 and Section 4 describe what a partner has to wire for the cost gate to reflect real model spend, not a latency proxy.
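What "wire record_call_cost into the hot path" could look like, as a sketch only: record_call_cost does exist in src/accelerator_baseline/cost.py, but its real signature, the usage fields available on a model response, and the prices shown here are all assumptions for illustration.
# Hedged sketch: per-call cost recorded from token usage in the workflow hot path.
PRICE_PER_1K = {"prompt": 0.00025, "completion": 0.002}   # illustrative numbers only

def record_call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    cost = (prompt_tokens / 1000) * PRICE_PER_1K["prompt"] \
         + (completion_tokens / 1000) * PRICE_PER_1K["completion"]
    # In the real accelerator this is where the cost.call event the ROI workbook
    # panel keys off would be emitted; here we just return the number.
    return cost

# Hypothetical call site right after a model response:
usage = {"prompt_tokens": 1200, "completion_tokens": 450}   # stand-in for response usage
print(f"cost_usd={record_call_cost('gpt-5-mini', usage['prompt_tokens'], usage['completion_tokens']):.5f}")
Until something like this runs per call, the "$ per call" panel and the acceptance line both reflect the eval runner's fallback estimate, not real model spend.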
Lab 5 — Edit an agent's instructions the supported way¶
Where: VS Code for the spec-file edit and azd deploy (integrated terminal), then the Foundry portal for the "your manual edit got overwritten" demo. To open the Foundry portal: https://ai.azure.com → sign in with the same tenant azd deployed to → select the project named in azd env get-values (AZURE_AI_FOUNDRY_PROJECT_NAME) → Agents in the left nav → click the agent → Instructions tab.
Goal: understand that the Foundry portal is not the source of truth.
- Open docs/agent-specs/accel-sales-research-supervisor.md (or whichever agent you want to tweak). This is the repo-side source of truth for the agent's instructions.
- Make a small edit — change a guideline, add a sentence, whatever. Save.
- Run azd deploy. The image rebuilds, the Container App rolls a new revision, and on startup src/bootstrap.py syncs the new Instructions into Foundry (a rough sketch of what that sync amounts to follows these steps).
- Now open the Foundry portal, find the same agent, and manually edit the instructions there. Save.
- Run azd deploy again (or restart the Container App revision — Container Apps → Revisions → "Restart").
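A rough sketch of what "bootstrap syncs the spec into Foundry" amounts to. The spec path is the real repo location; parse_instructions and the client object are hypothetical stand-ins for whatever src/bootstrap.py actually uses, not a real SDK surface.
# Hedged sketch only: read the repo-side spec, push its instructions to the agent.
from pathlib import Path

SPEC = Path("docs/agent-specs/accel-sales-research-supervisor.md")

def parse_instructions(markdown: str) -> str:
    # Hypothetical: in practice the Instructions section would be extracted;
    # here the whole file stands in for the instructions body.
    return markdown.strip()

def sync_agent_instructions(client, agent_name: str) -> None:
    instructions = parse_instructions(SPEC.read_text(encoding="utf-8"))
    # client.update_agent(...) is a placeholder for the real Foundry update call.
    client.update_agent(name=agent_name, instructions=instructions)

# Because this runs on every container startup, any edit made only in the
# Foundry portal is overwritten the next time a revision starts.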
Check your work:
- Re-open the agent in the portal. Your manual edit is gone — overwritten by the spec file. This is the designed behavior: portal edits are transient. The supported rollback path for a bad prompt is git revert the spec + azd deploy, not "restore from Foundry portal history".
- docs/customer-runbook.md Section 6 (Model swap) is the condensed version of this behavior for the customer's ops team.
Now re-run acceptance:
python evals/quality/run.py --api-url <your-api-url>
python evals/redteam/run.py --api-url <your-api-url>
python scripts/enforce-acceptance.py
Compare to your Lab 4 baseline. A prompt edit can move quality scores in either
direction; if a threshold drops below accelerator.yaml.acceptance, revert and
try again. The acceptance gate is the contract.
Lab 6 — Swap the model¶
Where: VS Code — editor to edit accelerator.yaml, integrated terminal for azd up and the eval chain.
Goal: do a model swap the supported way.
- Open accelerator.yaml and replace the default: true entry under models: with a different model your sandbox has quota for (e.g. gpt-4.1-mini instead of gpt-5-mini, with a valid version and a capacity within your quota).
- Run azd up. Bicep parses the new manifest at compile time (loadYamlContent), Foundry re-deploys the model in place, and on Container App restart src/bootstrap.py re-resolves the slug → deployment_name map.
- Re-run the eval chain from Lab 4:
python evals/quality/run.py --api-url <your-api-url>
python evals/redteam/run.py --api-url <your-api-url>
python scripts/enforce-acceptance.py
Quality may shift — that's the point. If a threshold drops, the model isn't a drop-in replacement.
Check your work:
- Try azd env set AZURE_AI_FOUNDRY_MODEL_NAME=some-other-model and run azd up. Notice: Bicep ignores the env var entirely because the model now comes from accelerator.yaml -> models[] via loadYamlContent at compile time. Raw env-var overrides are unsupported; the manifest is the source of truth. (A hedged sketch of the runtime side of that resolution is below.)
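A hedged sketch of what "re-resolves the slug → deployment_name map" from accelerator.yaml could look like at runtime. Only models[] and the default flag come from the docs above; the other manifest keys are assumptions about shape, not the accelerator's actual parsing code.
# Hedged sketch: resolve the default model from the manifest at startup.
# Assumes pyyaml is available and the file is read from the repo root.
import yaml

def load_model_map(path: str = "accelerator.yaml") -> dict:
    manifest = yaml.safe_load(open(path, encoding="utf-8"))
    models = manifest.get("models", [])
    # slug -> deployment name, plus which entry is the default
    slug_to_deployment = {m["name"]: m.get("deployment_name", m["name"]) for m in models}
    default = next((m["name"] for m in models if m.get("default")), None)
    return {"map": slug_to_deployment, "default": default}

resolved = load_model_map()
print("default model:", resolved["default"])
# Nothing here reads AZURE_AI_FOUNDRY_MODEL_NAME — which is why the env-var
# override in the check-your-work step has no effect.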
Lab 7 — Add a side-effect tool with /add-tool¶
Where: VS Code — Copilot Chat sidebar for the chatmode, editor for any post-generation edits and the redteam case authoring, integrated terminal for accelerator-lint.py / pytest / the eval chain.
Goal: experience the scaffolded-with-HITL contract.
- In Copilot Chat, invoke /add-tool. The chatmode (see .github/chatmodes/add-tool.chatmode.md) asks for seven inputs: tool name, external system, operation, reversibility, HITL policy, which worker uses it, and auth approach.
- Pick something plausible — e.g. create a ticket in a ticketing system, irreversible, HITL_POLICY = "always", attached to an existing worker agent, Managed Identity auth.
- Copilot generates src/tools/<tool_name>.py with HITL scaffolding and nudges you to register it on the appropriate worker, add a unit test, and add a redteam case under evals/redteam/. Confirm the worker registration actually landed — the chatmode instructs it but partners have to verify. (A hedged sketch of the general shape of a HITL-gated tool follows this list.)
- Run python scripts/accelerator-lint.py — it must report 0 blocking, 0 warning findings.
- Run pytest -q — the new test must pass.
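To set expectations for the check-your-work step below, here is a hedged sketch of the general shape: a module-level HITL_POLICY plus a checkpoint(...) call before the side effect. The checkpoint signature, return contract, and the ticket payload are illustrative assumptions; the file /add-tool generates will differ.
# Hedged sketch of a HITL-gated side-effect tool, not the generated code.
HITL_POLICY = "always"   # every call requires an approval checkpoint

def checkpoint(action: str, payload: dict) -> bool:
    # Stand-in for the accelerator's real checkpoint: in prod it would call the
    # approver endpoint; in dev mode it may auto-approve.
    print(f"[HITL] approval requested for {action}: {payload}")
    return True

def create_ticket(title: str, body: str) -> dict:
    payload = {"title": title, "body": body}
    if not checkpoint("create_ticket", payload):
        return {"status": "rejected_by_approver"}
    # ... the irreversible call to the ticketing system would go here ...
    return {"status": "created", "ticket": payload}

print(create_ticket("VPN outage", "Customer reports site-to-site VPN down"))
Removing the checkpoint call while HITL_POLICY is still declared is exactly the condition the hitl-required lint rule exists to catch.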
Author + run the redteam case¶
The /add-tool chatmode tells you to add a redteam case for the new tool — that's
the contract. Before calling the lab done, do both halves:
- Add a case to evals/redteam/cases.jsonl exercising prompt-injection or jailbreak attempts to misuse the tool.
- Run it:
python evals/redteam/run.py --api-url <your-api-url>
Confirm your new case appears in the output and the safety bar in accelerator.yaml.acceptance.safety_pass still holds.
Now re-run acceptance¶
python evals/quality/run.py --api-url <your-api-url>
python evals/redteam/run.py --api-url <your-api-url>
python scripts/enforce-acceptance.py
The redteam re-run picks up your new case; quality + acceptance ensure the tool didn't regress the scenario.
Check your work:
- Remove the checkpoint(...) call from the tool and re-run lint. On any src/tools/*.py file that declares HITL_POLICY, the hitl-required rule fails loudly. Put checkpoint(...) back before continuing.
- Confirm the redteam case you authored above appears in evals/redteam/run.py's output. The safety_pass threshold in accelerator.yaml.acceptance will block merge if any redteam case fails — including the one you just added.
Lab 8 — Scaffold a new scenario¶
Where: VS Code — Copilot Chat sidebar for /discover-scenario, integrated terminal for scaffold-scenario.py and accelerator-lint.py, editor for pasting the printed YAML block and inspecting/editing the generated stubs.
Goal: understand what /scaffold-from-brief actually generates
vs what you still author.
- In Copilot Chat, run /discover-scenario against a realistic sandbox scenario you make up (e.g. "summarize support tickets weekly"). Answer the questions. The chatmode writes docs/discovery/solution-brief.md and updates accelerator.yaml solution.*, acceptance.*, and kpis[] from your answers — it does not touch the scenario: block (that comes next).
- Run python scripts/scaffold-scenario.py ticket-summary --display "Ticket Summary". Inspect what it generated under src/scenarios/ticket_summary/: schema.py, workflow.py, retrieval.py, and a single supervisor agent package (agents/supervisor/{prompt,transform,validate}.py) plus one supervisor spec stub at docs/agent-specs/accel-ticket-summary-supervisor.md. The script also prints a scenario: YAML block to your terminal for you to paste into accelerator.yaml.
- Paste the printed scenario: block over the existing scenario: block in accelerator.yaml.
- Run python scripts/accelerator-lint.py. In a fresh scaffold, most lint rules will pass because the generated files are syntactically valid and the supervisor spec ships with a generic baseline. But prompt.py, transform.py, validate.py, and retrieval.py are minimal placeholders — read them, then build them out: tighten the supervisor spec for your domain, add worker agents with scripts/scaffold-agent.py, author the retrieval schema, and author golden + redteam cases before deploying to a customer.
Check your work:
- Open the generated prompt / transform / validate stubs under src/scenarios/ticket_summary/agents/supervisor/. They're deliberate placeholders. The supervisor spec ships with generic baseline instructions that run as-is but aren't domain-aware — tighten those instructions for your scenario. Don't ship to a customer until real behavior is authored, the supervisor spec reflects your domain, golden + redteam cases exist, and lint reports 0 blocking, 0 warning findings.
- /scaffold-from-brief drives all of the above in one chatmode invocation; the CLI scripts/scaffold-scenario.py is the underlying mechanic scaffold-from-brief calls into. Knowing the difference matters when something goes wrong mid-chatmode and you have to finish by hand.
Cleanup¶
azd down --purge
This tears down every resource in the lab environment. Do this before closing the lab — a lingering Foundry deployment consumes quota you might need for the next run.
Where to go next¶
- Deploy the UI from Lab 2 to Azure Static Web Apps with swa deploy ./dist (patterns/sales-research-frontend/README.md has the full flow including VITE_API_BASE_URL for build-time API binding).
- Run the actual partner motion (QUICKSTART.md + docs/partner-playbook.md) against a scoped sandbox engagement — not a real customer — before taking the accelerator to a paying engagement.
- Read docs/patterns/azure-ai-landing-zone/README.md and run /configure-landing-zone against a Tier 2 avm overlay in your sandbox. Tier 3 alz-integrated requires a hub to peer to; if you don't have one, stop at Tier 2.
- Discovery artifacts live under docs/discovery/ — five engagement artifacts (use-case-canvas.md, SOLUTION-BRIEF-GUIDE.md, discovery-workbook.csv, solution-brief.md, roi-calculator.xlsx) plus how-to-use.md as the sequencing meta-guide. Run /discover-scenario when you're ready to turn workshop notes into a filled brief.
- Contribution flow for external partners is documented at .github/CLA.md — short version: on your first PR, a CLA bot comments with a link to https://cla.opensource.microsoft.com; sign once there and the license/cla status check turns green. Rely on the status check and the portal for the specific repo you're contributing to rather than assuming broad coverage. Partner private forks (outside the microsoft / Azure orgs) are outside Microsoft's CLA flow entirely; the bot only fires on upstream PRs.
What this lab is not¶
- Customer training — partner-owned and delivered separately
- A substitute for reading docs/getting-started/setup-and-prereqs.md and QUICKSTART.md
- Certification — there's no badge or quiz; the check-your-work gates exist so you notice your own gaps, not to score you
- A Tier 2 / Tier 3 networking lab — that's /configure-landing-zone territory and requires infrastructure beyond a dev sandbox
← Back to the partner walkthrough
This page is the full lab guide. The walkthrough version (with hybrid one-line summaries + check-yourself prompts per lab) lives at 3. Rehearse in a sandbox.