Running Distributed Eval on Azure¶
This guide covers how to deploy a distributed hive mind to Azure, feed it content, and run a Q&A evaluation against the live agents.
Architecture Overview¶
The distributed eval uses a RemoteAgentAdapter that implements the same
interface as a local LearningAgent. The eval harness (LongHorizonMemoryEval)
uses the exact same code path for local and distributed evaluation — same
question generation, same OODA loop processing, same grading, same report.
eval_distributed.py
└── LongHorizonMemoryEval.run(RemoteAgentAdapter)
│
├── adapter.learn_from_content(text)
│ └── Service Bus LEARN_CONTENT ──► all agents (broadcast)
│ → agent OODA loop: decide("store") → learn
│
├── adapter.answer_question(text)
│ ├── Service Bus INPUT {event_id} ──► one agent (round-robin)
│ │ → agent OODA loop: decide("answer") → answer
│ │ → AnswerPublisher → eval-responses topic {event_id, answer}
│ └── ◄── collect answer by event_id correlation
│
└── _grade_multi_vote(question, answer) ──► score
The agent code is identical in single-agent and distributed modes. All distribution happens via dependency injection at the entrypoint layer.
Prerequisites¶
- Azure CLI authenticated (
az login) - ANTHROPIC_API_KEY set in your environment
- Docker daemon running (for image build)
- amplihack repo cloned with
.venvactivated - azure-servicebus Python package installed (
pip install azure-servicebus)
Step 1: Deploy Agents¶
export ANTHROPIC_API_KEY="$(cat ~/.msec-k)" # or your key source
HIVE_NAME=amplihive \
HIVE_RESOURCE_GROUP=hive-mind-rg \
HIVE_LOCATION=westus2 \
HIVE_AGENT_COUNT=100 \
HIVE_AGENTS_PER_APP=5 \
HIVE_AGENT_MODEL=claude-sonnet-4-6 \
bash deploy/azure_hive/deploy.sh
This provisions:
- Resource group and Azure Container Registry (ACR)
- Azure Service Bus namespace, topic, and per-agent subscriptions
- Container Apps Environment
- N Container Apps (
ceil(HIVE_AGENT_COUNT / HIVE_AGENTS_PER_APP)apps)
To check deployment status:
Step 2: Run Eval¶
The eval script handles both content feeding and question answering in a single
command — using the same LongHorizonMemoryEval harness as single-agent eval:
SB_CONN=$(az servicebus namespace authorization-rule keys list \
-g hive-mind-rg \
--namespace-name <YOUR-SB-NAMESPACE> \
--name RootManageSharedAccessKey \
--query primaryConnectionString -o tsv)
python deploy/azure_hive/eval_distributed.py \
--connection-string "$SB_CONN" \
--input-topic hive-events-<HIVE_NAME> \
--response-topic eval-responses-<HIVE_NAME> \
--turns 5000 --questions 50 \
--agents 100 \
--grader-model claude-haiku-4-5-20251001 \
--output results.json
Replace <YOUR-SB-NAMESPACE> and <HIVE_NAME> with values from the deploy
output.
What happens during the eval¶
- Generate — creates deterministic dialogue (5000 turns) and questions (50)
- Learn — sends each turn to all agents via
LEARN_CONTENTevents - Quiz — sends each question to one agent (round-robin) via
INPUTevents; waits forEVAL_ANSWERon the response topic - Grade — uses the same hybrid grader as single-agent (
_grade_multi_vote) - Report — writes JSON report in the same format as single-agent eval
Quick validation (smaller scale)¶
python deploy/azure_hive/eval_distributed.py \
--connection-string "$SB_CONN" \
--input-topic hive-events-<HIVE_NAME> \
--response-topic eval-responses-<HIVE_NAME> \
--turns 100 --questions 10 \
--agents 100 \
--output quick-validation.json
Step 5: Cleanup¶
Remove all Azure resources when done:
Or delete the resource group directly:
Key Environment Variables¶
| Variable | Description | Default |
|---|---|---|
HIVE_NAME |
Hive deployment name | amplihive |
HIVE_RESOURCE_GROUP |
Azure resource group | hive-mind-rg |
HIVE_LOCATION |
Azure region | westus2 |
HIVE_AGENT_COUNT |
Total number of agents | 5 |
HIVE_AGENTS_PER_APP |
Agents per Container App | 5 |
HIVE_AGENT_MODEL |
LLM model for agents | claude-sonnet-4-6 |
HIVE_TRANSPORT |
Transport type | azure_service_bus |
AMPLIHACK_TOPIC_NAME |
Service Bus topic name | hive-events |
AMPLIHACK_MEMORY_CONNECTION_STRING |
Service Bus connection string | -- |
LOG_ANALYTICS_WORKSPACE_ID |
Log Analytics workspace GUID | -- |
OODA_ANSWER_WAIT |
Eval answer timeout in seconds | 600 |
Troubleshooting¶
| Symptom | Fix |
|---|---|
| Rate limits (429 errors) | Agents retry with exponential backoff (2-32s). If persistent, reduce HIVE_AGENT_COUNT or use a smaller model (e.g., Sonnet instead of Opus). |
| Service Bus auth errors | Ensure topic name matches agent config (hive-events). Verify connection string has Send and Listen claims. |
| Missing answers in eval | Check Log Analytics for ANSWER entries. Increase --answer-wait beyond 600s for large deployments. |
0 results on all queries |
Agents may be cold-started (Azure Container Apps scale to zero). Send a warmup query first or check az containerapp revision list. |
| Subscription not found | Verify the eval subscription exists: az servicebus topic subscription list --resource-group hive-mind-rg --namespace-name <ns> --topic-name hive-events |
| Deploy script fails | Ensure Docker daemon is running and az login is current. Check deploy.sh --status for partial deployments. |
Deploy Script Options¶
bash deploy/azure_hive/deploy.sh # Deploy everything
bash deploy/azure_hive/deploy.sh --build-only # Build + push image only
bash deploy/azure_hive/deploy.sh --infra-only # Provision infra only
bash deploy/azure_hive/deploy.sh --cleanup # Tear down resource group
bash deploy/azure_hive/deploy.sh --status # Show deployment status
Related Documentation¶
- Hive Mind Eval Strategy -- Topology comparison and scoring methodology
- Running Evals Quick Start -- All eval types in one page
- Long-Horizon Memory Eval -- Single-agent eval details