Hive Mind — Evaluation¶
This page describes the in-process evaluation harness that ships in
amplihack-hive today. It is the contract that any future cloud-backed
runner must preserve.
Rust port note Upstream
amplihackruns the long-horizon eval asamplihack eval long-horizon-memoryagainst an Azure-deployed hive (Container Apps + Service Bus / Event Hubs). The Rust port atamplihack-hive::hive_evalkeeps the same event-loop contract but executes against an in-process [LocalEventBus]. Cloud transport and packaged eval reports are upstream responsibilities until the Rust port wires those backends — see Getting Started → Status.
What Lives In This Repo¶
- The native eval loop:
amplihack_hive::run_evalandamplihack_hive::run_eval_with_responder - The companion learning feed:
amplihack_hive::feed::run_feed - CLI argument structs and runner functions in
crates/amplihack-cli/src/commands/hive_haymaker.rs(not yet routed asamplihacksubcommands)
What Does Not Live In This Repo¶
- Long-horizon dataset generation
- Azure Container Apps / Service Bus / Event Hubs deployment assets
- The packaged
run_distributed_eval.shwrapper and report bundle layout - Standard / holdout question slicing
Those continue to live in upstream rysweet/amplihack (Python). The Rust
port intentionally focuses on the runtime side.
Eval Loop Contract¶
use amplihack_hive::{HiveEvalConfig, LocalEventBus, run_eval_with_responder};
let cfg = HiveEvalConfig::new(vec![
"What port does PostgreSQL use?".into(),
"What is the default Nginx port?".into(),
])
.with_timeout(10) // seconds to wait for responses per query
.with_min_responses(1); // minimum responses before moving on
let mut bus = LocalEventBus::new();
let result = run_eval_with_responder(&mut bus, &cfg, |question| {
// Replace this stub with real agents / SDK calls.
vec![("agent-1".into(), format!("echo: {question}"), 0.5)]
})?;
assert_eq!(result.total_queries, cfg.questions.len());
Topics involved (from amplihack_hive::hive_events):
| Topic | Direction | Payload |
|---|---|---|
HIVE_QUERY |
harness → agents | { query_id, question } |
HIVE_QUERY_RESPONSE |
agents → harness | { query_id, agent_id, answer, confidence } |
HIVE_LEARN_CONTENT |
feeder → agents | { feed_id, item } |
HIVE_FEED_COMPLETE |
feeder → agents | { feed_id, items_published } |
HIVE_AGENT_READY |
agents → harness | { agent_id } |
run_eval(bus, &cfg) is the no-responder form: it publishes queries and
collects whatever subscribers are already on the bus. Use it when wiring
real agents in advance; use run_eval_with_responder for self-contained
tests and demos.
Output¶
HiveEvalResult contains:
query_results: Vec<QueryResult>— one entry per question with all collectedAgentAnswer { agent_id, answer, confidence }total_queries: usizetotal_responses: usizeaverage_confidence: f64
Render to text with the helper in hive_haymaker.rs:
use amplihack_cli::commands::hive_haymaker::format_eval_text;
println!("{}", format_eval_text(&result));
JSON output is via serde_json::to_string_pretty(&result)?.
Library API: Feed + Eval In One Process¶
use amplihack_hive::{
FeedConfig, HiveEvalConfig, LocalEventBus, feed::run_feed,
run_eval_with_responder,
};
let mut bus = LocalEventBus::new();
// 1. Seed content into the hive.
let feed = run_feed(
&mut bus,
&FeedConfig::new("smoke", vec!["Photosynthesis happens in chloroplasts.".into()]),
)?;
assert_eq!(feed.items_published, 1);
// 2. Run questions against agents. (Stub responder for illustration.)
let cfg = HiveEvalConfig::new(vec!["Where does photosynthesis happen?".into()]);
let result = run_eval_with_responder(&mut bus, &cfg, |q| {
vec![("a1".into(), format!("answer to {q}"), 0.8)]
})?;
assert_eq!(result.total_queries, 1);
CLI Surface (Planned — Not Yet Wired)¶
hive_haymaker.rs defines HiveFeedArgs and HiveEvalArgs. Today these
are exercised via unit tests and direct library calls, not via
amplihack hive feed or amplihack hive eval — those subcommands are not
registered in cli_subcommands.rs yet. A tracking issue should land before
the docs claim a CLI invocation form.
The argument shape that will be exposed once wired:
| Struct | Field | Default | Notes |
|---|---|---|---|
HiveFeedArgs |
deployment_id |
(required) | Logical hive identifier |
turns |
100 |
Number of HIVE_LEARN_CONTENT events |
|
topic |
env / default | Override Service Bus topic name | |
sb_conn_str |
"" |
Stub — logs a warning and uses LocalEventBus |
|
HiveEvalArgs |
deployment_id |
(required) | Logical hive identifier |
repeats |
3 |
Question rounds | |
wait_for_ready |
0 |
Stubbed — never blocks | |
timeout |
600 |
Per-round timeout (seconds) | |
topic |
env / default | Override Service Bus topic name | |
sb_conn_str |
"" |
Stub — logs a warning and uses LocalEventBus |
|
output |
Text |
Text or Json |
There are no --mode, --question-set, --agents, --seeds,
--memory-type, --sdk, --answer-mode, --parallel-workers, --load-db,
or --skip-learning flags in this port today. Those concepts live upstream.
Cloud Transport (Planned — Stubbed Today)¶
Both runners check sb_conn_str; if non-empty the runner logs:
…and continues against a fresh LocalEventBus. There is no silent
fall-through to noop and no Event Hubs path. The cloud transport story will
land alongside the wired CLI subcommand.