# Distributed Hive Mind — Design Document

## Problem Statement
Goal-seeking agents generated by the Goal Agent Generator operate with isolated memory. Each agent has its own Kuzu graph DB scoped by `agent_id`. When multiple agents work on related tasks, they cannot share discoveries, leading to duplicated effort and missed cross-domain insights.
## Solution: Layered Hive Mind Architecture
The Unified Hive Mind composes four independent mechanisms into a layered architecture where each layer solves a distinct problem:
```text
┌─────────────────────────────────────────────────────┐
│ Layer 4: QUERY                                      │
│ Content-hash deduplication across all sources       │
│ Keyword + topic retrieval, merged result sets       │
├─────────────────────────────────────────────────────┤
│ Layer 3: DISCOVERY (Gossip Protocol)                │
│ Periodic top-K fact sharing for "unknown unknowns"  │
│ Lamport clocks, configurable fanout                 │
├─────────────────────────────────────────────────────┤
│ Layer 2: TRANSPORT (Event Bus)                      │
│ FACT_PROMOTED events propagated to peers            │
│ Append-only event log for audit trail               │
├─────────────────────────────────────────────────────┤
│ Layer 1: STORAGE (Hierarchical Graph)               │
│ Local subgraph (private) + Hive subgraph (shared)   │
│ Promotion with configurable consensus policy        │
└─────────────────────────────────────────────────────┘
```
### Why Four Layers?
Each layer addresses a distinct concern that the others cannot:
| Layer | Concern | Without It |
|---|---|---|
| Storage | Where do shared facts live? | No persistence or access control |
| Transport | How do promotions propagate? | Agents must poll for changes |
| Discovery | How do agents find facts they didn't know to look for? | Only query-based retrieval |
| Query | How are results from all layers merged? | Duplicate facts in results |
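Layer 3 orders gossip messages with Lamport clocks rather than synchronized wall time. The class below is a minimal illustrative sketch of that mechanism, not the module's actual API:

```python
class LamportClock:
    """Logical clock: orders events causally without synchronized wall time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance before a local event or a message send."""
        self.time += 1
        return self.time

    def observe(self, remote_time: int) -> int:
        """On receive, jump past the sender's timestamp, then tick."""
        self.time = max(self.time, remote_time) + 1
        return self.time


# Two agents exchanging one gossip message:
a, b = LamportClock(), LamportClock()
sent_at = a.tick()                 # sender stamps the message with 1
received_at = b.observe(sent_at)   # receiver's clock jumps to 2 (> sender's 1)
```

Any ordering consistent with these timestamps respects causality, which is typically all a gossip layer needs to resolve conflicting versions of a fact.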
## API

### Quick Start
```python
from amplihack.agents.goal_seeking.hive_mind.unified import (
    UnifiedHiveMind,
    HiveMindAgent,
    HiveMindConfig,
)

# Create hive with default config
hive = UnifiedHiveMind()

# Register agents
hive.register_agent("agent_a")
hive.register_agent("agent_b")

# Convenience wrappers
alice = HiveMindAgent("agent_a", hive)
bob = HiveMindAgent("agent_b", hive)

# Alice learns a fact (stored locally)
alice.learn("PostgreSQL runs on port 5432", confidence=0.95, tags=["infra"])

# Alice promotes her best fact to the hive
alice.promote("PostgreSQL runs on port 5432", confidence=0.95, tags=["infra"])

# Bob can now find Alice's promoted fact
results = bob.ask("What port does PostgreSQL use?")
# → [{"content": "PostgreSQL runs on port 5432", ...}]

# Gossip spreads unpromoted facts too
hive.run_gossip_round()
hive.process_events()
```
### Configuration
```python
config = HiveMindConfig(
    promotion_confidence_threshold=0.7,  # Min confidence to promote
    promotion_consensus_required=2,      # Agents must agree
    gossip_interval_rounds=5,            # Auto-gossip every N rounds
    gossip_top_k=10,                     # Facts per gossip message
    gossip_fanout=2,                     # Peers per gossip round
    event_relevance_threshold=0.3,       # Min relevance to incorporate
    enable_gossip=True,                  # Toggle gossip layer
    enable_events=True,                  # Toggle event layer
)
hive = UnifiedHiveMind(config)
```
## Experiment Results
Five experiments were conducted, each testing a different approach:
| # | Approach | Tests | Overall Score |
|---|---|---|---|
| 1 | Shared Blackboard | 31 | 47% |
| 2 | Event-Sourced | 47 | 49% |
| 3 | Gossip Protocol | 42 | 54% |
| 4 | Hierarchical Graph | 54 | 57% |
| 5 | Unified (combined) | 40 | 94% |
The unified approach outperforms the best individual experiment by +37 percentage points, confirming that the mechanisms are complementary.
### Detailed Metrics
- **Experiment 1 (Blackboard):** +75pp cross-agent recall. Simple, but offers no access control — all facts are immediately shared. Good for small agent groups.
- **Experiment 2 (Event-Sourced):** +65pp cross-domain quality. Complete audit trail via the append-only event log. 0.013ms publish latency; late-joiner replay in 0.02ms.
- **Experiment 3 (Gossip):** >95% knowledge convergence in 7 rounds for 5 agents. Weighted random sampling ensures all facts eventually propagate. Scales sub-linearly for small networks.
- **Experiment 4 (Hierarchical):** +4.2pp with zero local regression. Most conservative — only high-confidence facts with consensus are promoted. Best autonomy preservation.
- **Experiment 5 (Unified):** 100% local, 100% cross-domain, 81% combined, averaging to 94% overall. Composes all four layers for the best of all worlds.
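Experiment 3's convergence behavior is easy to reproduce with a toy in-memory network. The function below is a simplified push-only round, assuming confidence-ranked top-K selection and uniform peer choice (the real `GossipProtocol` uses weighted random sampling); all names are illustrative:

```python
import random


def gossip_round(agents: dict[str, dict[str, float]],
                 fanout: int = 2, top_k: int = 10) -> None:
    """One synchronous gossip round over an in-memory network.

    `agents` maps agent_id -> {fact_content: confidence}. Each agent pushes
    its top-K highest-confidence facts to `fanout` randomly chosen peers.
    """
    for sender, facts in list(agents.items()):
        peers = [a for a in agents if a != sender]
        top = sorted(facts.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        for peer in random.sample(peers, min(fanout, len(peers))):
            for content, conf in top:
                # Keep the higher confidence if the peer already knows the fact
                agents[peer][content] = max(agents[peer].get(content, 0.0), conf)


# Seed one fact at one agent and run seven rounds
agents = {f"agent_{i}": {} for i in range(5)}
agents["agent_0"]["PostgreSQL runs on port 5432"] = 0.95
for _ in range(7):
    gossip_round(agents)
knowers = sum(1 for facts in agents.values() if facts)
```

With 5 agents and fanout 2, a single seeded fact typically reaches every agent within a handful of rounds, consistent with the >95% convergence in 7 rounds reported above.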
## Module Reference

### `hive_mind/unified.py` — Unified Hive Mind

- `HiveMindConfig` — Configuration dataclass
- `UnifiedHiveMind` — Main orchestrator composing all layers
- `HiveMindAgent` — Per-agent convenience wrapper
### `hive_mind/blackboard.py` — Shared Blackboard

- `SharedFact` — Fact dataclass with content hash
- `HiveMemoryStore` — Shared fact CRUD with dedup
- `HiveMemoryBridge` — Local ↔ shared bridge
- `HiveRetrieval` — MemoryAgent-compatible strategy
- `MultiAgentHive` — Agent registry + coordinator
### `hive_mind/event_sourced.py` — Event Sourcing

- `HiveEvent` — Immutable event dataclass
- `HiveEventBus` — Thread-safe pub/sub
- `EventLog` — Append-only log with persistence
- `EventSourcedMemory` — Memory + event publishing
- `HiveOrchestrator` — Event bus coordinator
### `hive_mind/gossip.py` — Gossip Protocol

- `GossipFact` / `GossipMessage` — Gossip data types
- `GossipProtocol` — Per-agent gossip logic
- `GossipNetwork` — Network coordinator
- `GossipMemoryAdapter` — Memory store bridge
### `hive_mind/hierarchical.py` — Hierarchical Graph

- `HiveFact` / `LocalFact` — Two-level fact types
- `PromotionPolicy` — Configurable promotion rules
- `PromotionManager` — Propose/vote/promote lifecycle
- `PullManager` — Hive query + pull to local
- `HierarchicalKnowledgeGraph` — Two-level orchestrator
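The propose/vote/promote lifecycle can be illustrated with a toy manager. The class and method names below are hypothetical simplifications of `PromotionPolicy`/`PromotionManager`, not the real API:

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    content: str
    confidence: float
    votes: set[str] = field(default_factory=set)


class PromotionManagerSketch:
    """Toy propose/vote/promote lifecycle with a consensus policy."""

    def __init__(self, confidence_threshold: float = 0.7,
                 consensus_required: int = 2) -> None:
        self.confidence_threshold = confidence_threshold
        self.consensus_required = consensus_required
        self.proposals: dict[str, Proposal] = {}
        self.hive: dict[str, float] = {}  # promoted (shared) facts

    def propose(self, agent_id: str, content: str, confidence: float) -> bool:
        """Propose a fact; returns True if this call triggered promotion."""
        if confidence < self.confidence_threshold:
            return False  # policy rejects low-confidence facts outright
        prop = self.proposals.setdefault(content, Proposal(content, confidence))
        prop.votes.add(agent_id)  # proposing counts as a vote
        return self._maybe_promote(prop)

    def vote(self, agent_id: str, content: str) -> bool:
        """Second an existing proposal; returns True if it promoted."""
        prop = self.proposals.get(content)
        if prop is None:
            return False
        prop.votes.add(agent_id)
        return self._maybe_promote(prop)

    def _maybe_promote(self, prop: Proposal) -> bool:
        if len(prop.votes) >= self.consensus_required and prop.content not in self.hive:
            self.hive[prop.content] = prop.confidence
            return True
        return False
```

With the defaults above, a fact reaches the hive only after a second agent seconds it, which is the conservative behavior Experiment 4 measured.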
## Current State
All five original "future work" items have been implemented:
- **Real Kuzu Integration** — Done. Each agent owns a Kuzu DB via `KuzuGraphStore`.
- **LearningAgent Bridge** — Done. `FederatedGraphStore` composes local + hive.
- **Full Eval Harness** — Done. 1000-turn eval with 12 agents across a 5-hive federation.
- **Distributed Mode** — Done. `EventBus` with Local/Redis/Azure Service Bus backends.
- **HiveGraph Protocol** — Done. Swappable backends (`InMemory`, `PeerHive` with Raft).
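The swappable-backend idea reduces to a small publish/subscribe surface. The `Protocol` and `LocalBackend` below are a hypothetical sketch, not the real `EventBus` interface:

```python
from typing import Callable, Protocol


class EventBusBackend(Protocol):
    """What a transport backend must provide (illustrative interface)."""

    def publish(self, topic: str, payload: dict) -> None: ...
    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None: ...


class LocalBackend:
    """In-process backend: synchronous dispatch, no broker required."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = {}

    def publish(self, topic: str, payload: dict) -> None:
        for handler in self._handlers.get(topic, []):
            handler(payload)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(topic, []).append(handler)


bus: EventBusBackend = LocalBackend()
received: list[dict] = []
bus.subscribe("FACT_PROMOTED", received.append)
bus.publish("FACT_PROMOTED", {"content": "PostgreSQL runs on port 5432"})
```

A Redis or Azure Service Bus backend would implement the same two methods over a broker, so the rest of the hive stays transport-agnostic.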
## CognitiveAdapter Hive Integration

The bridge between `LearningAgent` and the hive mind is `CognitiveAdapter` (`src/amplihack/agents/goal_seeking/cognitive_adapter.py`).
### How Facts Flow

```text
LearningAgent.learn_from_content(content)
  → LLM extracts structured facts
  → CognitiveAdapter.store_fact(context, fact, confidence)
      → Stores in local Kuzu DB (CognitiveMemory.store_fact)
      → _promote_to_hive() — auto-promotes to shared hive if connected
          → hive.promote_fact(agent_name, HiveFact(...))

LearningAgent.answer_question(question)
  → CognitiveAdapter.search(query) or get_all_facts()
      → Queries local Kuzu DB
      → _search_hive(query) — queries shared hive
      → _merge_results() — deduplicates, local facts prioritized
  → LLM synthesizes answer from merged fact set
```
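The dedup step can be sketched as follows. `merge_results` and `content_key` are hypothetical stand-ins for the private `_merge_results`, assuming a normalized content hash as the dedup key:

```python
import hashlib


def content_key(fact: dict) -> str:
    """Normalize then hash, so trivially different copies dedup together."""
    normalized = fact["content"].strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_results(local_facts: list[dict], hive_facts: list[dict]) -> list[dict]:
    """Local-first merge: a hive fact is dropped if the agent already has it."""
    seen = {content_key(f) for f in local_facts}
    merged = list(local_facts)
    for fact in hive_facts:
        key = content_key(fact)
        if key not in seen:
            seen.add(key)
            merged.append(fact)
    return merged


local = [{"content": "PostgreSQL runs on port 5432", "source": "local"}]
hive = [
    {"content": "postgresql runs on port 5432", "source": "hive"},  # duplicate
    {"content": "Redis runs on port 6379", "source": "hive"},
]
merged = merge_results(local, hive)
```

Because local facts seed the `seen` set first, the agent's own copy of a duplicated fact always wins the tie.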
### Usage
```python
from pathlib import Path

from amplihack.agents.goal_seeking.learning_agent import LearningAgent
from amplihack.agents.goal_seeking.hive_mind.hive_graph import InMemoryHiveGraph

hive = InMemoryHiveGraph("shared")
hive.register_agent("agent_a")

agent = LearningAgent(
    agent_name="agent_a",
    storage_path=Path("/tmp/agent_a"),
    use_hierarchical=True,
    hive_store=hive,  # Enables auto-promotion + hive retrieval
)
```
### Key Design Decisions

- **Auto-promotion on store:** Every `store_fact()` call auto-promotes to the hive. Simpler than explicit promotion — no facts missed, no extra caller code.
- **Local-first merge:** Local facts take priority over hive facts in dedup. Agents trust their own extractions more than shared knowledge.
- **Silent failure:** Hive promotion errors are logged but never raised. Local storage always succeeds even if the hive is unavailable.
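The auto-promotion and silent-failure decisions combine into one small pattern, sketched here with hypothetical names (in the real code the path is `CognitiveAdapter.store_fact` calling `_promote_to_hive`):

```python
import logging

logger = logging.getLogger("hive")


class FlakyHive:
    """Stand-in hive client whose promotion always fails."""

    def promote(self, content: str, confidence: float) -> None:
        raise ConnectionError("hive unavailable")


def store_fact(local_store: dict, hive_client, content: str, confidence: float) -> None:
    """Store locally first, then best-effort promote; hive errors never propagate."""
    local_store[content] = confidence  # local storage always succeeds
    if hive_client is None:
        return  # not connected to a hive; purely local operation
    try:
        hive_client.promote(content, confidence)
    except Exception:
        # Logged for operators, swallowed for callers: the hive being
        # down must never break an agent's own learning loop.
        logger.exception("Hive promotion failed; fact kept locally")


local: dict[str, float] = {}
store_fact(local, FlakyHive(), "PostgreSQL runs on port 5432", 0.95)
```

The ordering matters: writing locally before attempting promotion guarantees the fact survives even when the promotion raises.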
See `TUTORIAL.md` in this directory to get started.