Distributed Hive Mind — Design Document

Master Issue: #2710 | PR: #2717

Problem Statement

Goal-seeking agents generated by the Goal Agent Generator operate with isolated memory. Each agent has its own Kuzu graph DB scoped by agent_id. When multiple agents work on related tasks, they cannot share discoveries, leading to duplicated effort and missed cross-domain insights.

Solution: Layered Hive Mind Architecture

The Unified Hive Mind composes four independent mechanisms into a layered architecture where each layer solves a distinct problem:

┌─────────────────────────────────────────────────────┐
│  Layer 4: QUERY                                     │
│  Content-hash deduplication across all sources       │
│  Keyword + topic retrieval, merged result sets       │
├─────────────────────────────────────────────────────┤
│  Layer 3: DISCOVERY (Gossip Protocol)               │
│  Periodic top-K fact sharing for "unknown unknowns"  │
│  Lamport clocks, configurable fanout                 │
├─────────────────────────────────────────────────────┤
│  Layer 2: TRANSPORT (Event Bus)                     │
│  FACT_PROMOTED events propagated to peers            │
│  Append-only event log for audit trail               │
├─────────────────────────────────────────────────────┤
│  Layer 1: STORAGE (Hierarchical Graph)              │
│  Local subgraph (private) + Hive subgraph (shared)   │
│  Promotion with configurable consensus policy        │
└─────────────────────────────────────────────────────┘
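
Layer 3 mentions Lamport clocks for ordering gossip messages. As a reminder of how such a clock behaves, here is a minimal sketch of the textbook algorithm. The class and method names are illustrative assumptions and are not taken from gossip.py.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock that counts events rather than wall-clock time."""

    time: int = 0

    def tick(self) -> int:
        """Advance for a local event or an outgoing message; return the new stamp."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge the stamp of an incoming message, then advance past it."""
        self.time = max(self.time, remote_time) + 1
        return self.time


alice, bob = LamportClock(), LamportClock()
sent_at = alice.tick()              # Alice stamps an outgoing gossip message
received_at = bob.receive(sent_at)  # Bob's clock moves past Alice's stamp
```

Because a receiver always jumps past the sender's stamp, a message's receipt is ordered after its send without any synchronized wall clocks.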

Why Four Layers?

Each layer addresses a distinct concern that the others cannot:

| Layer | Concern | Without It |
|-----------|---------|------------|
| Storage | Where do shared facts live? | No persistence or access control |
| Transport | How do promotions propagate? | Agents must poll for changes |
| Discovery | How do agents find facts they didn't know to look for? | Only query-based retrieval |
| Query | How are results from all layers merged? | Duplicate facts in results |
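
To make the Query layer's role concrete, the following is a rough sketch of content-hash deduplication across several result lists. The function names, the dict shape of a fact, and the normalization step (lowercasing and collapsing whitespace) are assumptions made for illustration; the actual hashing scheme is not specified in this document.

```python
import hashlib


def content_hash(content: str) -> str:
    """SHA-256 of case- and whitespace-normalized text (normalization is assumed)."""
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_sources(*sources: list) -> list:
    """Concatenate result lists, keeping the first fact seen for each content hash."""
    seen = set()
    merged = []
    for source in sources:
        for fact in source:
            key = content_hash(fact["content"])
            if key not in seen:
                seen.add(key)
                merged.append(fact)
    return merged


hive_hits = [{"content": "PostgreSQL runs on port 5432", "source": "hive"}]
gossip_hits = [
    {"content": "postgresql runs on  port 5432", "source": "gossip"},  # same fact
    {"content": "Redis runs on port 6379", "source": "gossip"},
]
results = merge_sources(hive_hits, gossip_hits)
```

Here `results` holds two facts: the hive copy of the PostgreSQL fact and the Redis fact. The gossip copy is dropped because it hashes to the same key.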

API

Quick Start

from amplihack.agents.goal_seeking.hive_mind.unified import (
    UnifiedHiveMind,
    HiveMindAgent,
    HiveMindConfig,
)

# Create hive with default config
hive = UnifiedHiveMind()

# Register agents
hive.register_agent("agent_a")
hive.register_agent("agent_b")

# Convenience wrappers
alice = HiveMindAgent("agent_a", hive)
bob = HiveMindAgent("agent_b", hive)

# Alice learns a fact (stored locally)
alice.learn("PostgreSQL runs on port 5432", confidence=0.95, tags=["infra"])

# Alice promotes her best fact to the hive
alice.promote("PostgreSQL runs on port 5432", confidence=0.95, tags=["infra"])

# Bob can now find Alice's promoted fact
results = bob.ask("What port does PostgreSQL use?")
# → [{"content": "PostgreSQL runs on port 5432", ...}]

# Gossip spreads unpromoted facts too
hive.run_gossip_round()
hive.process_events()

Configuration

config = HiveMindConfig(
    promotion_confidence_threshold=0.7,  # Min confidence to promote
    promotion_consensus_required=2,       # Agents must agree
    gossip_interval_rounds=5,            # Auto-gossip every N rounds
    gossip_top_k=10,                     # Facts per gossip message
    gossip_fanout=2,                     # Peers per gossip round
    event_relevance_threshold=0.3,       # Min relevance to incorporate
    enable_gossip=True,                  # Toggle gossip layer
    enable_events=True,                  # Toggle event layer
)
hive = UnifiedHiveMind(config)
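
As an illustration of how the two promotion settings might interact, the sketch below gates promotion on both confidence and the number of distinct endorsing agents. `Proposal` and `ready_to_promote` are hypothetical names, and the inclusive `>=` comparisons are an assumption; the real logic lives in PromotionPolicy and PromotionManager and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """A fact proposed for promotion plus the agents that endorse it (hypothetical type)."""

    content: str
    confidence: float
    endorsers: set = field(default_factory=set)


def ready_to_promote(
    proposal: Proposal,
    confidence_threshold: float = 0.7,  # mirrors promotion_confidence_threshold
    consensus_required: int = 2,        # mirrors promotion_consensus_required
) -> bool:
    """True once the fact is confident enough and enough distinct agents agree."""
    return (
        proposal.confidence >= confidence_threshold
        and len(proposal.endorsers) >= consensus_required
    )


proposal = Proposal("PostgreSQL runs on port 5432", confidence=0.95)
proposal.endorsers.add("agent_a")
first_check = ready_to_promote(proposal)   # one endorser: not yet eligible
proposal.endorsers.add("agent_b")
second_check = ready_to_promote(proposal)  # two endorsers: eligible
```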

Experiment Results

Five experiments were conducted, each testing a different approach:

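| # | Approach | Tests | Overall Score |
|---|--------------------|-------|---------------|
| 1 | Shared Blackboard | 31 | 47% |
| 2 | Event-Sourced | 47 | 49% |
| 3 | Gossip Protocol | 42 | 54% |
| 4 | Hierarchical Graph | 54 | 57% |
| 5 | Unified (combined) | 40 | 94% |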

The unified approach outperforms the best individual experiment by 37 percentage points (94% vs. 57%), which suggests that the mechanisms are complementary.

Detailed Metrics

Experiment 1 (Blackboard): +75pp cross-agent recall. Simple but no access control — all facts immediately shared. Good for small agent groups.

Experiment 2 (Event-Sourced): +65pp cross-domain quality. Complete audit trail via append-only event log. 0.013ms publish latency. Late joiner replay in 0.02ms.
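
The audit trail and late-joiner replay described for Experiment 2 can be pictured with a small sketch. It assumes an in-memory list and hypothetical `Event` and `AppendOnlyLog` types; the real HiveEvent and EventLog classes add persistence and thread safety that are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    """Immutable record; frozen=True blocks attribute reassignment."""

    seq: int
    kind: str
    payload: str


class AppendOnlyLog:
    """Events can be appended and read back, never edited or removed."""

    def __init__(self) -> None:
        self._events = []

    def append(self, kind: str, payload: str) -> Event:
        event = Event(seq=len(self._events), kind=kind, payload=payload)
        self._events.append(event)
        return event

    def replay(self, handler: Callable[[Event], None], since: int = 0) -> int:
        """Feed events with seq >= since to handler; return the seq to resume from."""
        for event in self._events[since:]:
            handler(event)
        return len(self._events)


log = AppendOnlyLog()
log.append("FACT_PROMOTED", "PostgreSQL runs on port 5432")
log.append("FACT_PROMOTED", "Redis runs on port 6379")

# A late joiner catches up by replaying the log from the beginning.
late_joiner_view = []
cursor = log.replay(lambda e: late_joiner_view.append(e.payload))
```

Keeping the returned cursor lets the joiner resume later with `log.replay(handler, since=cursor)` and receive only events appended in the meantime.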

Experiment 3 (Gossip): >95% knowledge convergence in 7 rounds for 5 agents. Weighted random sampling ensures all facts eventually propagate. Scales sub-linearly for small networks.
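
For intuition about gossip convergence, here is a toy simulation. It is not the experiment harness: the fact naming, the confidence range, and the choice to sample with replacement are all assumptions. Each round, every agent draws up to `top_k` facts weighted by confidence and sends them to `fanout` random peers.

```python
import random


def simulate_gossip(
    n_agents: int = 5,
    facts_per_agent: int = 4,
    top_k: int = 10,
    fanout: int = 2,
    max_rounds: int = 1000,
    seed: int = 0,
) -> tuple:
    """Return (rounds_run, coverage): coverage is the mean share of all facts per agent."""
    rng = random.Random(seed)
    # knowledge[i] maps fact -> confidence for agent i
    knowledge = [
        {f"fact-{i}-{j}": rng.uniform(0.5, 1.0) for j in range(facts_per_agent)}
        for i in range(n_agents)
    ]
    total = n_agents * facts_per_agent

    def coverage() -> float:
        return sum(len(k) for k in knowledge) / (n_agents * total)

    rounds = 0
    while rounds < max_rounds and coverage() < 1.0:
        rounds += 1
        for sender in range(n_agents):
            facts = list(knowledge[sender])
            weights = [knowledge[sender][f] for f in facts]
            # Sampling with replacement: a message carries at most top_k distinct facts.
            message = set(rng.choices(facts, weights=weights, k=top_k))
            peers = [p for p in range(n_agents) if p != sender]
            for peer in rng.sample(peers, k=min(fanout, len(peers))):
                for fact in message:
                    knowledge[peer].setdefault(fact, knowledge[sender][fact])
    return rounds, coverage()


rounds_run, final_coverage = simulate_gossip()
```

Since every fact has a positive weight, each one keeps a nonzero chance of being sent in any round, which is the property the weighted-sampling claim relies on. The round count this toy model reports is not comparable to the measured figure above.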

Experiment 4 (Hierarchical): +4.2pp with zero local regression. Most conservative — only high-confidence facts with consensus are promoted. Best autonomy preservation.

Experiment 5 (Unified): 100% local, 100% cross-domain, 81% combined = 94% overall. Composes all four layers for best-of-all-worlds.
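
The scoring formula is not stated here. One reading that reproduces the reported figure, offered as an assumption rather than the harness's definition, is the unweighted mean of the three sub-scores rounded to the nearest percent:

```python
# Sub-scores reported for Experiment 5, in percent.
local, cross_domain, combined = 100, 100, 81

# Assumed formula: unweighted mean, rounded to the nearest whole percent.
overall = round((local + cross_domain + combined) / 3)  # 93.67 rounds to 94

# Best single-mechanism score from the results table (Hierarchical Graph).
best_individual = 57
margin_pp = overall - best_individual  # 37 percentage points
```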

Module Reference

hive_mind/unified.py — Unified Hive Mind

  • HiveMindConfig — Configuration dataclass
  • UnifiedHiveMind — Main orchestrator composing all layers
  • HiveMindAgent — Per-agent convenience wrapper

hive_mind/blackboard.py — Shared Blackboard

  • SharedFact — Fact dataclass with content hash
  • HiveMemoryStore — Shared fact CRUD with dedup
  • HiveMemoryBridge — Local ↔ shared bridge
  • HiveRetrieval — MemoryAgent-compatible strategy
  • MultiAgentHive — Agent registry + coordinator

hive_mind/event_sourced.py — Event Sourcing

  • HiveEvent — Immutable event dataclass
  • HiveEventBus — Thread-safe pub/sub
  • EventLog — Append-only log with persistence
  • EventSourcedMemory — Memory + event publishing
  • HiveOrchestrator — Event bus coordinator

hive_mind/gossip.py — Gossip Protocol

  • GossipFact / GossipMessage — Gossip data types
  • GossipProtocol — Per-agent gossip logic
  • GossipNetwork — Network coordinator
  • GossipMemoryAdapter — Memory store bridge

hive_mind/hierarchical.py — Hierarchical Graph

  • HiveFact / LocalFact — Two-level fact types
  • PromotionPolicy — Configurable promotion rules
  • PromotionManager — Propose/vote/promote lifecycle
  • PullManager — Hive query + pull to local
  • HierarchicalKnowledgeGraph — Two-level orchestrator

Current State

All five original "future work" items have been implemented:

  1. Real Kuzu Integration — Done. Each agent owns a Kuzu DB via KuzuGraphStore.
  2. LearningAgent Bridge — Done. FederatedGraphStore composes local + hive.
  3. Full Eval Harness — Done. 1000-turn eval with 12 agents across a 5-hive federation.
  4. Distributed Mode — Done. EventBus with Local/Redis/Azure Service Bus backends.
  5. HiveGraph Protocol — Done. Swappable backends (InMemory, PeerHive with Raft).

CognitiveAdapter Hive Integration

The bridge between LearningAgent and the hive mind is in CognitiveAdapter (src/amplihack/agents/goal_seeking/cognitive_adapter.py).

How Facts Flow

LearningAgent.learn_from_content(content)
  → LLM extracts structured facts
  → CognitiveAdapter.store_fact(context, fact, confidence)
    → Stores in local Kuzu DB (CognitiveMemory.store_fact)
    → _promote_to_hive() — auto-promotes to shared hive if connected
      → hive.promote_fact(agent_name, HiveFact(...))

LearningAgent.answer_question(question)
  → CognitiveAdapter.search(query) or get_all_facts()
    → Queries local Kuzu DB
    → _search_hive(query) — queries shared hive
    → _merge_results() — deduplicates, local facts prioritized
  → LLM synthesizes answer from merged fact set
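
The two flows above can be mimicked end to end with a toy stand-in. `TinyAdapter` is hypothetical and is not the real CognitiveAdapter: a list replaces the local Kuzu DB, a shared dict replaces the hive, and matching is naive keyword overlap.

```python
from typing import Optional


class TinyAdapter:
    """Toy model of the store/search flow (illustrative only)."""

    def __init__(self, agent_name: str, hive: Optional[dict] = None) -> None:
        self.agent_name = agent_name
        self.local = []   # stands in for the local Kuzu DB
        self.hive = hive  # shared dict: content -> promoting agent

    def store_fact(self, fact: str) -> None:
        self.local.append(fact)
        if self.hive is not None:  # auto-promote only when a hive is connected
            self.hive.setdefault(fact, self.agent_name)

    def search(self, query: str) -> list:
        """Return (fact, origin) pairs: local hits first, hive duplicates dropped."""
        words = set(query.lower().split())

        def matches(fact: str) -> bool:
            return bool(words & set(fact.lower().split()))

        merged = [(f, "local") for f in self.local if matches(f)]
        seen = {f for f, _ in merged}
        if self.hive is not None:
            merged += [(f, "hive") for f in self.hive if matches(f) and f not in seen]
        return merged


shared = {}
a = TinyAdapter("agent_a", shared)
b = TinyAdapter("agent_b", shared)
a.store_fact("PostgreSQL runs on port 5432")
b.store_fact("Redis runs on port 6379")
hits = b.search("which port")
```

In this run `hits` lists agent_b's own Redis fact first with origin "local", followed by agent_a's PostgreSQL fact with origin "hive". The hive copy of the Redis fact is dropped as a duplicate.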

Usage

from pathlib import Path

from amplihack.agents.goal_seeking.learning_agent import LearningAgent
from amplihack.agents.goal_seeking.hive_mind.hive_graph import InMemoryHiveGraph

hive = InMemoryHiveGraph("shared")
hive.register_agent("agent_a")

agent = LearningAgent(
    agent_name="agent_a",
    storage_path=Path("/tmp/agent_a"),
    use_hierarchical=True,
    hive_store=hive,  # Enables auto-promotion + hive retrieval
)

Key Design Decisions

  1. Auto-promotion on store: Every store_fact() call auto-promotes to the hive when one is connected. This is simpler than explicit promotion: no facts are missed, and callers need no extra code.
  2. Local-first merge: Local facts take priority over hive facts during deduplication. Agents trust their own extractions more than shared knowledge.
  3. Silent failure: Hive promotion errors are logged but never raised, so local storage is unaffected when the hive is unavailable.
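
The silent-failure decision can be sketched as a local write followed by a guarded promotion call. `UnavailableHive`, the free function `store_fact`, and the plain-string fact are assumptions for illustration; the real adapter passes a HiveFact to hive.promote_fact.

```python
import logging

logger = logging.getLogger("hive_sketch")


class UnavailableHive:
    """Hypothetical hive stub whose promotion call always fails."""

    def promote_fact(self, agent_name: str, fact: str) -> None:
        raise ConnectionError("hive unreachable")


def store_fact(local_store: list, hive, agent_name: str, fact: str) -> bool:
    """Store locally, then try to promote; return whether promotion succeeded."""
    local_store.append(fact)  # the local write happens first and is never rolled back
    try:
        hive.promote_fact(agent_name, fact)
    except Exception:
        # Log with traceback and carry on: the caller never sees the hive error.
        logger.warning("Hive promotion failed for %s", agent_name, exc_info=True)
        return False
    return True


local = []
promoted = store_fact(local, UnavailableHive(), "agent_a", "PostgreSQL runs on port 5432")
```

Returning a boolean is one way to let callers notice a failed promotion without handling exceptions. Whether the real adapter exposes such a signal is not stated in this document.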

See TUTORIAL.md in this directory for a getting-started guide.