Agent Type Memory Sharing Patterns in Graph Databases¶
Executive Summary¶
This document catalogs patterns for agent type memory sharing in multi-agent systems using graph databases. Research shows that temporal knowledge graphs with hybrid architectures (vector + key-value + graph) are becoming the standard for 2024-2025 multi-agent systems, with sophisticated mechanisms for shared memory, conflict resolution, and quality control.
Key Finding: Agents of the same type (e.g., all architect agents) should share memory through a temporally-aware knowledge graph with explicit quality scoring, conflict resolution mechanisms, and deprecation strategies.
1. Memory Sharing Patterns in Multi-Agent Systems¶
1.1 Hierarchical Memory Architecture¶
Modern multi-agent systems use three-tier memory inspired by operating system design:
┌─────────────────────────────────┐
│ Working Memory (Active) │ ← Short-term, context-specific
│ - Current session context │
│ - Active task state │
└─────────────────────────────────┘
↓ ↑
┌─────────────────────────────────┐
│ Main Memory (Recent) │ ← Medium-term, recent history
│ - Session history │
│ - Recent interactions │
└─────────────────────────────────┘
↓ ↑
┌─────────────────────────────────┐
│ Archive (Long-term) │ ← Long-term, persistent knowledge
│ - Shared type-specific memory │
│ - Cross-session knowledge │
└─────────────────────────────────┘
Application to Amplihack:
- Working Memory: Current conversation, active task context
- Main Memory: Session-specific learning, recent decisions
- Archive: Shared agent-type memory across all sessions and projects
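A minimal sketch of how the three tiers and a promote-on-quality rule could be represented; the enum, field names, and threshold below are illustrative, not existing amplihack code:
from dataclasses import dataclass
from enum import Enum
from typing import List

class MemoryTier(Enum):
    WORKING = "working"   # current session context
    MAIN = "main"         # recent history
    ARCHIVE = "archive"   # shared, long-term agent-type memory

@dataclass
class TieredMemoryItem:
    content: str
    tier: MemoryTier
    quality_score: float = 0.5

def promote_valuable_items(items: List[TieredMemoryItem],
                           threshold: float = 0.75) -> None:
    """Move high-quality items one tier toward the shared archive."""
    order = [MemoryTier.WORKING, MemoryTier.MAIN, MemoryTier.ARCHIVE]
    for item in items:
        idx = order.index(item.tier)
        if item.quality_score >= threshold and idx < len(order) - 1:
            item.tier = order[idx + 1]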
1.2 Collaborative Memory Framework¶
Multi-agent systems implement two memory tiers:
- Private Memory: Visible only to originating agent instance
- Shared Memory: Selectively shared across agent type with access controls
Key Pattern: Bipartite graph linking agent instances and memory fragments, with access permissions governing what is shared with the agent type.
[Agent Instance] --contributes--> [Memory Fragment] --shared_with--> [Agent Type]
↓
[Quality Score]
[Temporal Metadata]
[Access Control]
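A minimal sketch of the private/shared split with type-level access control; the class and field names are illustrative:
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryFragment:
    fragment_id: str
    content: str
    contributor_instance: str
    shared_with_types: List[str] = field(default_factory=list)  # empty = private
    quality_score: float = 0.5

def visible_fragments(fragments: List[MemoryFragment],
                      agent_type: str,
                      agent_instance: str) -> List[MemoryFragment]:
    """Return private fragments owned by this instance plus fragments shared with its type."""
    return [
        f for f in fragments
        if f.contributor_instance == agent_instance   # private memory
        or agent_type in f.shared_with_types          # shared memory, gated by access control
    ]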
1.3 Consensus Memory Pattern¶
Definition: Memory verified and agreed upon by multiple agents of the same type.
Implementation:
- Multiple agents encounter same pattern independently
- System creates "consensus score" based on agreement frequency
- Higher consensus = higher confidence in shared memory
Example for Architect Agents:
Pattern: "Prefer composition over inheritance for extensibility"
- Architect Agent A discovers this (confidence: 0.7)
- Architect Agent B independently confirms (confidence: 0.8)
- Architect Agent C validates in different context (confidence: 0.75)
→ Consensus Score: 0.83, marked as "high confidence shared knowledge"
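One plausible way to compute such a consensus score is to start from the mean confidence and add a small bonus per independent confirmation; the exact formula below is an assumption, not something this pattern prescribes:
from typing import List

def consensus_score(confidences: List[float], agreement_bonus: float = 0.04) -> float:
    """Mean confidence plus a capped bonus for each additional independent confirmation."""
    if not confidences:
        return 0.0
    base = sum(confidences) / len(confidences)
    bonus = agreement_bonus * (len(confidences) - 1)
    return min(1.0, base + bonus)

# Example above: [0.7, 0.8, 0.75] -> mean 0.75 + 2 * 0.04 bonus ≈ 0.83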
1.4 Retrospective Log Pattern¶
After each project or complex task, agents store:
- What went well
- What issues arose
- How they were resolved
- Quality score of the solution
This enables collective intelligence where the team becomes "smarter" over time.
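A sketch of what one retrospective log entry could capture; the field names are illustrative:
from dataclasses import dataclass, field
from typing import List

@dataclass
class RetrospectiveEntry:
    """One retrospective record stored after a project or complex task."""
    task_id: str
    went_well: List[str] = field(default_factory=list)
    issues: List[str] = field(default_factory=list)
    resolutions: List[str] = field(default_factory=list)
    solution_quality: float = 0.0   # 0-1 quality score of the final solution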
2. Agent Type Classification and Granularity¶
2.1 Agent Type Taxonomy¶
Based on research and amplihack's existing structure, agent types follow a hierarchical taxonomy:
┌─────────────────────────────────────────────────────────────┐
│ Agent Hierarchy │
├─────────────────────────────────────────────────────────────┤
│ │
│ Core Role Types (Primary Classification) │
│ ├── Architect │
│ ├── Builder │
│ ├── Reviewer │
│ ├── Tester │
│ ├── Optimizer │
│ └── Security │
│ │
│ Specialized Types (Domain-Specific) │
│ ├── Database │
│ ├── API Designer │
│ ├── Integration │
│ └── Analyzer │
│ │
│ Workflow Types (Process-Oriented) │
│ ├── Pre-commit Diagnostic │
│ ├── CI Diagnostic │
│ ├── Fix Agent │
│ └── Cleanup │
│ │
│ Language/Technology Specialists │
│ ├── Rust Programming Expert │
│ ├── Azure Kubernetes Expert │
│ └── [Other language-specific agents] │
│ │
└─────────────────────────────────────────────────────────────┘
2.2 Five Key Axes for Classification¶
Research identifies five dimensions for agent taxonomy:
- Control Hierarchy: Leader vs worker agents
- Information Flow: Top-down, bottom-up, peer-to-peer
- Role and Task Delegation: Primary responsibility areas
- Temporal Hierarchy: Order dependencies (sequential vs parallel)
- Communication Structure: How agents coordinate
2.3 Granularity Decision Framework¶
Question: Should python-builder and javascript-builder share memory?
Answer Framework:
| Consideration | Same Type Memory | Separate Type Memory |
|---|---|---|
| Core responsibility | If identical core responsibility (e.g., "building code from specs") | If fundamentally different responsibilities |
| Mental models | If approaches/patterns are transferable | If domain-specific mental models differ significantly |
| Error patterns | If failure modes are similar | If failure modes are domain-specific |
| Cross-pollination value | If learning from one helps the other | If learning from one confuses the other |
Recommendation for Amplihack:
- Share: All "Builder" agents share builder-type memory (language-agnostic patterns)
- Separate: Language-specific builders have additional language-specific memory
- Hybrid Model: Two-level memory sharing (general + specialized)
Builder (General)
├── Shared Memory: Module design, contract patterns, testing approaches
├── Python Builder
│ └── Language-specific: Python idioms, type hints, pytest patterns
├── JavaScript Builder
│ └── Language-specific: JS patterns, async/await, jest patterns
└── Rust Builder
└── Language-specific: Ownership patterns, lifetimes, cargo patterns
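A sketch of how two-level retrieval could merge the pools, assuming pools are keyed as "builder" and "builder:<language>" (illustrative naming, not an existing amplihack structure):
from typing import Dict, List

def retrieve_builder_memories(memory_pools: Dict[str, List[dict]],
                              language: str,
                              limit: int = 10) -> List[dict]:
    """Merge the general builder pool with the language-specific pool and rank by quality."""
    general = memory_pools.get("builder", [])
    specialized = memory_pools.get(f"builder:{language}", [])
    merged = general + specialized
    merged.sort(key=lambda m: m.get("quality_score", 0.0), reverse=True)
    return merged[:limit]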
2.4 Agent Type Definition Criteria¶
An agent should be considered a distinct "type" when:
- Unique Core Responsibility: Has a specific, non-overlapping primary function
- Distinct Mental Model: Requires different thinking patterns or expertise
- Specialized Knowledge Domain: Operates in a specific knowledge area
- Unique Success Criteria: What "good" looks like differs from other types
- Independent Value: Provides value without requiring other agent types
Examples:
- Architect ≠ Builder: Different responsibilities, mental models, success criteria
- Architect (Python) = Architect (JavaScript): Same responsibility, transferable mental models
- Reviewer ≠ Tester: Different focuses (quality vs coverage)
3. Sharing Boundaries and Quality Control¶
3.1 What Should Be Shared¶
High-Value Shared Memory¶
Procedural Knowledge (HOW to do things):
- Successful patterns and approaches
- Step-by-step workflows that worked
- Decision frameworks and heuristics
- Common pitfall avoidance strategies
Declarative Knowledge (WHAT is true):
- Verified facts and relationships
- Project-agnostic principles
- Tool capabilities and limitations
- Performance characteristics
Meta-Knowledge (WHEN to apply):
- Context patterns for technique selection
- Success/failure indicators
- Adaptation strategies
- Exception cases
Example for Architect Agents:
{
"memory_id": "arch_mem_0042",
"type": "procedural",
"pattern": "API versioning strategy",
"content": {
"situation": "Need to evolve API without breaking clients",
"approach": "URL-based versioning with deprecation timeline",
"reasoning": "Clear, explicit, easier to route and test",
"alternatives_considered": ["Header-based", "Query parameter"],
"success_rate": 0.92,
"contexts_applied": 12
},
"quality_score": 0.89,
"contributor_agents": ["arch_001", "arch_007", "arch_015"],
"temporal_metadata": {
"created": "2025-08-15",
"last_validated": "2025-10-28",
"deprecation_date": null
}
}
3.2 What Should NOT Be Shared¶
Private/Instance-Specific Memory¶
Session Context (NOT shared):
- Current conversation history
- User preferences for this session
- Temporary working state
- Draft ideas not yet validated
Project-Specific (Scoped sharing only):
- Project-specific conventions
- Team-specific patterns
- Codebase-specific quirks
- Local environment configurations
Failed Experiments (Selective sharing):
- Share: Why it failed, lessons learned
- Don't Share: The bad solution itself as a recommendation
- Pattern: Store as "anti-pattern" with clear warning
Low-Confidence Hypotheses (NOT shared until validated):
- Unverified theories
- Single-instance observations
- Untested assumptions
3.3 Memory Quality Control Mechanisms¶
Multi-Dimensional Quality Scoring¶
Each shared memory fragment receives scores across multiple dimensions:
class MemoryQuality:
def __init__(self):
self.confidence: float = 0.0 # Agent's confidence (0-1)
self.validation: float = 0.0 # Number of successful applications
self.recency: float = 0.0 # How recent (time decay)
self.consensus: float = 0.0 # Agreement across agents
self.context_specificity: float = 0.0 # How context-dependent
self.impact: float = 0.0 # Measured improvement from applying
def compute_overall_quality(self) -> float:
"""Weighted composite quality score."""
return (
0.25 * self.confidence +
0.20 * self.validation +
0.15 * self.recency +
0.20 * self.consensus +
0.10 * self.context_specificity +
0.10 * self.impact
)
Voting and Rating System¶
Agents can rate shared memory after using it:
Agent uses shared memory → Outcome observed → Agent provides feedback
├── Helpful (+1): Memory led to successful outcome
├── Partially Helpful (+0.5): Memory was directionally correct
├── Not Helpful (0): Memory didn't apply to situation
└── Harmful (-1): Memory led to incorrect approach
Automatic Quality Decay:
- Memory quality decreases over time without validation
- Requires periodic re-validation to maintain high scores
- Old memory without recent usage gets flagged for review
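A minimal sketch of the rating and decay mechanics described above; the learning rate and decay rate are illustrative:
from datetime import datetime

# Feedback values from the rating scheme above
FEEDBACK_WEIGHTS = {"helpful": 1.0, "partially_helpful": 0.5, "not_helpful": 0.0, "harmful": -1.0}

def apply_feedback(quality: float, rating: str, learning_rate: float = 0.05) -> float:
    """Nudge the quality score toward the observed feedback, clamped to [0, 1]."""
    target = FEEDBACK_WEIGHTS[rating]
    return max(0.0, min(1.0, quality + learning_rate * target))

def apply_time_decay(quality: float, last_validated: datetime,
                     monthly_decay: float = 0.01) -> float:
    """Decay quality for memories that have not been re-validated recently."""
    months_idle = (datetime.now() - last_validated).days / 30
    return max(0.0, quality - monthly_decay * months_idle)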
Quality Thresholds for Sharing¶
| Quality Score | Status | Action |
|---|---|---|
| 0.8 - 1.0 | Highly Trusted | Recommend proactively |
| 0.6 - 0.79 | Trusted | Available for retrieval |
| 0.4 - 0.59 | Experimental | Use with caution flag |
| 0.2 - 0.39 | Low Confidence | Show but warn |
| 0.0 - 0.19 | Deprecated | Archive, don't recommend |
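A small helper that encodes the bands above:
from typing import Tuple

def quality_status(score: float) -> Tuple[str, str]:
    """Map a quality score to the (status, action) band from the table above."""
    if score >= 0.8:
        return "Highly Trusted", "Recommend proactively"
    if score >= 0.6:
        return "Trusted", "Available for retrieval"
    if score >= 0.4:
        return "Experimental", "Use with caution flag"
    if score >= 0.2:
        return "Low Confidence", "Show but warn"
    return "Deprecated", "Archive, don't recommend"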
3.4 Pollution Prevention Strategies¶
Pattern 1: Temporal Isolation¶
Problem: Old, outdated patterns polluting current recommendations.
Solution: Temporal metadata with validity windows.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class TemporalMemory:
    valid_from: datetime
    valid_until: Optional[datetime]      # None = still valid
    last_validation: datetime
    validation_frequency: timedelta      # How often to re-validate

    def is_currently_valid(self) -> bool:
        """Check if memory is temporally valid."""
        if self.valid_until and datetime.now() > self.valid_until:
            return False
        # Check if needs re-validation
        if datetime.now() - self.last_validation > self.validation_frequency:
            return False  # Needs re-validation
        return True
Example:
Memory: "Use Redux for React state management"
Valid From: 2020-01-01
Valid Until: 2024-06-01
Reason: React hooks and Context API now preferred
Status: Deprecated, replaced by: mem_0892
Pattern 2: Context Fingerprinting¶
Problem: Memory applied in wrong context causes failures.
Solution: Each memory has a "context fingerprint" describing where it applies.
from dataclasses import dataclass
from typing import List

@dataclass
class ContextFingerprint:
    project_type: List[str]   # ["web_app", "api_service"]
    scale: str                # "small", "medium", "large"
    constraints: List[str]    # ["high_performance", "strict_types"]
    tech_stack: List[str]     # ["python", "postgresql"]
    team_size: str            # "solo", "small_team", "large_team"

    def matches(self, current_context: "ContextFingerprint") -> float:
        """Return similarity score (0-1) between contexts (Jaccard over the feature sets)."""
        def as_set(fp: "ContextFingerprint") -> set:
            return (set(fp.project_type) | set(fp.constraints) |
                    set(fp.tech_stack) | {fp.scale, fp.team_size})
        a, b = as_set(self), as_set(current_context)
        return len(a & b) / len(a | b) if (a | b) else 0.0
Usage:
Query: "How should I design authentication?"
Current Context: {type: "api_service", scale: "large", ...}
→ Retrieve memories with context similarity > 0.7
→ Rank by (quality_score * context_similarity)
Pattern 3: Contradiction Detection¶
Problem: Multiple agents contribute contradictory patterns.
Solution: Explicit contradiction tracking and resolution.
class ContradictionDetector:
def detect_contradictions(self, new_memory: Memory,
existing_memories: List[Memory]) -> List[Contradiction]:
"""Detect if new memory contradicts existing ones."""
contradictions = []
for existing in existing_memories:
if self.semantic_similarity(new_memory, existing) > 0.8:
if self.recommendations_conflict(new_memory, existing):
contradictions.append(
Contradiction(
memory1=existing,
memory2=new_memory,
conflict_type="recommendation",
resolution_strategy="highest_quality_wins"
)
)
return contradictions
Pattern 4: Contributor Reputation¶
Problem: Some agents produce higher quality memories than others.
Solution: Track per-agent contribution quality and weight accordingly.
class AgentReputation:
agent_id: str
contributions: int
avg_quality: float # Average quality of contributions
validation_rate: float # % of contributions validated by others
deprecation_rate: float # % of contributions later deprecated
def contribution_weight(self) -> float:
"""How much to trust this agent's contributions."""
return (
0.4 * self.avg_quality +
0.3 * self.validation_rate +
0.3 * (1.0 - self.deprecation_rate)
)
Application:
- High-reputation agents' memories start with higher initial quality
- Low-reputation agents' memories require more validation
- Reputation scores are transparent and updateable
4. Conflict Resolution Strategies¶
4.1 Types of Conflicts¶
Type 1: Temporal Conflicts (Same Pattern, Different Times)¶
Scenario: Two architect agents recommend different approaches for the same problem at different times.
Example:
Memory A (2023): "Use Redux for React state management"
Memory B (2025): "Use React Context + hooks for state management"
Resolution Strategy: Temporal Priority
- Newer memory takes precedence (with quality threshold)
- Older memory marked as "superseded_by: B"
- Keep historical record for learning
Type 2: Contextual Conflicts (Different Contexts, Different Solutions)¶
Scenario: Same problem, different solutions based on context.
Example:
Memory A: "Use microservices for scalability" (context: large_team, high_scale)
Memory B: "Use monolith for simplicity" (context: small_team, low_scale)
Resolution Strategy: Context-Based Selection
- Not a true conflict - both can be correct in their contexts
- Retrieve based on context similarity
- Present both with context explanations
Type 3: Direct Contradictions (Same Context, Opposite Recommendations)¶
Scenario: Two agents genuinely disagree on the best approach.
Example:
Memory A: "Always use ORMs for database access" (quality: 0.75)
Memory B: "Use raw SQL for performance-critical queries" (quality: 0.82)
Resolution Strategy: Quality-Weighted Consensus
- If quality difference > 0.1, higher quality wins
- If quality similar, flag as "debate" and present both
- Invoke multi-agent debate mechanism to resolve
- Store resolution outcome as new consensus memory
4.2 Conflict Resolution Decision Tree¶
New Memory Submitted
↓
Does it contradict existing memory?
├── No → Add with initial quality score
↓
Yes → What type of conflict?
├── Temporal
│ → Is new memory significantly better quality?
│ ├── Yes → Supersede old memory
│ └── No → Keep both, flag temporal change
│
├── Contextual
│ → Context fingerprints differ?
│ ├── Yes → Store both, different contexts
│ └── No → Treat as Direct Contradiction
│
└── Direct Contradiction
→ Quality difference > threshold?
├── Yes → Higher quality wins
└── No → Invoke debate mechanism
→ Generate consensus memory
4.3 Conflict Resolution Mechanisms¶
Mechanism 1: Automatic Resolution (70% of cases)¶
Criteria for Automatic Resolution:
- Clear quality difference (>0.15)
- Temporal supersession with validation
- Contextual separation obvious
Implementation:
class AutomaticResolver:
def resolve(self, conflict: Conflict) -> Resolution:
# Temporal resolution
if conflict.type == "temporal":
if conflict.newer.quality > conflict.older.quality * 0.9:
return Resolution(
action="supersede",
winner=conflict.newer,
superseded=conflict.older
)
# Quality-based resolution
quality_diff = abs(conflict.memoryA.quality - conflict.memoryB.quality)
if quality_diff > 0.15:
winner = max(conflict.memoryA, conflict.memoryB,
key=lambda m: m.quality)
return Resolution(action="quality_wins", winner=winner)
# Cannot auto-resolve
return Resolution(action="escalate", reason="ambiguous")
Mechanism 2: Multi-Agent Debate (25% of cases)¶
When to Use: Quality similar, both valid arguments, high-impact decision.
Process:
- Present both conflicting memories to fresh agent instances
- Each argues for its position
- Agents vote on resolution
- Create new consensus memory incorporating insights from both
Example Debate Structure:
Topic: "Should we use microservices or monolith for this service?"
Memory A Position (Microservices):
- Supports: Independent scaling, team autonomy
- Evidence: 12 successful applications
- Quality: 0.78
Memory B Position (Monolith):
- Supports: Simplicity, easier debugging
- Evidence: 15 successful applications
- Quality: 0.81
Debate Outcome:
- Context matters most
- Decision tree created:
→ If team < 5 AND scale < 100K users: Monolith
→ If team > 10 OR scale > 1M users: Microservices
→ If 5-10 team AND uncertain scale: Modular monolith
New Consensus Memory:
- Incorporates both perspectives
- Context-driven decision framework
- Quality: 0.85 (validated by debate)
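A sketch of how the debate flow could be orchestrated; `argue` and `vote` are assumed agent-interface methods, not existing amplihack APIs:
from dataclasses import dataclass
from typing import List

@dataclass
class DebatePosition:
    memory_id: str
    summary: str
    quality: float

async def run_debate(position_a: DebatePosition,
                     position_b: DebatePosition,
                     debaters: List["AgentInstance"]) -> dict:
    """Argue, vote, then return raw material for a new consensus memory."""
    arguments = [await agent.argue(position_a, position_b) for agent in debaters]
    votes = [await agent.vote(position_a, position_b, arguments) for agent in debaters]
    preferred = position_a if votes.count("A") >= votes.count("B") else position_b
    return {
        "preferred_memory": preferred.memory_id,
        "consensus_notes": arguments,      # inputs for drafting the consensus memory
        "participants": len(debaters),
    }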
Mechanism 3: Human Escalation (5% of cases)¶
When to Use: Fundamental disagreement, insufficient data, high risk.
Process:
- System detects unresolvable conflict
- Generate escalation report with both positions
- Request human expert judgment
- Record decision and reasoning as high-quality memory
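A sketch of the escalation report handed to the human expert; the exact fields are illustrative:
from datetime import datetime

def build_escalation_report(conflict_id: str,
                            memory_a: dict,
                            memory_b: dict,
                            reason: str) -> dict:
    """Assemble the report for conflicts the system cannot resolve on its own."""
    return {
        "conflict_id": conflict_id,
        "escalated_at": datetime.now().isoformat(),
        "positions": [
            {"memory_id": memory_a["memory_id"], "quality": memory_a["quality_score"]},
            {"memory_id": memory_b["memory_id"], "quality": memory_b["quality_score"]},
        ],
        "reason": reason,
        "requested_action": "human_judgment",
    }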
4.4 Maintaining Conflict History¶
Why Track Conflicts:
- Learn from resolution patterns
- Improve automatic resolution
- Identify systematic disagreements
- Understand knowledge evolution
Conflict Registry Schema:
{
"conflict_id": "conf_0123",
"type": "direct_contradiction",
"memories": ["mem_0042", "mem_0089"],
"detected_at": "2025-10-30T14:23:00Z",
"resolution": {
"method": "debate",
"outcome": "consensus_created",
"new_memory": "mem_0124",
"rationale": "Both valid in different contexts, created decision framework"
},
"lessons_learned": ["Context fingerprinting was too coarse", "Need scale dimension in context"]
}
5. Graph Schema Patterns for Shared Memory¶
5.1 Core Graph Schema¶
Node Types¶
// Agent Type Definition
CREATE (at:AgentType {
type_id: "architect",
name: "Architect",
description: "System design and architecture specialist",
core_responsibility: "Design specifications and system architecture",
created_at: datetime()
})
// Agent Instance
CREATE (ai:AgentInstance {
instance_id: "arch_inst_001",
type: "architect",
session_id: "session_20251030_142300",
created_at: datetime(),
reputation_score: 0.85
})
// Memory Fragment
CREATE (m:Memory {
memory_id: "mem_0042",
type: "procedural", // procedural, declarative, meta
category: "design_pattern",
content: "...",
quality_score: 0.89,
created_at: datetime(),
last_validated: datetime(),
validation_count: 12,
application_count: 47,
success_rate: 0.92
})
// Context Fingerprint
CREATE (ctx:Context {
context_id: "ctx_0015",
project_type: ["web_app", "api_service"],
scale: "medium",
tech_stack: ["python", "postgresql"],
constraints: ["high_performance"]
})
// Quality Metrics
CREATE (qm:QualityMetrics {
metrics_id: "qm_0042",
confidence: 0.85,
validation: 0.90,
recency: 0.95,
consensus: 0.88,
overall: 0.89
})
Relationship Types¶
// Agent Type owns Memory
CREATE (at:AgentType {type_id: "architect"})-[:OWNS_MEMORY {
shared: true,
access_level: "read_write"
}]->(m:Memory {memory_id: "mem_0042"})
// Agent Instance contributes Memory
CREATE (ai:AgentInstance {instance_id: "arch_inst_001"})-[:CONTRIBUTED {
contributed_at: datetime(),
initial_quality: 0.75
}]->(m:Memory {memory_id: "mem_0042"})
// Memory applies in Context
CREATE (m:Memory {memory_id: "mem_0042"})-[:APPLIES_IN {
similarity_required: 0.7
}]->(ctx:Context {context_id: "ctx_0015"})
// Memory supersedes old Memory
CREATE (m_new:Memory {memory_id: "mem_0089"})-[:SUPERSEDES {
superseded_at: datetime(),
reason: "Better approach with hooks"
}]->(m_old:Memory {memory_id: "mem_0042"})
// Memory contradicts another Memory
CREATE (m1:Memory {memory_id: "mem_0042"})-[:CONTRADICTS {
conflict_type: "direct",
detected_at: datetime(),
resolution: "debate"
}]->(m2:Memory {memory_id: "mem_0089"})
// Memory validated by Agent Instance
CREATE (ai:AgentInstance {instance_id: "arch_inst_007"})-[:VALIDATED {
validated_at: datetime(),
outcome: "successful",
feedback_score: 0.9
}]->(m:Memory {memory_id: "mem_0042"})
// Agent Instance uses Memory
CREATE (ai:AgentInstance {instance_id: "arch_inst_015"})-[:USED {
used_at: datetime(),
context: "ctx_0020",
outcome: "successful"
}]->(m:Memory {memory_id: "mem_0042"})
5.2 Hybrid Architecture Pattern¶
Modern systems use three-store hybrid:
┌─────────────────────────────────────────────────────────┐
│ Hybrid Memory Store │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. VECTOR STORE (Semantic Search) │
│ └─> Embeddings of memory content │
│ └─> Fast similarity search │
│ └─> Find relevant memories by meaning │
│ │
│ 2. KEY-VALUE STORE (Fast Lookup) │
│ └─> Agent states, sessions │
│ └─> Recent interaction history │
│ └─> Cache for hot memories │
│ │
│ 3. GRAPH STORE (Relationships) │
│ └─> Memory relationships │
│ └─> Agent type hierarchies │
│ └─> Temporal and conflict tracking │
│ └─> Context fingerprints │
│ │
└─────────────────────────────────────────────────────────┘
Implementation Pattern:
class HybridMemoryStore:
def __init__(self):
self.vector_store = VectorStore() # For semantic search
self.kv_store = KeyValueStore() # For fast lookup
self.graph_store = GraphStore() # For relationships
async def store_memory(self, memory: Memory):
"""Store memory across all three stores."""
# 1. Vector store - for semantic search
embedding = await self.embed_content(memory.content)
await self.vector_store.store(memory.memory_id, embedding)
# 2. Key-value store - for fast retrieval
await self.kv_store.set(memory.memory_id, memory.to_dict())
# 3. Graph store - for relationships
await self.graph_store.create_node(memory)
await self.graph_store.create_relationships(memory)
async def retrieve_relevant(self, query: str,
context: Context,
agent_type: str) -> List[Memory]:
"""Retrieve relevant memories using hybrid approach."""
# Step 1: Semantic search for candidates (vector store)
embedding = await self.embed_content(query)
candidates = await self.vector_store.similarity_search(
embedding,
limit=50
)
# Step 2: Filter by agent type and context (graph store)
filtered = await self.graph_store.filter_by_agent_type_and_context(
memory_ids=candidates,
agent_type=agent_type,
context=context
)
# Step 3: Fetch full details (key-value store)
memories = []
for memory_id in filtered:
memory_data = await self.kv_store.get(memory_id)
memories.append(Memory.from_dict(memory_data))
# Step 4: Rank by quality score and recency
return sorted(memories,
key=lambda m: m.quality_score * m.recency_score,
reverse=True)
5.3 Temporal Graph Patterns¶
Key Innovation: Every relationship has temporal metadata.
// Temporal relationship pattern
CREATE (m:Memory {memory_id: "mem_0042"})-[r:SUPERSEDES {
valid_from: datetime('2025-01-01'),
valid_until: datetime('2025-10-01'),
superseded_at: datetime('2025-10-01'),
reason: "React hooks replaced Redux patterns"
}]->(old_m:Memory {memory_id: "mem_0012"})
// Query: Get currently valid memories
MATCH (at:AgentType {type_id: "architect"})-[:OWNS_MEMORY]->(m:Memory)
WHERE m.valid_until IS NULL
OR m.valid_until > datetime()
RETURN m
ORDER BY m.quality_score DESC
// Query: Get memory state at specific time
MATCH (at:AgentType {type_id: "architect"})-[:OWNS_MEMORY]->(m:Memory)
WHERE m.valid_from <= datetime('2024-06-01')
AND (m.valid_until IS NULL OR m.valid_until > datetime('2024-06-01'))
RETURN m
5.4 Schema for Cross-Project Learning¶
Pattern: Scope memory sharing by project + agent type combination.
// Project node
CREATE (p:Project {
project_id: "proj_amplihack",
name: "Amplihack Framework",
type: "framework"
})
// Memory scoped to project
CREATE (m:Memory {memory_id: "mem_0042"})-[:SCOPED_TO {
scope_type: "project_specific",
created_in: "proj_amplihack"
}]->(p:Project)
// Memory applicable across all projects
CREATE (m:Memory {memory_id: "mem_0089"})-[:SCOPED_TO {
scope_type: "universal",
applicable_to: "all"
}]->(ag:AgentType {type_id: "architect"})
// Query: Get memories for current project + universal
MATCH (at:AgentType {type_id: "architect"})-[:OWNS_MEMORY]->(m:Memory)
WHERE (m)-[:SCOPED_TO]->(:Project {project_id: $current_project})
OR (m)-[:SCOPED_TO {scope_type: "universal"}]->(at)
RETURN m
ORDER BY m.quality_score DESC
Trade-offs:
| Approach | Pros | Cons |
|---|---|---|
| Global Sharing | Maximum learning, largest data set | Risk of irrelevant/conflicting memories |
| Project-Scoped | Highly relevant, context-appropriate | Limited learning, slower improvement |
| Hybrid (Recommended) | Best of both worlds | More complex to implement |
Recommendation: Hybrid approach with two memory pools per agent type:
- Universal Pool: Project-agnostic patterns (design principles, general approaches)
- Project Pool: Project-specific patterns (codebase conventions, team patterns)
6. Example Scenarios¶
Scenario 1: New Architect Agent Learning from Shared Memory¶
Context: New architect agent instance started for API design task.
Process:
# 1. Agent identifies its type and task context
agent = AgentInstance(type="architect", task="api_design")
context = Context(
project_type=["api_service"],
scale="medium",
tech_stack=["python", "fastapi"],
constraints=["rest_api", "authentication"]
)
# 2. Query shared memory for relevant patterns
query = "How to design authentication for REST API?"
relevant_memories = await memory_store.retrieve_relevant(
query=query,
context=context,
agent_type="architect"
)
# 3. Memories returned (sorted by relevance)
"""
Memory 1 (Quality: 0.92):
- Pattern: JWT-based authentication with refresh tokens
- Context: REST API, medium scale, stateless requirement
- Validated: 15 successful applications
- Contributed by: 8 architect agents
- Last used: 2025-10-15
Memory 2 (Quality: 0.85):
- Pattern: OAuth2 with third-party providers
- Context: REST API, user-facing, social login needed
- Validated: 12 successful applications
- Warning: More complex setup, consider if needed
Memory 3 (Quality: 0.71):
- Pattern: Session-based authentication
- Context: REST API, small scale, simple requirements
- Note: Consider for simpler use cases only
"""
# 4. Agent uses highest quality memory
agent.apply_memory(relevant_memories[0])
# 5. Agent provides feedback after implementation
await memory_store.record_usage(
agent_id=agent.instance_id,
memory_id=relevant_memories[0].memory_id,
outcome="successful",
feedback_score=0.95,
notes="Worked perfectly, clear implementation guide"
)
# 6. Memory quality updated based on successful application
relevant_memories[0].quality_score += 0.01 # Incremental improvement
relevant_memories[0].application_count += 1
relevant_memories[0].success_rate = (
(relevant_memories[0].success_rate * 15 + 1.0) / 16
)
Scenario 2: Detecting and Resolving Contradictory Patterns¶
Context: Two builder agents contribute different approaches to error handling.
Timeline:
Day 1: Builder Agent A contributes memory
Memory A:
- Pattern: "Always raise exceptions for errors"
- Reasoning: "Explicit error handling, caller decides what to do"
- Quality: 0.70 (initial, one contribution)
Day 30: Builder Agent B contributes contradictory memory
Memory B:
- Pattern: "Return Result type (Ok/Err) instead of exceptions"
- Reasoning: "More explicit in function signatures, better for async"
- Quality: 0.72 (initial, one contribution)
System detects contradiction:
- Semantic similarity: 0.85 (high - both about error handling)
- Recommendations conflict: True
- Quality difference: 0.02 (too small for auto-resolve)
Resolution: Multi-agent debate initiated
Debate Participants: 3 builder agents
- Agent X: Supports exceptions (Memory A)
- Agent Y: Supports Result types (Memory B)
- Agent Z: Neutral arbiter
Debate Outcome:
- Both valid in different contexts
- Exceptions: Better for Python, synchronous code
- Result types: Better for async, typed languages
Consensus Memory Created:
Memory C:
- Pattern: "Context-dependent error handling strategy"
- Decision tree:
→ Python + sync code: Exceptions (Memory A)
→ Python + async code: Result types if typed, else exceptions
→ Typed languages (Rust, TypeScript): Result types (Memory B)
- Incorporates both original memories
- Quality: 0.88 (consensus, validated by debate)
- Supersedes: Memory A, Memory B (marked as "merged into Memory C")
Scenario 3: Memory Deprecation Due to Technology Evolution¶
Context: React architecture patterns evolve over time.
Timeline:
2020-01-01: Memory Created
Memory: "Use Redux for state management in React apps"
- Quality: 0.85
- Validations: 25 successful applications
- Status: Active
2023-06-01: New approach emerges
- React Context API + hooks become standard
- Redux still used but for complex cases only
- No automatic deprecation (quality still high from past success)
2024-03-15: Quality starts declining
- New validations: 5 applications
- Feedback scores: Average 0.6 (declining)
- Comments: "Too complex for this use case", "Context API simpler"
- System notices trend: success_rate declining
2024-06-01: New memory contributed
Memory B: "Use React Context + hooks for state management"
- Initial quality: 0.75
- Rapidly gains validations: 15 in 3 months
- High feedback scores: Average 0.92
2024-09-01: Automatic conflict detection
- System detects potential supersession
- Memory B quality (0.83) now exceeds Memory A (0.78, declining)
- Contexts overlap significantly
2024-10-01: Automatic resolution triggered
- Memory A marked as "superseded"
- Memory B becomes primary recommendation
- Memory A retained with note: "Historical - use for complex state only"
- New context added to Memory A: {complexity: "high", state_size: "large"}
Query Impact:
# Before deprecation (2024-06-01)
query = "How to manage state in React app?"
→ Returns Memory A (Redux) as top result
# After deprecation (2024-10-01)
query = "How to manage state in React app?"
→ Returns Memory B (Context + hooks) as top result
→ Shows Memory A as "also consider for complex cases"
Scenario 4: Cross-Agent-Type Learning (Indirect Sharing)¶
Context: Optimizer agents learn from Builder agents' performance patterns.
Pattern: While agent types don't directly share memory, they can reference patterns from other types.
# Builder memory (not shared with Optimizer directly)
builder_memory = Memory(
agent_type="builder",
memory_id="builder_mem_042",
pattern="Use list comprehension instead of map() for readability",
performance_note="Also 10-15% faster in benchmarks"
)
# Optimizer discovers this pattern through observation
optimizer_agent = AgentInstance(type="optimizer")
# Optimizer creates reference in its own memory
optimizer_memory = Memory(
agent_type="optimizer",
memory_id="opt_mem_089",
pattern="List comprehensions outperform map() in Python",
benchmarks={
"list_comp": "0.42ms",
"map_func": "0.48ms",
"improvement": "14%"
},
references=[
Reference(
agent_type="builder",
memory_id="builder_mem_042",
relationship="supports"
)
]
)
# Cross-reference relationship in graph
"""
(optimizer:Memory {memory_id: "opt_mem_089"})
-[:SUPPORTS]->
(builder:Memory {memory_id: "builder_mem_042"})
"""
Benefit: Different agent types maintain their own perspectives while building on each other's discoveries.
7. Anti-Patterns to Avoid¶
Anti-Pattern 1: Over-Sharing Without Context¶
Problem: Sharing all memories universally without context leads to:
- Irrelevant recommendations
- Conflicting advice
- Decision paralysis (too many options)
Example:
Bad: All architect agents share all design decisions
Result: Microservices patterns suggested for tiny projects
Monolith patterns suggested for massive distributed systems
Good: Context-scoped sharing
Result: Patterns matched to project scale, team size, constraints
Solution: Always attach context fingerprints to shared memory.
Anti-Pattern 2: Ignoring Temporal Decay¶
Problem: Old patterns remain "trusted" without re-validation.
Example:
2018 Memory: "Use Angular for web apps" (quality: 0.90)
2025 Query: "What framework for web app?"
→ Returns outdated Angular recommendation despite React/Vue dominance
Problem: Quality score frozen in time
Solution: Implement automatic quality decay and re-validation requirements.
Anti-Pattern 3: Quality Score Inflation¶
Problem: Positive feedback inflates scores indefinitely without bounds.
Example:
Memory starts at quality 0.75
After 100 successful uses: quality = 0.99
After 1000 uses: quality = 1.00
→ Becomes "untouchable", never questioned or updated
Solution: Use logarithmic quality updates or bounded scoring with decay.
import math

def update_quality_bounded(current: float, feedback: float,
                           count: int) -> float:
    """Update quality with diminishing returns."""
    weight = min(1.0, 1.0 / math.log(count + 2))  # Diminishing weight, clamped so it never exceeds 1
    new_quality = current * (1 - weight) + feedback * weight
    return min(new_quality, 0.95)  # Cap at 0.95, always room for improvement
Anti-Pattern 4: Singleton Memory (No Alternatives)¶
Problem: Only one "best practice" stored, no alternatives presented.
Example:
Memory: "Always use PostgreSQL for database"
→ No alternatives stored
→ Agent applies PostgreSQL even when Redis or SQLite would be better
Solution: Store multiple validated approaches with clear context differentiation.
Memory A: PostgreSQL (context: relational data, ACID requirements)
Memory B: Redis (context: caching, high-throughput reads)
Memory C: SQLite (context: embedded, single-user, simple)
Anti-Pattern 5: Black Box Memory (No Reasoning)¶
Problem: Storing "what to do" without "why" or "when not to".
Example:
Bad Memory:
"Use Redis for caching"
→ Agent applies blindly without understanding trade-offs
Good Memory:
Pattern: "Use Redis for caching"
When: High read volume, data can tolerate eventual consistency
Why: In-memory speed, built-in expiration, distributed support
When NOT: Strong consistency needed, data too large for memory, cost constraints
Alternatives: Memcached (simpler), CDN (static content), database query cache
Solution: Structured memory with reasoning, context, trade-offs, and alternatives.
Anti-Pattern 6: No Failure Memory¶
Problem: Only storing successes, not learning from failures.
Example:
System: "Let's try X pattern"
Outcome: Failed due to Y reason
Memory: [Nothing stored]
→ Different agent makes same mistake later
Solution: Store failure patterns explicitly as "anti-patterns" or "lessons learned".
{
"memory_type": "anti_pattern",
"pattern": "Using ORM for bulk inserts",
"failure_mode": "Extremely slow, generated N queries instead of bulk",
"lesson": "Use raw SQL or bulk insert API for >1000 records",
"evidence": "Performance degraded from 2s to 45s with 5000 records",
"quality_score": 0.88,
"validation_count": 7 // 7 agents confirmed this anti-pattern
}
Anti-Pattern 7: Memory Hoarding (No Cleanup)¶
Problem: Never deleting or archiving old, obsolete memories.
Example:
Database grows to millions of memories
Retrieval becomes slow
Irrelevant old patterns dilute search results
Solution: Implement memory lifecycle management.
from datetime import datetime, timedelta

class MemoryLifecycle:
def archive_old_memories(self):
"""Move old, unused memories to archive."""
# Archive if not used in 6 months AND quality < 0.5
cutoff = datetime.now() - timedelta(days=180)
old_memories = query("""
MATCH (m:Memory)
WHERE m.last_used < $cutoff
AND m.quality_score < 0.5
RETURN m
""", cutoff=cutoff)
for memory in old_memories:
self.archive(memory) # Move to archive storage
self.update_status(memory, "archived")
def delete_deprecated(self):
"""Remove memories explicitly marked for deletion."""
# Delete if deprecated > 1 year ago AND superseded
cutoff = datetime.now() - timedelta(days=365)
deprecated = query("""
MATCH (m:Memory {status: "deprecated"})
WHERE m.deprecated_at < $cutoff
AND EXISTS((m)-[:SUPERSEDED_BY]->())
RETURN m
""", cutoff=cutoff)
for memory in deprecated:
self.delete(memory)
8. Implementation Recommendations for Amplihack¶
8.1 Phased Rollout Strategy¶
Phase 1: Foundation (Weeks 1-2)¶
Objectives:
- Implement basic graph schema
- Set up hybrid storage (vector + graph)
- Create agent type taxonomy
Deliverables:
- Neo4j or FalkorDB graph database setup
- Vector store integration (e.g., Pinecone, Qdrant, or pgvector)
- Core node/relationship schema implemented
- Basic memory CRUD operations
Code Example:
# /src/amplihack/memory/graph_store.py
class GraphMemoryStore:
def __init__(self, neo4j_uri: str, vector_store: VectorStore):
self.graph = Neo4jDriver(neo4j_uri)
self.vector_store = vector_store
async def store_memory(self, memory: Memory, agent_type: str):
"""Store memory in both graph and vector stores."""
# Graph storage
await self.graph.create_node("Memory", memory.to_dict())
await self.graph.create_relationship(
"AgentType", {"type_id": agent_type},
"OWNS_MEMORY",
"Memory", {"memory_id": memory.memory_id}
)
# Vector storage for semantic search
embedding = await self.vector_store.embed(memory.content)
await self.vector_store.store(memory.memory_id, embedding)
async def retrieve_for_agent_type(self, agent_type: str,
query: str,
context: Context,
limit: int = 10) -> List[Memory]:
"""Retrieve relevant memories for agent type."""
# Semantic search
embedding = await self.vector_store.embed(query)
candidates = await self.vector_store.similarity_search(
embedding, limit=50
)
# Graph filtering
filtered = await self.graph.query("""
MATCH (at:AgentType {type_id: $agent_type})-[:OWNS_MEMORY]->(m:Memory)
WHERE m.memory_id IN $candidates
AND m.quality_score > 0.6
AND (m.valid_until IS NULL OR m.valid_until > datetime())
RETURN m
ORDER BY m.quality_score DESC
LIMIT $limit
""", agent_type=agent_type, candidates=candidates, limit=limit)
return [Memory.from_dict(m) for m in filtered]
Phase 2: Quality Control (Weeks 3-4)¶
Objectives:
- Implement quality scoring system
- Add validation and feedback mechanisms
- Create conflict detection
Deliverables:
- Quality scoring algorithms
- Feedback collection system
- Automatic quality decay
- Basic conflict detection
Code Example:
# /src/amplihack/memory/quality.py
from datetime import datetime

class QualityManager:
def compute_quality(self, memory: Memory) -> float:
"""Compute overall quality score."""
return (
0.25 * memory.confidence +
0.20 * self.compute_validation_score(memory) +
0.15 * self.compute_recency_score(memory) +
0.20 * self.compute_consensus_score(memory) +
0.10 * memory.context_specificity +
0.10 * memory.impact_score
)
def compute_recency_score(self, memory: Memory) -> float:
"""Time-based decay of quality."""
age_days = (datetime.now() - memory.last_validated).days
decay_rate = 0.01 # 1% per month
return max(0.0, 1.0 - (age_days / 30) * decay_rate)
async def record_feedback(self, memory_id: str,
agent_id: str,
feedback: Feedback):
"""Record agent feedback on memory usage."""
# Store feedback in graph
await self.graph.create_relationship(
"AgentInstance", {"instance_id": agent_id},
"VALIDATED",
"Memory", {"memory_id": memory_id},
{
"validated_at": datetime.now(),
"outcome": feedback.outcome,
"score": feedback.score
}
)
# Update memory quality
memory = await self.get_memory(memory_id)
memory.validation_count += 1
memory.quality_score = self.compute_quality(memory)
await self.update_memory(memory)
Phase 3: Conflict Resolution (Weeks 5-6)¶
Objectives:
- Implement conflict detection algorithms
- Create resolution mechanisms
- Add temporal invalidation
Deliverables:
- Conflict detection system
- Automatic resolution for simple cases
- Multi-agent debate mechanism stub
- Temporal supersession tracking
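A sketch of how Phase 3 could wire together the detection and resolution pieces from Sections 3.4 and 4.3; the module path and escalation queue are illustrative:
# /src/amplihack/memory/conflicts.py (illustrative path)
class ConflictPipeline:
    """Detect conflicts for each new memory, auto-resolve where possible, escalate the rest."""

    def __init__(self, detector, resolver, escalation_queue):
        self.detector = detector              # e.g., ContradictionDetector (Section 3.4)
        self.resolver = resolver              # e.g., AutomaticResolver (Section 4.3)
        self.escalation_queue = escalation_queue

    async def process_new_memory(self, new_memory, existing_memories):
        for conflict in self.detector.detect_contradictions(new_memory, existing_memories):
            resolution = self.resolver.resolve(conflict)
            if resolution.action == "escalate":
                await self.escalation_queue.put(conflict)   # hand off to debate or human review
            yield resolution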
Phase 4: Context & Scoping (Weeks 7-8)¶
Objectives:
- Add context fingerprinting
- Implement project scoping
- Create context similarity matching
Deliverables:
- Context fingerprint system
- Project-scoped memory queries
- Universal vs project-specific separation
- Context-aware retrieval
Phase 5: Optimization & Production (Weeks 9-10)¶
Objectives:
- Performance optimization
- Production hardening
- Monitoring and observability
Deliverables:
- Query optimization
- Caching layer
- Metrics and monitoring
- Documentation and examples
8.2 Technology Stack Recommendations¶
Graph Database Options¶
Option 1: Neo4j (Recommended)
- Pros: Mature, excellent Cypher query language, strong community
- Cons: Commercial licensing for scale, resource intensive
- Best for: Production systems, complex queries, large scale
Option 2: FalkorDB
- Pros: Redis-based (in-memory speed), openCypher support, open source
- Cons: Less mature than Neo4j, smaller ecosystem
- Best for: High-performance requirements, Redis infrastructure
Option 3: ArangoDB
- Pros: Multi-model (graph + document), flexible, single database
- Cons: Query language different from Cypher
- Best for: Mixed workloads, document + graph needs
Recommendation for Amplihack: Start with Neo4j for maturity and Cypher expressiveness.
Vector Store Options¶
Option 1: Qdrant
- Pros: Fast, open source, excellent Python support, filtering capabilities
- Cons: Relatively new, smaller community
- Best for: Self-hosted, full control, cost-sensitive
Option 2: Pinecone
- Pros: Managed service, scalable, simple API
- Cons: Costs at scale, vendor lock-in
- Best for: Quick start, managed infrastructure preferred
Option 3: pgvector (PostgreSQL extension)
- Pros: Same database as possibly already using, simpler stack
- Cons: Not specialized for vectors, slower at very large scale
- Best for: Existing PostgreSQL infrastructure, smaller scale
Recommendation for Amplihack: Qdrant for balance of performance, cost, and control.
8.3 Integration with Existing Amplihack Agents¶
Agent Memory Decorator Pattern¶
Approach: Wrap existing agents with memory-aware decorators.
# /src/amplihack/memory/agent_memory.py
from functools import wraps
from typing import Callable, List
class MemoryAwareAgent:
def __init__(self, agent_type: str, memory_store: GraphMemoryStore):
self.agent_type = agent_type
self.memory_store = memory_store
self.instance_id = generate_instance_id()
def with_memory(self, func: Callable):
"""Decorator to add memory capabilities to agent methods."""
@wraps(func)
async def wrapper(*args, **kwargs):
# Extract task context
task = kwargs.get('task') or (args[0] if args else None)  # parentheses so a kwarg task is kept when args is empty
context = self.extract_context(task)
# Retrieve relevant memories
query = self.formulate_query(task)
memories = await self.memory_store.retrieve_for_agent_type(
agent_type=self.agent_type,
query=query,
context=context
)
# Inject memories into agent context
kwargs['relevant_memories'] = memories
# Execute agent function
result = await func(*args, **kwargs)
# Extract learnings from result
if result.get('should_store_memory'):
new_memory = self.extract_memory_from_result(result)
await self.memory_store.store_memory(
new_memory,
self.agent_type
)
return result
return wrapper
# Usage in existing agents
# /src/amplihack/agents/amplihack/core/architect.py
class ArchitectAgent:
    def __init__(self, memory_store: GraphMemoryStore):
        self.memory_aware = MemoryAwareAgent("architect", memory_store)
        # Wrap at construction time: a bare @memory_aware.with_memory decorator on the
        # method would fail because the MemoryAwareAgent instance does not yet exist
        # when the class body is evaluated.
        self.design_system = self.memory_aware.with_memory(self.design_system)

    async def design_system(self, requirements: str,
                            relevant_memories: List[Memory] = None):
"""Design system architecture with memory assistance."""
# Use relevant_memories to inform design
design_patterns = [m for m in relevant_memories
if m.category == "design_pattern"]
# Agent logic here...
design = self.create_design(requirements, design_patterns)
# Mark valuable insights for storage
if design.is_novel:
return {
"design": design,
"should_store_memory": True,
"memory_content": design.key_decisions
}
return {"design": design}
Memory-Aware Agent Invocation¶
Update orchestrator to enable memory for agents:
# /src/amplihack/launcher/orchestrator.py
class Orchestrator:
def __init__(self):
self.memory_store = GraphMemoryStore(...)
self.agents = {
"architect": ArchitectAgent(self.memory_store),
"builder": BuilderAgent(self.memory_store),
"reviewer": ReviewerAgent(self.memory_store),
# ... other agents
}
async def invoke_agent(self, agent_type: str, task: Task):
"""Invoke agent with memory support."""
agent = self.agents[agent_type]
return await agent.execute(task)
8.4 Monitoring and Metrics¶
Key Metrics to Track:
- Memory Quality Metrics:
  - Average quality score by agent type
  - Quality score distribution
  - Quality decay rate
- Usage Metrics:
  - Memory retrieval frequency
  - Application success rate
  - Feedback score distribution
- Growth Metrics:
  - Memories created per day
  - Memories deprecated per week
  - Memory database size
- Conflict Metrics:
  - Conflicts detected per week
  - Auto-resolution rate
  - Debate invocations
- Performance Metrics:
  - Memory retrieval latency (p50, p95, p99)
  - Vector search time
  - Graph query time
Monitoring Dashboard:
# /src/amplihack/memory/monitoring.py
class MemoryMetrics:
def __init__(self, metrics_backend):
self.metrics = metrics_backend
async def track_memory_usage(self, memory_id: str,
agent_id: str,
retrieval_time_ms: float,
relevance_score: float):
"""Track memory retrieval and usage."""
self.metrics.increment("memory.retrievals")
self.metrics.histogram("memory.retrieval_time", retrieval_time_ms)
self.metrics.gauge("memory.relevance_score", relevance_score)
async def track_quality_updates(self, memory_id: str,
old_quality: float,
new_quality: float):
"""Track quality score changes."""
delta = new_quality - old_quality
self.metrics.histogram("memory.quality_delta", delta)
self.metrics.gauge(f"memory.quality.{memory_id}", new_quality)
async def get_dashboard_data(self) -> Dict:
"""Get data for monitoring dashboard."""
return {
"total_memories": await self.count_memories(),
"avg_quality_by_type": await self.avg_quality_by_agent_type(),
"recent_conflicts": await self.recent_conflicts(),
"top_memories": await self.top_memories_by_usage(),
"quality_trends": await self.quality_trend_last_30_days()
}
9. Summary and Key Takeaways¶
Core Principles¶
- Temporal Awareness is Critical: Every memory must have temporal metadata for validity and deprecation.
- Quality Over Quantity: Better to have 100 high-quality, well-validated memories than 10,000 unvetted ones.
- Context is King: Memories without context fingerprints lead to misapplication and pollution.
- Explicit Conflict Management: Contradictions are inevitable; detect and resolve them systematically.
- Hybrid Architecture Works Best: Combine vector (semantic), graph (relationships), and key-value (speed) stores.
Decision Framework for Amplihack¶
Should two agents share memory?
Same core responsibility?
├── Yes → Share general patterns
│ └── Different domains?
│ ├── Yes → Two-level sharing (general + specialized)
│ └── No → Single shared memory pool
└── No → Separate memory, but allow cross-references
What should be shared?
✅ Share:
- Procedural knowledge (how to do X)
- Validated patterns (tried and true approaches)
- Context-aware solutions
- Failure patterns (anti-patterns)
- Decision frameworks
❌ Don't Share:
- Session-specific state
- Unvalidated hypotheses
- Low-confidence experiments
- Project-specific quirks (unless explicitly scoped)
When to deprecate memory?
Deprecate if:
- Not used in 6 months AND quality < 0.5
- Superseded by better approach (quality gap > 0.15)
- Multiple recent failures (success rate < 0.6)
- Explicitly marked obsolete by multiple agents
Archive (not delete):
- Keep for historical analysis
- Maintain reference in graph
- Available for "memory archaeology"
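These criteria could be encoded roughly as follows; the field names and the obsolete-flag count are assumptions:
from datetime import datetime, timedelta
from typing import Optional

def should_deprecate(memory: dict,
                     superseding_quality: Optional[float] = None,
                     now: Optional[datetime] = None) -> bool:
    """Apply the deprecation criteria listed above to one memory record."""
    now = now or datetime.now()
    unused_and_weak = (now - memory["last_used"] > timedelta(days=180)
                       and memory["quality_score"] < 0.5)
    superseded = (superseding_quality is not None
                  and superseding_quality - memory["quality_score"] > 0.15)
    failing = memory.get("success_rate", 1.0) < 0.6
    voted_obsolete = memory.get("obsolete_flags", 0) >= 2   # "multiple agents" threshold assumed
    return unused_and_weak or superseded or failing or voted_obsolete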
Implementation Priority¶
High Priority (Must Have):
1. Temporal graph schema with validity windows
2. Quality scoring system with decay
3. Context fingerprinting for appropriate application
4. Basic conflict detection (temporal, quality-based)
Medium Priority (Should Have):
5. Multi-agent debate mechanism for complex conflicts
6. Cross-project vs project-specific scoping
7. Hybrid storage (vector + graph)
8. Feedback and validation tracking
Low Priority (Nice to Have):
9. Advanced conflict resolution (ML-based)
10. Cross-agent-type learning references
11. Memory visualization dashboard
12. Automated memory curation
Success Metrics¶
After 3 Months:
- 80% of agent invocations use relevant shared memory
- Average memory quality score > 0.75
- <5% conflicts require manual resolution
- Memory retrieval time < 100ms (p95)
After 6 Months:
- Measurable improvement in agent decision quality
- 50%+ reduction in repeated mistakes
- Agents proactively suggest relevant patterns
- 90%+ user satisfaction with agent recommendations
After 12 Months:
- Self-improving agent system (quality increases over time)
- Comprehensive knowledge base covering 80% of common scenarios
- Cross-project learning demonstrably effective
- Memory system becomes competitive advantage
10. References and Further Reading¶
Research Papers¶
- Zep: A Temporal Knowledge Graph Architecture for Agent Memory (2025)
  - https://arxiv.org/abs/2501.13956
  - Key concepts: Temporal graphs, invalidation, Graphiti framework
- Collaborative Memory: Multi-User Memory Sharing in LLM Agents (2025)
  - https://arxiv.org/html/2505.18279v1
  - Key concepts: Bipartite graphs, access control, shared vs private memory
- Multi-Agent Collaboration Mechanisms: A Survey of LLMs (2025)
  - https://arxiv.org/html/2501.06322v1
  - Key concepts: Agent coordination, communication patterns
- A Taxonomy of Hierarchical Multi-Agent Systems (2025)
  - https://arxiv.org/html/2508.12683
  - Key concepts: Five-axis taxonomy, role delegation
Tools and Frameworks¶
- Graphiti (Neo4j + Zep)
  - https://github.com/getzep/graphiti
  - Temporal knowledge graphs for agents
- Mem0
  - https://mem0.ai
  - Intelligent memory consolidation, hybrid storage
- LangGraph + MongoDB
  - https://www.mongodb.com/company/blog/product-release-announcements/powering-long-term-memory-for-agents-langgraph
  - Multi-session memory for agents
- CrewAI
  - https://www.crewai.com
  - Multi-agent orchestration with shared memory
Best Practices¶
- Why Multi-Agent Systems Need Memory Engineering (MongoDB Blog)
  - https://www.mongodb.com/company/blog/technical/why-multi-agent-systems-need-memory-engineering
- Stop Using RAG for Agent Memory (Zep Blog)
  - https://blog.getzep.com/stop-using-rag-for-agent-memory/
  - Why RAG alone is insufficient for agent memory
- AI Agent Memory Systems (FalkorDB)
  - https://www.falkordb.com/ai-agents-memory-systems/
  - Graph databases for agent memory
Related Amplihack Documentation¶
- .claude/context/PHILOSOPHY.md - Core design principles
- .claude/context/PATTERNS.md - Existing pattern catalog
- .claude/agents/README.md - Agent system overview
- .claude/agents/amplihack/core/*.md - Core agent definitions
Appendix A: Graph Query Examples¶
Common Queries for Agent Memory¶
// 1. Get all high-quality memories for architect agents
MATCH (at:AgentType {type_id: "architect"})-[:OWNS_MEMORY]->(m:Memory)
WHERE m.quality_score > 0.8
AND (m.valid_until IS NULL OR m.valid_until > datetime())
RETURN m.memory_id, m.pattern, m.quality_score
ORDER BY m.quality_score DESC
LIMIT 20
// 2. Find memories validated by multiple agents (consensus)
MATCH (m:Memory)<-[v:VALIDATED]-(ai:AgentInstance)
WITH m, count(ai) as validator_count, avg(v.feedback_score) as avg_feedback
WHERE validator_count >= 3 AND avg_feedback > 0.8
RETURN m.memory_id, m.pattern, validator_count, avg_feedback
ORDER BY validator_count DESC, avg_feedback DESC
// 3. Detect potential conflicts (similar content, different recommendations)
MATCH (m1:Memory), (m2:Memory)
WHERE m1.memory_id < m2.memory_id
AND m1.category = m2.category
AND m1.agent_type = m2.agent_type
AND NOT (m1)-[:SUPERSEDES|SUPERSEDED_BY]-(m2)
// Assume semantic similarity computed and stored
WITH m1, m2
WHERE m1.semantic_similarity_to_m2 > 0.8
AND m1.recommendation != m2.recommendation
RETURN m1.memory_id, m2.memory_id,
m1.pattern, m2.pattern,
m1.quality_score, m2.quality_score
// 4. Get memory usage history for an agent instance
MATCH (ai:AgentInstance {instance_id: $agent_id})-[u:USED]->(m:Memory)
RETURN m.memory_id, m.pattern, u.used_at, u.outcome
ORDER BY u.used_at DESC
LIMIT 50
// 5. Find memories needing re-validation (old, not recently validated)
MATCH (m:Memory)
WHERE m.last_validated < datetime() - duration('P180D') // 180 days
AND m.quality_score > 0.6 // Still relatively high quality
AND (m.valid_until IS NULL OR m.valid_until > datetime())
RETURN m.memory_id, m.pattern, m.last_validated, m.validation_count
ORDER BY m.last_validated ASC
// 6. Get memory evolution chain (supersession history)
MATCH path = (m_current:Memory)-[:SUPERSEDES*]->(m_old:Memory)
WHERE m_current.memory_id = $memory_id
RETURN path,
[n IN nodes(path) | {id: n.memory_id, pattern: n.pattern,
created: n.created_at, quality: n.quality_score}]
// 7. Find context-specific memories for current task
MATCH (m:Memory)-[:APPLIES_IN]->(ctx:Context)
WHERE ctx.project_type IN $project_types
AND ctx.scale = $scale
AND m.quality_score > 0.7
AND m.agent_type = $agent_type
RETURN m
ORDER BY m.quality_score DESC
LIMIT 10
// 8. Identify underutilized high-quality memories
MATCH (m:Memory)
WHERE m.quality_score > 0.85
AND m.application_count < 5
AND m.created_at < datetime() - duration('P90D')
RETURN m.memory_id, m.pattern, m.quality_score, m.application_count
ORDER BY m.quality_score DESC
// 9. Get agent reputation scores
MATCH (ai:AgentInstance)-[c:CONTRIBUTED]->(m:Memory)
WITH ai, count(m) as total_contributions, avg(m.quality_score) as avg_quality
MATCH (ai)-[v:VALIDATED]->(m2:Memory)
WITH ai, total_contributions, avg_quality,
count(v) as validations, avg(v.feedback_score) as avg_feedback
RETURN ai.instance_id, ai.type,
total_contributions, avg_quality, validations, avg_feedback
ORDER BY avg_quality DESC, total_contributions DESC
// 10. Find memory gaps (common queries with low-quality results)
// This requires query logging - example conceptual query
MATCH (q:Query)-[:RETRIEVED]->(m:Memory)
WITH q, avg(m.quality_score) as avg_retrieved_quality, count(m) as result_count
WHERE avg_retrieved_quality < 0.6 OR result_count < 3
RETURN q.query_text, q.agent_type, avg_retrieved_quality, result_count, q.frequency
ORDER BY q.frequency DESC
LIMIT 20
Appendix B: Memory JSON Schema¶
Complete Memory Fragment Schema¶
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "AgentMemory",
"type": "object",
"required": ["memory_id", "agent_type", "type", "category", "content", "quality_score"],
"properties": {
"memory_id": {
"type": "string",
"description": "Unique identifier for this memory",
"pattern": "^[a-z]+_mem_[0-9]+$"
},
"agent_type": {
"type": "string",
"enum": [
"architect",
"builder",
"reviewer",
"tester",
"optimizer",
"security",
"database",
"api_designer",
"integration",
"analyzer"
],
"description": "Type of agent this memory belongs to"
},
"type": {
"type": "string",
"enum": ["procedural", "declarative", "meta", "anti_pattern"],
"description": "Category of knowledge"
},
"category": {
"type": "string",
"description": "Specific category (design_pattern, error_handling, etc.)",
"examples": ["design_pattern", "error_handling", "performance", "testing"]
},
"content": {
"type": "object",
"required": ["pattern", "reasoning"],
"properties": {
"pattern": {
"type": "string",
"description": "The actual pattern or knowledge"
},
"situation": {
"type": "string",
"description": "When to apply this pattern"
},
"approach": {
"type": "string",
"description": "How to apply this pattern"
},
"reasoning": {
"type": "string",
"description": "Why this pattern works"
},
"alternatives_considered": {
"type": "array",
"items": { "type": "string" },
"description": "Other approaches that were considered"
},
"when_not_to_use": {
"type": "string",
"description": "Conditions where this pattern doesn't apply"
}
}
},
"quality_score": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0,
"description": "Overall quality score (0-1)"
},
"quality_breakdown": {
"type": "object",
"properties": {
"confidence": { "type": "number", "minimum": 0, "maximum": 1 },
"validation": { "type": "number", "minimum": 0, "maximum": 1 },
"recency": { "type": "number", "minimum": 0, "maximum": 1 },
"consensus": { "type": "number", "minimum": 0, "maximum": 1 },
"context_specificity": { "type": "number", "minimum": 0, "maximum": 1 },
"impact": { "type": "number", "minimum": 0, "maximum": 1 }
}
},
"temporal_metadata": {
"type": "object",
"required": ["created_at", "last_validated"],
"properties": {
"created_at": { "type": "string", "format": "date-time" },
"last_validated": { "type": "string", "format": "date-time" },
"valid_from": { "type": "string", "format": "date-time" },
"valid_until": { "type": "string", "format": "date-time", "nullable": true },
"deprecation_date": { "type": "string", "format": "date-time", "nullable": true }
}
},
"context_fingerprint": {
"type": "object",
"properties": {
"project_type": { "type": "array", "items": { "type": "string" } },
"scale": { "type": "string", "enum": ["small", "medium", "large", "xlarge"] },
"tech_stack": { "type": "array", "items": { "type": "string" } },
"constraints": { "type": "array", "items": { "type": "string" } },
"team_size": {
"type": "string",
"enum": ["solo", "small_team", "medium_team", "large_team"]
}
}
},
"usage_statistics": {
"type": "object",
"properties": {
"application_count": { "type": "integer", "minimum": 0 },
"success_rate": { "type": "number", "minimum": 0, "maximum": 1 },
"validation_count": { "type": "integer", "minimum": 0 },
"last_used": { "type": "string", "format": "date-time" }
}
},
"contributor_agents": {
"type": "array",
"items": { "type": "string" },
"description": "Agent instance IDs that contributed to this memory"
},
"supersedes": {
"type": "array",
"items": { "type": "string" },
"description": "Memory IDs that this memory supersedes"
},
"superseded_by": {
"type": "string",
"nullable": true,
"description": "Memory ID that supersedes this one"
},
"conflicts_with": {
"type": "array",
"items": {
"type": "object",
"properties": {
"memory_id": { "type": "string" },
"conflict_type": { "type": "string", "enum": ["temporal", "contextual", "direct"] },
"resolution_status": { "type": "string", "enum": ["unresolved", "resolved", "debate"] }
}
}
},
"references": {
"type": "array",
"items": {
"type": "object",
"properties": {
"agent_type": { "type": "string" },
"memory_id": { "type": "string" },
"relationship": { "type": "string", "enum": ["supports", "contradicts", "extends"] }
}
},
"description": "References to memories from other agent types"
},
"scope": {
"type": "object",
"properties": {
"scope_type": {
"type": "string",
"enum": ["universal", "project_specific", "domain_specific"]
},
"project_id": { "type": "string", "nullable": true },
"domain": { "type": "string", "nullable": true }
}
},
"tags": {
"type": "array",
"items": { "type": "string" },
"description": "Searchable tags for categorization"
}
}
}
Document Version: 1.0
Last Updated: 2025-11-02
Author: Claude (Patterns Agent)
Review Status: Draft for Amplihack Team Review