Memory-Enabled Agents Architecture¶

System design and technical deep-dive for memory-enabled goal-seeking agents.

Overview¶

Memory-enabled agents extend amplihack's goal-seeking agents with persistent memory capabilities. This document describes the architecture, component interactions, data flow, and design decisions.

Architecture Diagram¶

┌────────────────────────────────────────────────────────────────┐
│                         User/CLI                               │
└───────────┬────────────────────────────────────────────────────┘
            │
            │ amplihack goal-agent run my-agent --target ./code
            │
            ▼
┌────────────────────────────────────────────────────────────────┐
│                   Goal Agent Executor                          │
│  - Task parsing and initialization                             │
│  - Multi-turn autonomous iteration                             │
│  - Success criteria evaluation                                 │
└───────────┬────────────────────────────────────────────────────┘
            │
            │ execute_task(description, target)
            │
            ▼
┌────────────────────────────────────────────────────────────────┐
│              Memory Integration Layer                          │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  BEFORE Execution:                                       │ │
│  │  1. Load relevant past experiences                       │ │
│  │  2. Filter by similarity and confidence                  │ │
│  │  3. Prepare learned patterns for application             │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  DURING Execution:                                       │ │
│  │  1. Apply learned patterns to decisions                  │ │
│  │  2. Track new discoveries                                │ │
│  │  3. Store intermediate experiences                       │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │  AFTER Execution:                                        │ │
│  │  1. Store success/failure outcome                        │ │
│  │  2. Recognize new patterns                               │ │
│  │  3. Update confidence scores                             │ │
│  └──────────────────────────────────────────────────────────┘ │
└───────────┬────────────────────────────────────────────────────┘
            │
            │ MemoryConnector API
            │
            ▼
┌────────────────────────────────────────────────────────────────┐
│           amplihack-memory-lib (Standalone Package)            │
│                                                                 │
│  ┌─────────────────────────┐  ┌──────────────────────────────┐│
│  │   MemoryConnector       │  │   ExperienceStore            ││
│  │  - store_experience()   │  │  - add()                     ││
│  │  - retrieve_relevant()  │  │  - search()                  ││
│  │  - get_statistics()     │  │  - auto_compress()           ││
│  └────────┬────────────────┘  └──────────┬───────────────────┘│
│           │                               │                     │
│           └───────────┬───────────────────┘                     │
│                       │                                         │
│                       ▼                                         │
│           ┌────────────────────────────┐                        │
│           │   SQLite Storage Backend   │                        │
│           │  - experiences.db          │                        │
│           │  - Full-text search index  │                        │
│           │  - Compression support     │                        │
│           └────────────────────────────┘                        │
│                                                                 │
│  Storage: ~/.amplihack/memory/{agent_name}/                    │
└───────────┬────────────────────────────────────────────────────┘
            │
            │ Validation interface
            │
            ▼
┌────────────────────────────────────────────────────────────────┐
│               gadugi-agentic-test Integration                  │
│  - Test-driven learning validation                             │
│  - Evidence collection for experiences                         │
│  - Success criteria verification                               │
│  - Learning behavior regression tests                          │
└────────────────────────────────────────────────────────────────┘

Component Responsibilities¶

1. Goal Agent Executor¶

Purpose: Autonomous task execution with multi-turn iteration

Responsibilities:

Parse user's objective and constraints
Iterate until the objective is achieved or the iteration limit is reached
Evaluate success criteria
Coordinate with memory layer

State: Stateless (no persistence between runs)

Key Methods:

from pathlib import Path

class GoalAgent:
    async def execute_task(self, description: str, target: Path) -> "Result": ...
    async def iterate_until_complete(self) -> bool: ...
    def evaluate_success_criteria(self) -> bool: ...

2. Memory Integration Layer¶

Purpose: Glue layer connecting agent logic to memory storage

Responsibilities:

Retrieve relevant experiences before execution
Filter experiences by confidence and similarity
Store new experiences during/after execution
Recognize patterns from multiple experiences
Update confidence scores based on outcomes

State: Per-execution context (not persisted)

Key Methods:

from typing import List

class MemoryIntegration:
    def load_relevant_experiences(self, context: str) -> List["Experience"]: ...
    def apply_learned_patterns(self, patterns: List["Experience"]) -> None: ...
    def store_new_experience(self, exp: "Experience") -> str: ...
    def recognize_patterns(self, experiences: List["Experience"]) -> List["Experience"]: ...

3. amplihack-memory-lib Package¶

Purpose: Standalone persistent memory storage and retrieval

Responsibilities:

Store experiences in SQLite database
Retrieve experiences by filters (type, confidence, date)
Semantic search for relevant experiences
Memory management (compression, cleanup)
Statistics and metrics

State: Persistent (SQLite database on disk)

Key Classes:

MemoryConnector: Main API for memory operations
ExperienceStore: High-level interface with auto-management
Experience: Data class for experience objects
ExperienceType: Enum (SUCCESS, FAILURE, PATTERN, INSIGHT)

4. gadugi-agentic-test Validation¶

Purpose: Validate that agents actually learn and improve

Responsibilities:

Test experience storage after execution
Verify runtime improvement across runs
Validate pattern recognition accuracy
Check confidence score progression
Collect evidence of learning behavior

State: Test reports and evidence files

Key Tests:

def test_agent_stores_experiences_after_run(): ...
def test_agent_improves_with_memory(): ...
def test_pattern_recognition_across_runs(): ...
def test_confidence_increases_with_validation(): ...

Data Flow¶

Execution Flow (Single Run)¶

1. User invokes agent
   └─> amplihack goal-agent run doc-analyzer --target ./docs

2. Goal Agent Executor initializes
   ├─> Loads agent definition
   ├─> Parses objective and constraints
   └─> Initializes Memory Integration Layer

3. Memory Integration Layer loads context
   ├─> Calls MemoryConnector.retrieve_relevant(context)
   ├─> Filters by confidence >= min_confidence_to_apply
   ├─> Separates PATTERN and INSIGHT experiences
   └─> Returns relevant_experiences[]

4. Goal Agent executes task
   ├─> Applies learned patterns first
   │   ├─> For each PATTERN with high confidence:
   │   │   ├─> Apply pattern to current task
   │   │   └─> Store SUCCESS experience if pattern works
   │   └─> Track patterns_applied count
   │
   ├─> Performs main analysis
   │   ├─> Analyzes files/code
   │   ├─> Discovers new issues/patterns
   │   └─> Tracks discoveries[]
   │
   └─> Recognizes patterns in discoveries
       ├─> Group similar discoveries
       ├─> Count occurrences
       └─> If count >= pattern_recognition_threshold:
           └─> Create new PATTERN experience

5. Memory Integration Layer stores experiences
   ├─> Store SUCCESS/FAILURE outcome
   ├─> Store new PATTERN experiences
   ├─> Store INSIGHT experiences (if any)
   └─> Update metadata (runtime, counts, etc.)

6. Results returned to user
   └─> Summary: X issues found, Y patterns applied, Z new patterns
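
Step 3 of the flow above (filter by confidence, then separate PATTERN and INSIGHT experiences) might look roughly like the sketch below. The `Exp` record, the `partition_for_application` helper, and the 0.7 default are illustrative assumptions, not the library's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class ExperienceType(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PATTERN = "pattern"
    INSIGHT = "insight"


@dataclass
class Exp:  # hypothetical stand-in for the library's Experience class
    experience_type: ExperienceType
    context: str
    confidence: float


def partition_for_application(experiences, min_confidence=0.7):
    """Drop low-confidence experiences, then split patterns from insights."""
    usable = [e for e in experiences if e.confidence >= min_confidence]
    patterns = [e for e in usable if e.experience_type is ExperienceType.PATTERN]
    insights = [e for e in usable if e.experience_type is ExperienceType.INSIGHT]
    return patterns, insights


loaded = [
    Exp(ExperienceType.PATTERN, "tutorial_missing_example", 0.92),
    Exp(ExperienceType.PATTERN, "heading_too_generic", 0.55),
    Exp(ExperienceType.INSIGHT, "check index pages first", 0.80),
    Exp(ExperienceType.SUCCESS, "found 5 issues in ./docs", 0.90),
]
patterns, insights = partition_for_application(loaded)
```

Here only the 0.92 pattern and the single insight survive; the 0.55 pattern is loaded but not applied.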

Multi-Run Learning Flow¶

Run 1:
  Load: 0 experiences
  Apply: 0 patterns
  Discover: 5 issues (no patterns yet)
  Store: 5 SUCCESS experiences
  Result: Runtime 45s, found 5 issues

Run 2:
  Load: 5 experiences (relevant to current task)
  Apply: 0 patterns (not recognized yet - only 1 occurrence each)
  Discover: 7 issues (3 similar to Run 1)
  Recognize: 3 patterns (occurred in both Run 1 and Run 2)
  Store: 7 SUCCESS + 3 PATTERN experiences
  Result: Runtime 40s, found 7 issues

Run 3:
  Load: 15 experiences (5 + 7 + 3 patterns)
  Apply: 3 patterns (confidence >= 0.7)
    ├─> Pattern 1: Found 2 matches immediately (no analysis needed)
    ├─> Pattern 2: Found 1 match immediately
    └─> Pattern 3: Found 0 matches (pattern doesn't apply here)
  Discover: 4 new issues
  Recognize: 1 new pattern
  Store: 4 SUCCESS + 1 PATTERN + Update confidence on Pattern 1 & 2
  Result: Runtime 25s (44% faster), found 7 issues (3 from patterns + 4 new)

Run 10:
  Load: 80 experiences (patterns, successes, insights)
  Apply: 12 patterns with high confidence
    └─> Immediately check for all known patterns (5s)
  Discover: 1 new issue
  Store: 1 SUCCESS + Update confidence on 12 patterns
  Result: Runtime 18s (60% faster), found 13 issues (12 from patterns + 1 new)

Key Observation: The learning curve is steep initially (runs 1-5), then plateaus as the agent masters the domain.
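
The "faster" percentages in the runs above are measured against the Run 1 baseline of 45s. A small sketch of that arithmetic, using only the runtimes listed above (the helper name is made up for illustration):

```python
def percent_faster(baseline_s: float, current_s: float) -> float:
    """Runtime reduction relative to the baseline, as a percentage."""
    return (baseline_s - current_s) / baseline_s * 100


runtimes = {1: 45, 2: 40, 3: 25, 10: 18}  # seconds, from the runs above
speedups = {run: round(percent_faster(runtimes[1], s)) for run, s in runtimes.items()}
```

This reproduces the 44% (Run 3) and 60% (Run 10) figures, and gives about 11% for Run 2.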

Storage Schema¶

SQLite Database Schema¶

-- experiences table
CREATE TABLE experiences (
    experience_id TEXT PRIMARY KEY,
    agent_name TEXT NOT NULL,
    experience_type TEXT NOT NULL,  -- 'success', 'failure', 'pattern', 'insight'
    context TEXT NOT NULL,
    outcome TEXT NOT NULL,
    confidence REAL NOT NULL,
    timestamp INTEGER NOT NULL,     -- Unix timestamp
    metadata_json TEXT,             -- JSON blob
    tags_json TEXT,                 -- JSON array
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL
);

-- Indexes for fast retrieval
CREATE INDEX idx_agent_name ON experiences(agent_name);
CREATE INDEX idx_experience_type ON experiences(experience_type);
CREATE INDEX idx_timestamp ON experiences(timestamp);
CREATE INDEX idx_confidence ON experiences(confidence);
CREATE INDEX idx_agent_type ON experiences(agent_name, experience_type);

-- Full-text search index
CREATE VIRTUAL TABLE experiences_fts USING fts5(
    experience_id,
    context,
    outcome,
    content=experiences
);
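
As a rough illustration of how this schema could be queried from Python's built-in `sqlite3` module, the sketch below creates the `experiences` table in memory and selects one agent's high-confidence patterns. The sample rows and IDs are invented, and the FTS5 table is left out because FTS5 availability depends on how SQLite was built.

```python
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
conn.execute(
    """
    CREATE TABLE experiences (
        experience_id TEXT PRIMARY KEY,
        agent_name TEXT NOT NULL,
        experience_type TEXT NOT NULL,
        context TEXT NOT NULL,
        outcome TEXT NOT NULL,
        confidence REAL NOT NULL,
        timestamp INTEGER NOT NULL,
        metadata_json TEXT,
        tags_json TEXT,
        created_at INTEGER NOT NULL,
        updated_at INTEGER NOT NULL
    )
    """
)
conn.execute("CREATE INDEX idx_agent_type ON experiences(agent_name, experience_type)")

now = int(time.time())
rows = [  # invented sample data
    ("exp_1", "doc-analyzer", "pattern", "tutorial_missing_example",
     "Tutorials often lack runnable examples", 0.92),
    ("exp_2", "doc-analyzer", "pattern", "heading_too_generic",
     "Generic headings repeat across pages", 0.55),
    ("exp_3", "other-agent", "pattern", "unrelated", "n/a", 0.99),
]
conn.executemany(
    "INSERT INTO experiences VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [(i, a, t, c, o, conf, now, json.dumps({}), json.dumps([]), now, now)
     for i, a, t, c, o, conf in rows],
)

# Parameterized query: one agent's high-confidence patterns, best first.
high_confidence = conn.execute(
    """
    SELECT experience_id, confidence FROM experiences
    WHERE agent_name = ? AND experience_type = ? AND confidence >= ?
    ORDER BY confidence DESC
    """,
    ("doc-analyzer", "pattern", 0.7),
).fetchall()
```

The `agent_name` filter in the `WHERE` clause is also what keeps one agent's rows out of another agent's results when a table is shared.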

Experience Data Model¶

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

@dataclass
class Experience:
    """Represents a single learning experience."""

    experience_id: str              # Unique ID (auto-generated)
    experience_type: ExperienceType # SUCCESS, FAILURE, PATTERN, INSIGHT
    context: str                    # What was happening (max 500 chars)
    outcome: str                    # What resulted (max 1000 chars)
    confidence: float               # 0.0 - 1.0
    timestamp: datetime             # When this occurred
    metadata: Dict[str, Any]        # Additional data
    tags: List[str]                 # Searchable tags

Experience ID Format: exp_YYYYMMDD_HHMMSS_{hash}

Example: exp_20260214_102315_a7f3c9
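
An ID in this format could be produced along the following lines. The source does not say how the `{hash}` suffix is derived, so the SHA-256 truncation here is purely an assumption, as is the `make_experience_id` name.

```python
import hashlib
from datetime import datetime


def make_experience_id(context: str, when: datetime) -> str:
    """Build an ID shaped like exp_YYYYMMDD_HHMMSS_{hash}."""
    # Assumption: the suffix is the first 6 hex digits of a SHA-256 digest
    # of the context; the real derivation is not documented here.
    digest = hashlib.sha256(context.encode("utf-8")).hexdigest()[:6]
    return f"exp_{when:%Y%m%d_%H%M%S}_{digest}"


example_id = make_experience_id(
    "Tutorial files without code examples", datetime(2026, 2, 14, 10, 23, 15)
)
```

Because the timestamp comes first, IDs sort chronologically as plain strings.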

Metadata Structure¶

Metadata is flexible JSON, but common fields include:

{
  "runtime_seconds": 45.2,
  "files_processed": 47,
  "issues_found": 5,
  "patterns_applied": 3,
  "patterns_discovered": 2,
  "cache_hits": 12,
  "cache_misses": 35,
  "true_positives": 5,
  "false_positives": 1,
  "false_negatives": 0,
  "target_path": "./docs",
  "agent_version": "1.0.0"
}
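
Fields like these make it straightforward to derive quality metrics per run. The sketch below computes precision, recall, and cache hit rate from the sample values above; the metric choice and the `ratio` helper are illustrative, not something the library is known to provide.

```python
import json

metadata = json.loads("""
{
  "cache_hits": 12,
  "cache_misses": 35,
  "true_positives": 5,
  "false_positives": 1,
  "false_negatives": 0
}
""")


def ratio(numerator: int, denominator: int) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return numerator / denominator if denominator else 0.0


precision = ratio(metadata["true_positives"],
                  metadata["true_positives"] + metadata["false_positives"])
recall = ratio(metadata["true_positives"],
               metadata["true_positives"] + metadata["false_negatives"])
cache_hit_rate = ratio(metadata["cache_hits"],
                       metadata["cache_hits"] + metadata["cache_misses"])
```

For the sample metadata this gives a precision of about 0.83, a recall of 1.0, and a cache hit rate of about 0.26.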

Memory Management¶

Automatic Cleanup¶

Memory is managed automatically based on configuration:

# memory_config.yaml
retention:
  max_age_days: 90 # Delete experiences older than 90 days
  max_experiences: 10000 # Keep max 10,000 experiences
  delete_strategy: oldest_first

Cleanup Process:

Run periodically (on agent startup and every 100 executions)
Check if limits exceeded
If max_age_days exceeded: Delete experiences older than threshold
If max_experiences exceeded: Delete oldest experiences until under limit
Vacuum database to reclaim space
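
The cleanup steps above could be expressed against SQLite roughly as follows. This is a sketch under assumptions: the `cleanup` function, its parameters, and the two-column table are simplifications for illustration, and the real library may batch or schedule this differently.

```python
import sqlite3
import time

DAY = 86_400  # seconds per day


def cleanup(conn, max_age_days=90, max_experiences=10_000, now=None):
    """Apply the age and count limits, then reclaim disk space."""
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_days * DAY
    conn.execute("DELETE FROM experiences WHERE timestamp < ?", (cutoff,))
    # oldest_first: keep only the newest max_experiences rows.
    conn.execute(
        """
        DELETE FROM experiences WHERE experience_id NOT IN (
            SELECT experience_id FROM experiences
            ORDER BY timestamp DESC LIMIT ?
        )
        """,
        (max_experiences,),
    )
    conn.commit()           # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiences (experience_id TEXT PRIMARY KEY, timestamp INTEGER NOT NULL)"
)
now = 1_000 * DAY
conn.executemany(
    "INSERT INTO experiences VALUES (?, ?)",
    [(f"exp_{age}", now - age * DAY) for age in (1, 10, 50, 100, 200)],
)
cleanup(conn, max_age_days=90, max_experiences=2, now=now)
remaining = [row[0] for row in conn.execute(
    "SELECT experience_id FROM experiences ORDER BY timestamp DESC")]
```

In this example the 100- and 200-day-old rows fall to the age limit, and the 50-day-old row falls to the count limit of 2.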

Compression¶

Old experiences are compressed to save space:

compression:
  enabled: true
  after_days: 30 # Compress experiences older than 30 days

Compression Process:

Experiences older than after_days are candidates
Metadata JSON is compressed with gzip
Original JSON replaced with compressed version
Typical compression ratio: 3:1

Access: Compressed experiences are automatically decompressed on retrieval (transparent to caller).
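
Transparent decompression can be done by checking for the gzip magic bytes on read. The following is a minimal sketch of that idea using Python's standard `gzip` module; `compress_metadata` and `load_metadata` are hypothetical helper names, and the actual storage format used by the library is not specified here.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def compress_metadata(metadata: dict) -> bytes:
    """Serialize metadata to JSON and gzip it."""
    return gzip.compress(json.dumps(metadata).encode("utf-8"))


def load_metadata(stored) -> dict:
    """Return metadata whether it was stored gzipped (bytes) or as plain JSON."""
    if isinstance(stored, bytes) and stored[:2] == GZIP_MAGIC:
        stored = gzip.decompress(stored)
    return json.loads(stored)


metadata = {
    "runtime_seconds": 45.2,
    "files_processed": 47,
    "examples": ["tutorial page without a code example"] * 50,
}
plain = json.dumps(metadata)
blob = compress_metadata(metadata)
restored = load_metadata(blob)
```

The achieved ratio depends heavily on the payload: repetitive metadata like this compresses well, while very small JSON blobs can end up larger after gzip.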

Semantic Relevance¶

How Relevance is Calculated¶

When retrieving relevant experiences, similarity is calculated using:

from datetime import datetime

def calculate_relevance(experience: Experience, current_context: str) -> float:
    """
    Calculate relevance score (0.0 - 1.0) for an experience.

    Factors:
    1. Text similarity (TF-IDF or embedding-based)
    2. Experience type (PATTERN and INSIGHT weighted higher)
    3. Confidence (higher confidence = more relevant)
    4. Recency (newer experiences weighted higher)
    """

    # Base similarity using TF-IDF vectorizer
    base_similarity = tfidf_similarity(experience.context, current_context)

    # Type weighting
    type_weight = {
        ExperienceType.PATTERN: 1.5,
        ExperienceType.INSIGHT: 1.3,
        ExperienceType.SUCCESS: 1.0,
        ExperienceType.FAILURE: 0.8
    }[experience.experience_type]

    # Confidence boost
    confidence_factor = experience.confidence

    # Recency factor: linear decay with age, floored at 0.5 (floor reached at 45 days)
    age_days = (datetime.now() - experience.timestamp).days
    recency_factor = max(0.5, 1.0 - (age_days / 90))

    # Combined score
    relevance = base_similarity * type_weight * confidence_factor * recency_factor

    return min(1.0, relevance)  # Cap at 1.0
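
The `tfidf_similarity` helper used above is not defined in this document. One way it could work, shown here only as a dependency-free sketch, is cosine similarity over TF-IDF vectors built from the two texts being compared. A real implementation would more likely compute IDF over the whole experience corpus, so treat the weighting below as an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of TF-IDF vectors built from just these two texts."""
    docs = [Counter(tokenize(text_a)), Counter(tokenize(text_b))]
    vocabulary = set(docs[0]) | set(docs[1])
    n_docs = len(docs)

    def idf(term: str) -> float:
        # Smoothed IDF so terms shared by both texts keep a non-zero weight.
        df = sum(1 for doc in docs if term in doc)
        return math.log((1 + n_docs) / (1 + df)) + 1.0

    vectors = [{t: doc[t] * idf(t) for t in vocabulary} for doc in docs]
    dot = sum(vectors[0][t] * vectors[1][t] for t in vocabulary)
    norms = [math.sqrt(sum(w * w for w in v.values())) for v in vectors]
    if norms[0] == 0 or norms[1] == 0:
        return 0.0
    return dot / (norms[0] * norms[1])


same = tfidf_similarity("missing code examples", "missing code examples")
partial = tfidf_similarity("tutorial missing examples", "documentation missing examples")
disjoint = tfidf_similarity("broken link", "security vulnerability")
```

Identical texts score (approximately) 1.0, texts with no shared tokens score 0.0, and partial overlap lands in between.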

Example Relevance Calculation¶

# Current task
current_context = "Analyze Python documentation for missing examples"

# Experience 1
exp1 = Experience(
    experience_type=ExperienceType.PATTERN,
    context="Tutorial files without code examples",
    confidence=0.92,
    timestamp=datetime(2026, 2, 10)  # 4 days ago
)

# Calculation:
# - base_similarity: 0.85 (high text similarity)
# - type_weight: 1.5 (PATTERN)
# - confidence_factor: 0.92
# - recency_factor: 0.96 (recent)
# = 0.85 * 1.5 * 0.92 * 0.96 = 1.13 → capped to 1.0

relevance1 = 1.0  # Highly relevant

# Experience 2
exp2 = Experience(
    experience_type=ExperienceType.SUCCESS,
    context="Fixed security vulnerability in authentication",
    confidence=0.88,
    timestamp=datetime(2026, 1, 15)  # 30 days ago
)

# Calculation:
# - base_similarity: 0.15 (low text similarity)
# - type_weight: 1.0 (SUCCESS)
# - confidence_factor: 0.88
# - recency_factor: 0.67 (older)
# = 0.15 * 1.0 * 0.88 * 0.67 = 0.09

relevance2 = 0.09  # Not relevant
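
The two calculations above can be re-derived from the formula with a few lines of Python. The base similarities (0.85 and 0.15) are taken from the example as given rather than computed; the helper names are illustrative.

```python
def recency_factor(age_days: int) -> float:
    """Linear decay with age, floored at 0.5 (as in calculate_relevance)."""
    return max(0.5, 1.0 - age_days / 90)


def relevance(base_similarity, type_weight, confidence, age_days):
    """Return the raw product and the value after capping at 1.0."""
    raw = base_similarity * type_weight * confidence * recency_factor(age_days)
    return raw, min(1.0, raw)


raw1, capped1 = relevance(0.85, 1.5, 0.92, age_days=4)    # Experience 1
raw2, capped2 = relevance(0.15, 1.0, 0.88, age_days=30)   # Experience 2
```

With the unrounded recency factor, the raw score for Experience 1 comes out near 1.12 rather than 1.13; the capped result is 1.0 either way. Experience 2 rounds to 0.09.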

Pattern Recognition Algorithm¶

Pattern Detection¶

Patterns are recognized when a situation occurs repeatedly:

from collections import defaultdict
from datetime import datetime
from typing import List

def detect_patterns(
    current_discoveries: List[Discovery],
    known_patterns: List[Experience],
    threshold: int = 3
) -> List[Experience]:
    """
    Detect new patterns from current discoveries.

    Args:
        current_discoveries: New discoveries from this run
        known_patterns: Existing known patterns
        threshold: Minimum occurrences to recognize pattern

    Returns:
        List of new PATTERN experiences
    """

    # Track pattern candidates
    pattern_candidates = defaultdict(lambda: {
        "count": 0,
        "examples": [],
        "confidence": 0.0
    })

    # Extract pattern keys from discoveries
    for discovery in current_discoveries:
        pattern_key = extract_pattern_key(discovery)

        pattern_candidates[pattern_key]["count"] += 1
        pattern_candidates[pattern_key]["examples"].append(discovery)

    # Keep candidates that meet or exceed the threshold
    new_patterns = []

    for pattern_key, data in pattern_candidates.items():
        if data["count"] >= threshold:
            # Check if already known
            is_known = any(
                p.context == pattern_key
                for p in known_patterns
            )

            if not is_known:
                # Create new pattern
                confidence = min(0.5 + (data["count"] * 0.1), 0.95)

                pattern = Experience(
                    experience_type=ExperienceType.PATTERN,
                    context=pattern_key,
                    outcome=describe_pattern(data),
                    confidence=confidence,
                    timestamp=datetime.now(),
                    metadata={
                        "occurrences": data["count"],
                        "examples": data["examples"][:5]
                    }
                )

                new_patterns.append(pattern)

    return new_patterns
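
To make the initial-confidence formula in `detect_patterns` concrete, the sketch below evaluates it for a few occurrence counts. The function name is invented for illustration; the formula itself is copied from the code above.

```python
def initial_pattern_confidence(occurrences: int) -> float:
    """0.5 plus 0.1 per occurrence, capped at 0.95 (formula from detect_patterns)."""
    return min(0.5 + occurrences * 0.1, 0.95)


confidence_by_count = {n: round(initial_pattern_confidence(n), 2) for n in range(3, 7)}
```

With the default threshold of 3, a newly recognized pattern therefore starts at 0.8, and the cap is reached from 5 occurrences onward.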

Pattern Key Extraction¶

Pattern keys are domain-specific identifiers:

# Example: Documentation analyzer
def extract_pattern_key(discovery: Discovery) -> str:
    """Extract pattern identifier from discovery."""

    if discovery.type == "missing_example":
        return "tutorial_missing_example"

    elif discovery.type == "broken_link":
        if "external" in discovery.url:
            return "external_link_broken"
        else:
            return "internal_link_broken"

    elif discovery.type == "unclear_heading":
        return "heading_too_generic"

    else:
        return f"unknown_{discovery.type}"
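
A self-contained sketch of how such keys feed the occurrence counting is shown below. The `Discovery` dataclass is a hypothetical shape with only the fields the example above reads, and the key function is a condensed version of that example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Discovery:  # hypothetical shape; only the fields used above
    type: str
    url: str = ""


def extract_pattern_key(discovery: Discovery) -> str:
    """Condensed version of the documentation-analyzer example above."""
    if discovery.type == "missing_example":
        return "tutorial_missing_example"
    if discovery.type == "broken_link":
        scope = "external" if "external" in discovery.url else "internal"
        return f"{scope}_link_broken"
    if discovery.type == "unclear_heading":
        return "heading_too_generic"
    return f"unknown_{discovery.type}"


discoveries = [
    Discovery("missing_example"),
    Discovery("missing_example"),
    Discovery("missing_example"),
    Discovery("broken_link", url="https://external.example.com/a"),
    Discovery("broken_link", url="/guide/setup"),
    Discovery("typo"),
]
counts = Counter(extract_pattern_key(d) for d in discoveries)
threshold = 3
candidates = [key for key, n in counts.items() if n >= threshold]
```

Only `tutorial_missing_example` reaches the threshold here; the two broken links map to different keys, so neither counts toward the other.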

Security Considerations¶

Data Privacy¶

What is stored:

Natural language context descriptions
Natural language outcomes
Metadata (metrics, file paths, timestamps)
NO source code by default
NO credentials or secrets

Storage location: ~/.amplihack/memory/{agent_name}/

Permissions: Owner read/write only (chmod 600 on the database file; the containing directory also needs the owner execute bit, i.e. chmod 700)
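
On POSIX systems, owner-only permissions could be applied as in the sketch below. The `create_private_store` helper is hypothetical, and a temporary directory stands in for `~/.amplihack/memory/`; on Windows, `os.chmod` only controls the read-only flag, so this would need a different mechanism there.

```python
import os
import stat
import tempfile
from pathlib import Path


def create_private_store(base_dir: Path, agent_name: str) -> Path:
    """Create {base_dir}/{agent_name}/experiences.db accessible by the owner only."""
    agent_dir = base_dir / agent_name
    agent_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(agent_dir, 0o700)  # directories need the execute bit to be entered
    db_path = agent_dir / "experiences.db"
    db_path.touch(exist_ok=True)
    os.chmod(db_path, 0o600)    # owner read/write only
    return db_path


with tempfile.TemporaryDirectory() as tmp:
    db_path = create_private_store(Path(tmp), "doc-analyzer")
    file_mode = stat.S_IMODE(db_path.stat().st_mode)
    dir_mode = stat.S_IMODE(db_path.parent.stat().st_mode)
```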

Sensitive Data Handling¶

Agents should NOT store sensitive data in experiences:

# ❌ BAD: Storing actual credential
experience = Experience(
    context="Found password: supersecret123",
    outcome="Should be in env var"
)

# ✓ GOOD: Storing reference only
experience = Experience(
    context="File auth.py line 42: Hardcoded credential found",
    outcome="Flagged for removal",
    metadata={"file": "auth.py", "line": 42}
)
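
As a defensive extra, an agent could scrub credential-like values before storing a context string. The sketch below is one hypothetical approach; the keyword list and the `redact_secrets` name are assumptions, and a regex like this will not catch every secret, so it complements rather than replaces storing references only.

```python
import re

# Hypothetical pattern: a credential-like keyword, a separator, then the value.
SECRET_PATTERN = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[:=]\s*)(\S+)"
)


def redact_secrets(text: str) -> str:
    """Replace the value after a credential-like keyword with a placeholder."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


safe_context = redact_secrets("Found password: supersecret123")
unchanged = redact_secrets("File auth.py line 42: Hardcoded credential found")
```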

Memory Isolation¶

Each agent has isolated memory:

Agent A cannot access Agent B's experiences
Memory is scoped by agent_name
No cross-agent memory sharing (by design)

Rationale: Prevents interference and maintains clear boundaries.

Performance Characteristics¶

Storage Performance¶

Operation	Complexity	Time (1K experiences)	Time (10K experiences)
`store_experience()`	O(1)	< 5ms	< 5ms
`retrieve_experiences()`	O(n)	< 20ms	< 50ms
`retrieve_relevant()`	O(n log k)	< 50ms	< 80ms
`get_statistics()`	O(1)	< 2ms	< 5ms
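
An O(n log k) bound for `retrieve_relevant()` is what one would expect from scoring all n experiences and keeping the best k with a heap. How the library actually implements retrieval is not stated here, so the following is only a sketch of that technique; `top_k_relevant`, `k`, and `min_score` are invented names, and the scores are sample values.

```python
import heapq


def top_k_relevant(scored, k=5, min_score=0.1):
    """Return the k highest-scoring (score, experience_id) pairs, best first."""
    candidates = (pair for pair in scored if pair[0] >= min_score)
    return heapq.nlargest(k, candidates)


scored = [
    (0.09, "exp_auth_fix"),
    (1.0, "exp_missing_examples"),
    (0.42, "exp_broken_links"),
    (0.77, "exp_generic_headings"),
]
best = top_k_relevant(scored, k=2)
```

`heapq.nlargest` avoids sorting the full list, which matters more as the experience count grows toward the configured maximum.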

Memory Usage¶

Component	Per-Agent Storage	Notes
SQLite database	2-10 MB per 1000 experiences	Depends on metadata size
Full-text index	+20% overhead	For semantic search
Compressed experiences	About -67% (3:1 ratio)	After compression

Typical agent memory: 5-50 MB after 100 runs

Runtime Impact¶

Phase	Overhead	Notes
Load experiences	50-100ms	One-time at start
Apply patterns	10-30ms per pattern	Linear with pattern count
Store experiences	5-10ms per experience	Batched at end
Total overhead	5-10%	Offset by performance gains

Net effect: Despite overhead, agents run 40-60% faster after learning due to pattern application.

Design Decisions¶

Why Standalone Library?¶

Decision: Package amplihack-memory-lib separately from amplihack

Rationale:

Reusability: Can be used in non-amplihack projects
Versioning: Independent release cycle
Dependencies: Minimal deps (Python 3.10+, SQLite only)
Testing: Easier to test in isolation

Why SQLite Instead of External DB?¶

Decision: Use embedded SQLite database

Rationale:

Zero setup: No external service required
File-based: Easy to back up, copy, and version-control
Fast enough: Adequate performance for agent use case
Portable: Works on any platform

Trade-offs:

❌ No multi-agent concurrent writes (not needed - agents run serially)
✓ Simple deployment (no database server)
✓ Low resource usage (no background process)

Why Four Experience Types?¶

Decision: SUCCESS, FAILURE, PATTERN, INSIGHT (not more, not fewer)

Rationale:

SUCCESS/FAILURE: Outcome tracking (what worked/didn't work)
PATTERN: Reusable knowledge (applies to new situations)
INSIGHT: High-level principles (guides decision-making)

Alternatives considered:

Single type "EXPERIENCE": ❌ Too generic, harder to filter
Many types (WARNING, INFO, DEBUG, etc.): ❌ Overcomplicates, unclear semantics

Why Confidence Scores?¶

Decision: Every experience has confidence score (0.0-1.0)

Rationale:

Filtering: Only apply high-confidence patterns
Learning: Confidence increases with validation
Uncertainty: Explicit representation of agent's certainty

Usage:

# Only apply patterns with confidence >= 0.7
if pattern.confidence >= 0.7:
    apply_pattern(pattern)

Future Enhancements¶

Planned Improvements¶

1. Cross-Agent Knowledge Transfer
   - Export experiences from Agent A
   - Import into Agent B
   - Use case: Transfer learnings to new agent
2. Advanced Similarity Metrics
   - Embedding-based similarity (currently TF-IDF)
   - Use sentence transformers for better semantic matching
   - Configurable similarity functions
3. Memory Visualization
   - Web UI for browsing experiences
   - Graph visualization of pattern relationships
   - Timeline view of learning progression
4. Federated Learning
   - Agents share knowledge without sharing data
   - Privacy-preserving knowledge transfer
   - Community pattern libraries
