Skip to content

Understanding the LearningAgent Module Architecture

The refactored LearningAgent keeps the same caller-facing behavior while replacing the old monolith with focused internal modules.

From the outside, generated agents, eval harnesses, and direct imports continue to use the same LearningAgent methods. Internally, the agent is now organized around stable ownership boundaries: ingestion, retrieval, intent detection, temporal reasoning, code synthesis, knowledge utilities, and answer synthesis.

Why the split exists

The refactor solves four maintenance problems in the original single-file implementation:

  • Too many responsibilities in one file made reviews noisy.
  • Retrieval, temporal reasoning, and synthesis logic were tightly interleaved.
  • Tests were forced into one large compatibility bucket.
  • Small changes risked accidental regressions in unrelated paths.

The module split keeps the behavior intact while making it obvious where new logic belongs.

Compatibility model

The compatibility rules are strict:

  • src/amplihack/agents/goal_seeking/learning_agent.py remains the primary import path.
  • LearningAgent remains directly importable for existing callers.
  • GoalSeekingAgent continues to delegate learning and answering work to LearningAgent.
  • The public methods stay unchanged:
  • learn_from_content
  • answer_question
  • answer_question_agentic
  • get_memory_stats
  • flush_memory
  • close
  • src/amplihack/agents/goal_seeking/__init__.py continues to import LearningAgent for backward compatibility without promoting it into __all__.

The eight-module layout

The refactor centers on eight named modules. These are the contributor-facing ownership boundaries.

Module Responsibility Typical reasons to edit it
learning_agent.py Thin facade, construction, action registration, lifecycle, retry helpers Constructor behavior, shared state, public method delegation
learning_ingestion.py Content ingestion, fact extraction, batching, storage Learning flow, source labels, summary concept maps
answer_synthesizer.py LLM answer synthesis and completeness evaluation Prompt assembly, final answer wording, gap-filling refinement
retrieval_strategies.py Retrieval planning and retrieval implementations Entity lookup, concept lookup, aggregation retrieval, fallbacks
intent_detector.py Query intent classification Intent labels, routing metadata, math/temporal flags
temporal_reasoning.py Temporal state tracking and transition chains Change-over-time questions, direct temporal lookups
code_synthesis.py LLM-driven code generation for hard temporal calculations Generated Python snippets and temporal index computation
knowledge_utils.py Shared helpers for arithmetic, entity handling, fact validation Math precomputation, knowledge explanations, fact verification

Private helper files may exist when needed to keep the main ownership modules reviewable. Those helpers support the eight modules above; they do not replace them as contributor entry points.

Shared state stays in the facade

The refactor deliberately avoids a new runtime-state object. learning_agent.py remains the one place that owns process-wide state and construction-time wiring.

The shared state that stays on LearningAgent includes:

  • self.memory
  • self.model
  • self.agent_name
  • self.use_hierarchical
  • self.loop
  • self.executor
  • self._thread_local
  • self._pre_snapshot_facts
  • self.prompt_variant
  • self._variant_system_prompt

This keeps the internal modules focused on behavior, not lifecycle.

Dependency direction

Imports move in one direction only:

  1. Leaf behavior
  2. intent_detector.py
  3. temporal_reasoning.py
  4. code_synthesis.py
  5. knowledge_utils.py
  6. Stateful pipelines
  7. learning_ingestion.py
  8. retrieval_strategies.py
  9. Orchestration
  10. answer_synthesizer.py
  11. Public facade
  12. learning_agent.py

Lower layers do not import higher layers. The facade assembles everything.

How learning flows through the modules

learn_from_content() still looks like one operation to callers, but the work is now easier to follow:

flowchart TD
    A[learn_from_content] --> B[learning_agent.py facade]
    B --> C[learning_ingestion.prepare_fact_batch]
    C --> D[temporal metadata detection]
    C --> E[LLM fact extraction]
    C --> F[summary concept map generation]
    B --> G[learning_ingestion.store_fact_batch]
    G --> H[memory backend]
    H --> I[get_memory_stats / later retrieval]

Key properties:

  • content truncation and source-label derivation live with ingestion
  • temporal metadata detection stays near storage preparation
  • batch preparation and batch storage stay together so the data contract is obvious

How answering flows through the modules

The read path is now explicitly staged:

flowchart TD
    A[answer_question] --> B[learning_agent.py facade]
    B --> C[intent_detector]
    C --> D[retrieval_strategies]
    D --> E[knowledge_utils math helpers]
    D --> F[temporal_reasoning]
    F --> G[code_synthesis]
    D --> H[answer_synthesizer]
    H --> I[store Q and A if enabled]
    I --> J[final answer]

That flow matters because each stage answers a different question:

  • Intent detector: what kind of question is this?
  • Retrieval strategies: what facts should we bring back?
  • Knowledge utilities: do we need deterministic arithmetic first?
  • Temporal reasoning: is there a chronological chain to compute?
  • Code synthesis: do we need generated code for a hard temporal lookup?
  • Answer synthesizer: how do we turn the facts into a final answer?

Agentic answering still builds on single-shot answering

answer_question_agentic() remains additive rather than divergent.

It still:

  1. runs the standard single-shot pipeline first
  2. evaluates answer completeness
  3. retrieves more facts only when specific gaps exist
  4. re-synthesizes from the original answer plus additional evidence

That means the refactor preserves the existing design rule: agentic mode should not score worse than the single-shot baseline.

Test layout after the split

The old test_learning_agent.py monolith is replaced by module-aligned test files:

Test file Primary scope
tests/agents/goal_seeking/test_learning_agent_core.py facade construction, retry helpers, lifecycle
tests/agents/goal_seeking/test_learning_agent_ingestion.py batch prep, storage, source labels, temporal metadata
tests/agents/goal_seeking/test_learning_agent_retrieval.py retrieval strategies, aggregation, fallbacks
tests/agents/goal_seeking/test_learning_agent_temporal.py temporal parsing, transition chains, generated temporal code

The existing broader behavior tests remain in place:

  • test_math_intent.py
  • test_agentic_answer_mode.py
  • test_goal_seeking_agent.py

Maintenance guardrails

The refactor keeps a few hard boundaries in place:

  • each primary extracted module stays small enough to review comfortably
  • learning_agent.py stays intentionally thin
  • no circular imports
  • no dead imports
  • public method signatures stay stable
  • new helper logic goes to the owning module, not back into the facade

When to touch each layer

Use this rule of thumb before editing:

  • change classification logic in intent_detector.py
  • change time-aware interpretation in temporal_reasoning.py
  • change deterministic computation in code_synthesis.py or knowledge_utils.py
  • change fact extraction or storage in learning_ingestion.py
  • change which facts are retrieved in retrieval_strategies.py
  • change how final answers are phrased or refined in answer_synthesizer.py
  • change construction, lifecycle, or compatibility wiring in learning_agent.py