Understanding the LearningAgent Module Architecture¶
The refactored LearningAgent keeps the same caller-facing behavior while replacing the old monolith with focused internal modules.
From the outside, generated agents, eval harnesses, and direct imports continue to use the same LearningAgent methods. Internally, the agent is now organized around stable ownership boundaries: ingestion, retrieval, intent detection, temporal reasoning, code synthesis, knowledge utilities, and answer synthesis.
Why the split exists¶
The refactor solves four maintenance problems in the original single-file implementation:
- Too many responsibilities in one file made reviews noisy.
- Retrieval, temporal reasoning, and synthesis logic were tightly interleaved.
- Tests were forced into one large compatibility bucket.
- Small changes risked accidental regressions in unrelated paths.
The module split keeps the behavior intact while making it obvious where new logic belongs.
Compatibility model¶
The compatibility rules are strict:
- `src/amplihack/agents/goal_seeking/learning_agent.py` remains the primary import path.
- `LearningAgent` remains directly importable for existing callers.
- `GoalSeekingAgent` continues to delegate learning and answering work to `LearningAgent`.
- The public methods stay unchanged: `learn_from_content`, `answer_question`, `answer_question_agentic`, `get_memory_stats`, `flush_memory`, `close`.
- `src/amplihack/agents/goal_seeking/__init__.py` continues to import `LearningAgent` for backward compatibility without promoting it into `__all__`.
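The "importable but not promoted" rule can be illustrated with a self-contained toy. The module body below is a stand-in, not the real `__init__.py`:

```python
import types

# Stand-in for goal_seeking/__init__.py: LearningAgent stays importable
# even though __all__ deliberately omits it.
pkg = types.ModuleType("goal_seeking_demo")
exec(
    "class GoalSeekingAgent: pass\n"
    "class LearningAgent: pass\n"
    "__all__ = ['GoalSeekingAgent']\n",
    pkg.__dict__,
)

assert hasattr(pkg, "LearningAgent")       # direct access still works
assert "LearningAgent" not in pkg.__all__  # but wildcard imports skip it
```

Because `__all__` only governs `from package import *`, existing direct imports keep working without advertising the class as new public surface.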
The eight-module layout¶
The refactor centers on eight named modules. These are the contributor-facing ownership boundaries.
| Module | Responsibility | Typical reasons to edit it |
|---|---|---|
| `learning_agent.py` | Thin facade, construction, action registration, lifecycle, retry helpers | Constructor behavior, shared state, public method delegation |
| `learning_ingestion.py` | Content ingestion, fact extraction, batching, storage | Learning flow, source labels, summary concept maps |
| `answer_synthesizer.py` | LLM answer synthesis and completeness evaluation | Prompt assembly, final answer wording, gap-filling refinement |
| `retrieval_strategies.py` | Retrieval planning and retrieval implementations | Entity lookup, concept lookup, aggregation retrieval, fallbacks |
| `intent_detector.py` | Query intent classification | Intent labels, routing metadata, math/temporal flags |
| `temporal_reasoning.py` | Temporal state tracking and transition chains | Change-over-time questions, direct temporal lookups |
| `code_synthesis.py` | LLM-driven code generation for hard temporal calculations | Generated Python snippets and temporal index computation |
| `knowledge_utils.py` | Shared helpers for arithmetic, entity handling, fact validation | Math precomputation, knowledge explanations, fact verification |
Private helper files may exist when needed to keep the main ownership modules reviewable. Those helpers support the eight modules above; they do not replace them as contributor entry points.
Shared state stays in the facade¶
The refactor deliberately avoids a new runtime-state object. learning_agent.py remains the one place that owns process-wide state and construction-time wiring.
The shared state that stays on LearningAgent includes:
- `self.memory`
- `self.model`
- `self.agent_name`
- `self.use_hierarchical`
- `self.loop`
- `self.executor`
- `self._thread_local`
- `self._pre_snapshot_facts`
- `self.prompt_variant`
- `self._variant_system_prompt`
This keeps the internal modules focused on behavior, not lifecycle.
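A minimal sketch of that pattern, with attribute names taken from the list above (the helper function and method bodies are illustrative, not the real implementation):

```python
def ingest(agent, content):
    # Helper-module function: receives the facade, owns only behavior.
    agent.memory.append(content)
    return len(agent.memory)

class LearningAgent:
    """Toy facade: owns shared state; helpers receive it explicitly."""

    def __init__(self, agent_name="demo", use_hierarchical=False):
        self.agent_name = agent_name
        self.use_hierarchical = use_hierarchical
        self.memory = []          # stand-in for the memory backend
        self.prompt_variant = None

    def learn_from_content(self, content):
        # Public method delegates; no module-level state anywhere.
        return ingest(self, content)

agent = LearningAgent()
assert agent.learn_from_content("fact one") == 1
```

Because helpers take the facade as an argument, they stay stateless and trivially testable, while all lifecycle concerns remain in one file.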
Dependency direction¶
Imports move in one direction only:
- Leaf behavior: `intent_detector.py`, `temporal_reasoning.py`, `code_synthesis.py`, `knowledge_utils.py`
- Stateful pipelines: `learning_ingestion.py`, `retrieval_strategies.py`
- Orchestration: `answer_synthesizer.py`
- Public facade: `learning_agent.py`
Lower layers do not import higher layers. The facade assembles everything.
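The layering rule is mechanical enough to check in code. The sketch below assigns each module a layer number and verifies that every edge in a (hypothetical, illustrative) import map points strictly downward; the `IMPORTS` edges are examples, not the real dependency graph:

```python
# Layer numbers: leaf = 0 ... facade = 3. An import is legal only if it
# targets a strictly lower layer than the importing module.
LAYER = {
    "intent_detector": 0, "temporal_reasoning": 0,
    "code_synthesis": 0, "knowledge_utils": 0,
    "learning_ingestion": 1, "retrieval_strategies": 1,
    "answer_synthesizer": 2,
    "learning_agent": 3,
}
IMPORTS = {  # illustrative edges only
    "learning_agent": ["answer_synthesizer", "learning_ingestion"],
    "answer_synthesizer": ["retrieval_strategies", "knowledge_utils"],
    "retrieval_strategies": ["temporal_reasoning"],
}
violations = [
    (mod, dep)
    for mod, deps in IMPORTS.items()
    for dep in deps
    if LAYER[dep] >= LAYER[mod]
]
assert violations == []  # no same-layer or upward imports
```

A check like this could run in CI to keep the "no circular imports" guardrail enforced rather than merely documented.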
How learning flows through the modules¶
learn_from_content() still looks like one operation to callers, but the work is now easier to follow:
```mermaid
flowchart TD
    A[learn_from_content] --> B[learning_agent.py facade]
    B --> C[learning_ingestion.prepare_fact_batch]
    C --> D[temporal metadata detection]
    C --> E[LLM fact extraction]
    C --> F[summary concept map generation]
    B --> G[learning_ingestion.store_fact_batch]
    G --> H[memory backend]
    H --> I[get_memory_stats / later retrieval]
```

Key properties:
- content truncation and source-label derivation live with ingestion
- temporal metadata detection stays near storage preparation
- batch preparation and batch storage stay together so the data contract is obvious
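The prepare/store pairing can be sketched as follows. This is a toy contract, assuming a batch is a list of `(fact, metadata)` pairs; the function names mirror the diagram but the signatures and logic are hypothetical:

```python
def prepare_fact_batch(content, source):
    # Toy extraction: split sentences, tag each with source and a crude
    # temporal flag (the real module uses LLM extraction).
    facts = [s.strip() for s in content.split(".") if s.strip()]
    return [
        (fact, {"source": source, "temporal": "in 20" in fact})
        for fact in facts
    ]

def store_fact_batch(memory, batch):
    # Storage consumes exactly what preparation produced.
    memory.extend(batch)
    return len(batch)

memory = []
batch = prepare_fact_batch("Ada moved in 2020. Ada likes graphs.", "notes.txt")
stored = store_fact_batch(memory, batch)
assert stored == 2
assert memory[0][1]["source"] == "notes.txt"
```

Keeping both functions in one module means the shape of `batch` is a private, co-located contract rather than a cross-file dependency.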
How answering flows through the modules¶
The read path is now explicitly staged:
```mermaid
flowchart TD
    A[answer_question] --> B[learning_agent.py facade]
    B --> C[intent_detector]
    C --> D[retrieval_strategies]
    D --> E[knowledge_utils math helpers]
    D --> F[temporal_reasoning]
    F --> G[code_synthesis]
    D --> H[answer_synthesizer]
    H --> I[store Q and A if enabled]
    I --> J[final answer]
```

That flow matters because each stage answers a different question:
- Intent detector: what kind of question is this?
- Retrieval strategies: what facts should we bring back?
- Knowledge utilities: do we need deterministic arithmetic first?
- Temporal reasoning: is there a chronological chain to compute?
- Code synthesis: do we need generated code for a hard temporal lookup?
- Answer synthesizer: how do we turn the facts into a final answer?
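The staged read path can be sketched end to end with toy stand-ins for three of the stages (classification, retrieval, synthesis); everything here is illustrative, not the real heuristics:

```python
def detect_intent(question):
    # Toy classifier: real intent detection is richer than one keyword.
    return "temporal" if "when" in question.lower() else "factual"

def retrieve(memory, question):
    # Toy retrieval: keep facts sharing any word with the question.
    words = question.lower().split()
    return [fact for fact in memory if any(w in fact for w in words)]

def synthesize(question, facts):
    # Toy synthesis: the real module assembles an LLM prompt instead.
    return facts[0] if facts else "I don't know."

memory = ["ada moved to london in 2020", "ada likes graphs"]
question = "When did ada move?"
intent = detect_intent(question)
facts = retrieve(memory, question)
answer = synthesize(question, facts)
assert intent == "temporal"
assert "2020" in answer
```

The value of the staging is that each function can be tested against its own question ("is this temporal?", "are these the right facts?") without exercising the whole pipeline.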
Agentic answering still builds on single-shot answering¶
answer_question_agentic() remains additive rather than divergent.
It still:
- runs the standard single-shot pipeline first
- evaluates answer completeness
- retrieves more facts only when specific gaps exist
- re-synthesizes from the original answer plus additional evidence
That means the refactor preserves the existing design rule: agentic mode should not score worse than the single-shot baseline.
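A minimal sketch of that additive loop, with hypothetical gap detection (the real completeness evaluation is LLM-driven):

```python
def single_shot(memory, question):
    # Toy baseline pipeline: return the first matching fact.
    hits = [fact for fact in memory if "ada" in fact]
    return hits[0] if hits else ""

def find_gaps(answer):
    # Toy completeness check: report a gap only when a year is missing.
    return ["year"] if "2020" not in answer else []

def agentic(memory, question):
    answer = single_shot(memory, question)   # always run baseline first
    gaps = find_gaps(answer)
    if gaps:  # retrieve more evidence only for identified gaps
        extra = [fact for fact in memory if "2020" in fact]
        answer = " ".join([answer] + extra)  # re-synthesize with evidence
    return answer

memory = ["ada likes graphs", "ada moved in 2020"]
assert "2020" in agentic(memory, "when did ada move?")
```

Because the loop only ever adds evidence to the baseline answer, the agentic result cannot lose information the single-shot path already produced, which is what preserves the "no worse than baseline" rule.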
Test layout after the split¶
The old `test_learning_agent.py` monolith is replaced by module-aligned test files:
| Test file | Primary scope |
|---|---|
| `tests/agents/goal_seeking/test_learning_agent_core.py` | facade construction, retry helpers, lifecycle |
| `tests/agents/goal_seeking/test_learning_agent_ingestion.py` | batch prep, storage, source labels, temporal metadata |
| `tests/agents/goal_seeking/test_learning_agent_retrieval.py` | retrieval strategies, aggregation, fallbacks |
| `tests/agents/goal_seeking/test_learning_agent_temporal.py` | temporal parsing, transition chains, generated temporal code |
The existing broader behavior tests remain in place:
- `test_math_intent.py`
- `test_agentic_answer_mode.py`
- `test_goal_seeking_agent.py`
Maintenance guardrails¶
The refactor keeps a few hard boundaries in place:
- each primary extracted module stays small enough to review comfortably
- `learning_agent.py` stays intentionally thin
- no circular imports
- no dead imports
- public method signatures stay stable
- new helper logic goes to the owning module, not back into the facade
When to touch each layer¶
Use this rule of thumb before editing:
- change classification logic in `intent_detector.py`
- change time-aware interpretation in `temporal_reasoning.py`
- change deterministic computation in `code_synthesis.py` or `knowledge_utils.py`
- change fact extraction or storage in `learning_ingestion.py`
- change which facts are retrieved in `retrieval_strategies.py`
- change how final answers are phrased or refined in `answer_synthesizer.py`
- change construction, lifecycle, or compatibility wiring in `learning_agent.py`
Related reading¶
- Start with the Goal-Seeking Agents overview for the broader system.
- Use the LearningAgent module reference for the exact public API and file ownership map.
- Use How to maintain and extend the refactored LearningAgent when making changes.
- Follow the LearningAgent refactor tutorial for an end-to-end walkthrough.