Goal-Seeking Agents Quick Reference

Fast lookup for common commands, patterns, and workflows.

Quick Start

# Create a new goal-seeking agent
amplihack new "Security auditor that scans code for vulnerabilities"
# Create with specific SDK
amplihack new --sdk copilot "API documentation generator"
amplihack new --sdk claude "Meeting notes synthesizer"
amplihack new --sdk microsoft "Code review assistant"
# Enable multi-agent architecture
amplihack new --multi-agent --sdk copilot "Complex research agent"
# Enable dynamic agent spawning
amplihack new --enable-spawning --sdk claude "Adaptive learning agent"

CLI Reference

Agent Generation

Command                                Description
amplihack new <goal>                   Generate agent with default SDK (copilot)
--sdk {copilot,claude,microsoft,mini}  Choose SDK backend
--multi-agent                          Enable coordinator + memory + spawner architecture
--enable-spawning                      Enable dynamic sub-agent spawning
--domain {security,meetings,data,...}  Generate domain-specific agent

Evaluation Commands

Command                                          Description
python -m amplihack.eval.progressive_test_suite  Run L1-L12 eval
--runs 3                                         3-run median eval (recommended)
--grader-votes 3                                 Multi-vote grading for stability
--sdk {mini,claude,copilot,microsoft}            Test specific SDK
--parallel N                                     Run N evals concurrently

Self-Improvement Loop

Command                                 Description
python -m amplihack.eval.sdk_eval_loop  Run improvement iterations
--sdk copilot --iterations 5            5 loops on Copilot SDK
python -m amplihack.eval.matrix_eval    5-way agent comparison

SDK Selection Guide

SDK        Best For                             Pros                             Cons
Copilot    GitHub workflows, code review        Native GitHub integration, fast  Requires GitHub account
Claude     Complex reasoning, research          Large context, strong reasoning  Higher cost
Microsoft  Enterprise workflows, Teams          Azure integration, governance    Requires Azure setup
Mini       Testing, prototypes, cost-sensitive  Lightweight, no dependencies     Limited capabilities

Evaluation Levels (L1-L12)

Level  Focus                     Pass Threshold
L1     Simple Recall             ≥85%
L2     Multi-Source Synthesis    ≥85%
L3     Temporal Reasoning        ≥70%
L4     Procedural Application    ≥80%
L5     Contradiction Resolution  ≥75%
L6     Incremental Updates       ≥85%
L7     Teaching Transfer         NLG ≥0.7
L8     Metacognition             ≥50%
L9     Causal Reasoning          ≥50%
L10    Counterfactual Reasoning  ≥40%
L11    Novel Skill Acquisition   ≥50%
L12    Far Transfer              ≥60%
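
To make the pass thresholds concrete, here is a minimal sketch that flags which levels fall short. It assumes per-level scores are available as a plain dict of fractions; that shape, and the names `PASS_THRESHOLDS` and `failing_levels`, are hypothetical and not part of the amplihack API. Only the threshold values come from the table.

```python
# Pass thresholds copied from the Evaluation Levels table, as fractions of 1.0.
# L7 is reported on an NLG scale (>= 0.7) rather than as a percentage.
PASS_THRESHOLDS = {
    "L1": 0.85, "L2": 0.85, "L3": 0.70, "L4": 0.80,
    "L5": 0.75, "L6": 0.85, "L7": 0.70, "L8": 0.50,
    "L9": 0.50, "L10": 0.40, "L11": 0.50, "L12": 0.60,
}


def failing_levels(scores: dict[str, float]) -> list[str]:
    """Return the levels whose score is below its threshold, in L1..L12 order.

    A level missing from `scores` is treated as 0.0 and therefore fails.
    """
    return [
        level
        for level, threshold in PASS_THRESHOLDS.items()
        if scores.get(level, 0.0) < threshold
    ]
```

A score exactly equal to its threshold passes, matching the "≥" in the table.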

Common Patterns

Generate and Evaluate

# 1. Generate agent
amplihack new --sdk copilot "Code documentation agent"
# 2. Navigate to generated directory
cd code_documentation_agent/
# 3. Run 3-run median eval with multi-vote grading
python -m amplihack.eval.progressive_test_suite \
    --runs 3 \
    --grader-votes 3 \
    --sdk copilot
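
The two stability flags can be pictured with a small sketch: taking the median across runs discards a single outlier run, and voting across graders discards a single outlier grade. How amplihack actually aggregates runs and votes is not stated here, so the strict-majority rule and the function names below are assumptions for illustration only.

```python
from statistics import median


def median_score(run_scores: list[float]) -> float:
    """Median of the per-run scores; with 3 runs this is the middle value."""
    return median(run_scores)


def majority_vote(votes: list[bool]) -> bool:
    """True when strictly more than half of the grader votes are passes.

    Assumption: a simple strict majority, which may differ from the real grader.
    """
    return sum(votes) * 2 > len(votes)
```

With an odd number of runs or votes there are no ties, which is one reason 3 is a convenient setting.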

Self-Improvement Loop

# Run 5 improvement iterations
python -m amplihack.eval.sdk_eval_loop \
    --sdk copilot \
    --iterations 5 \
    --output improvement_report.json

Multi-SDK Comparison

# Compare all 4 SDKs
python -m amplihack.eval.matrix_eval \
    --runs 3 \
    --output sdk_comparison.json

Long-Horizon Memory Stress Test

# 1000-turn dialogue evaluation
python -m amplihack.eval.long_horizon_memory \
    --turns 1000 \
    --questions 20 \
    --output memory_eval.json

Agent Architecture

Single-Agent (Default)

┌──────────────────────┐
│    Learning Agent    │
│                      │
│ - 7 Learning Tools   │
│ - Memory System      │
│ - Intent Classifier  │
│ - Agentic Loop       │
└──────────────────────┘

Multi-Agent (--multi-agent)

┌────────────────────────────────────────┐
│           Coordinator Agent            │
│     (Task routing and delegation)      │
└───────────┬────────────────────────────┘
            │
    ┌───────┴────────┬──────────────┐
    ▼                ▼              ▼
┌─────────┐    ┌──────────┐   ┌──────────┐
│ Memory  │    │ Reasoning│   │ Research │
│ Agent   │    │ Agent    │   │ Agent    │
└─────────┘    └──────────┘   └──────────┘

With Spawning (--enable-spawning)

┌────────────────────────────────────────┐
│           Coordinator Agent            │
└───────────┬────────────────────────────┘
            │
    ┌───────┴────────┬──────────────────┐
    ▼                ▼                  ▼
┌─────────┐    ┌──────────┐   ┌──────────────────┐
│ Memory  │    │ Reasoning│   │ Agent Spawner    │
│ Agent   │    │ Agent    │   │ (Dynamic)        │
└─────────┘    └──────────┘   └──────────────────┘
                                        │
                             ┌──────────┴──────────┐
                             ▼                     ▼
                        ┌──────────┐          ┌──────────┐
                        │Retrieval │          │Synthesis │
                        │Sub-Agent │          │Sub-Agent │
                        └──────────┘          └──────────┘

Learning Tools

Tool                 Purpose                  Example
learn_from_content   Extract and store facts  Learn from articles, docs
search_memory        Retrieve knowledge       Find relevant facts
synthesize_answer    Combine facts to answer  Answer complex questions
calculate            Safe arithmetic          Compute medal totals
explain_knowledge    Teach concepts           Explain to beginners
find_knowledge_gaps  Identify missing info    Know what you don't know
verify_fact          Cross-check claims       Validate contradictions
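
As an illustration of what "safe arithmetic" can mean for a tool like `calculate`, the sketch below evaluates an expression by walking its syntax tree and allowing only numeric literals and the four basic operators, so arbitrary code is never executed. This is not the tool's actual implementation; `safe_calculate` is a hypothetical name and the set of supported operators is an assumption.

```python
import ast
import operator

# Only these binary operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals without using eval()."""

    def _eval(node: ast.AST) -> float:
        # Plain int/float literals (bool is deliberately excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Unary minus, e.g. "-3".
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)
```

For example, `safe_calculate("12 + 8 + 5")` returns `25`, while a function call such as `__import__('os')` raises `ValueError`. Malformed input raises `SyntaxError` from `ast.parse`.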

Environment Variables

Variable               Purpose                  Default
ANTHROPIC_API_KEY      Claude API access        Required for Claude/Mini
OPENAI_API_KEY         OpenAI access            Required for Microsoft SDK
COPILOT_MODEL          Copilot model selection  gpt-4
CLAUDE_AGENT_MODEL     Claude SDK model         claude-opus-4
MICROSOFT_AGENT_MODEL  Microsoft SDK model      gpt-4
EVAL_MODEL             Evaluation LLM           claude-opus-4
GRADER_MODEL           Grading LLM              claude-opus-4
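
Before running an eval, it can help to confirm that the API key for the chosen SDK is set. The sketch below encodes only what the Environment Variables table states; the mapping name `REQUIRED_KEYS`, the helper `missing_keys`, and the empty list for Copilot (the table names no key for it) are assumptions, not amplihack behavior.

```python
import os

# Key requirements as listed in the Environment Variables table.
# Assumption: Copilot needs no key from this table, so its list is empty.
REQUIRED_KEYS = {
    "claude": ["ANTHROPIC_API_KEY"],
    "mini": ["ANTHROPIC_API_KEY"],
    "microsoft": ["OPENAI_API_KEY"],
    "copilot": [],
}


def missing_keys(sdk: str, env=None) -> list[str]:
    """Return required variables that are unset or empty for the given SDK."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS[sdk] if not env.get(name)]
```

Calling `missing_keys("claude")` with no second argument checks the current process environment; passing a dict makes the check easy to test.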

Troubleshooting

Problem                    Solution
Import errors for SDK      Install SDK: pip install github-copilot-sdk / claude-agents / agent-framework
Low eval scores            Run with --runs 3 --grader-votes 3 for stability
Memory retrieval failures  Increase simple_retrieval_threshold in config
Slow evaluation            Use --parallel 4 for concurrent runs
SDK agent not responding   Check API keys, verify SDK installation

File Structure

my_agent/
├── goal_prompt.md          # Agent goal and capabilities
├── prompts/                # Markdown prompt templates
│   ├── system.md
│   ├── learning_task.md
│   └── synthesis_template.md
├── sdk_tools.json          # SDK-specific tool configs
├── sub_agents/             # Multi-agent configs (if --multi-agent)
│   ├── coordinator.yaml
│   ├── memory_agent.yaml
│   └── spawner.yaml
├── tests/                  # Unit tests
└── README.md               # Usage guide
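
A quick way to sanity-check a generated agent directory against this layout is sketched below. The path list mirrors the tree and leaves out `sub_agents/`, which appears only with `--multi-agent`; the names `EXPECTED_PATHS` and `missing_paths` are hypothetical helpers, not part of amplihack.

```python
from pathlib import Path

# Paths taken from the File Structure tree (sub_agents/ omitted: multi-agent only).
EXPECTED_PATHS = [
    "goal_prompt.md",
    "prompts/system.md",
    "prompts/learning_task.md",
    "prompts/synthesis_template.md",
    "sdk_tools.json",
    "tests",
    "README.md",
]


def missing_paths(agent_dir: str) -> list[str]:
    """Return the expected relative paths that do not exist under agent_dir."""
    root = Path(agent_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

An empty result from `missing_paths("my_agent")` means every listed file and directory is present.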