# Writing Custom Agent Adapters

## The AgentAdapter Interface

To make any agent evaluable, implement the `AgentAdapter` abstract base class from `amplihack_eval.adapters.base`:
```python
from amplihack_eval import AgentAdapter, AgentResponse, ToolCall

class MyAgent(AgentAdapter):
    def learn(self, content: str) -> None:
        """Feed content to the agent for learning/memorization."""
        ...

    def answer(self, question: str) -> AgentResponse:
        """Ask the agent a question. Returns answer + trajectory."""
        ...

    def reset(self) -> None:
        """Reset agent state between eval runs."""
        ...

    def close(self) -> None:
        """Clean up resources (connections, files, etc.)."""
        ...
```
## Required Methods

| Method | Signature | Purpose |
|---|---|---|
| `learn` | `(content: str) -> None` | Feed content to the agent. Called once per dialogue turn during evaluation. |
| `answer` | `(question: str) -> AgentResponse` | Ask the agent a question. Must return an `AgentResponse`. |
| `reset` | `() -> None` | Reset all agent state. Called between evaluation runs. |
| `close` | `() -> None` | Clean up resources. Called when evaluation is complete. |
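For concreteness, here is a minimal sketch of an adapter that satisfies all four methods. The `EchoAgent` name and its naive word-overlap lookup are illustrative, not part of the library; a real agent would call a model instead.

```python
from amplihack_eval import AgentAdapter, AgentResponse

class EchoAgent(AgentAdapter):
    """Minimal adapter: stores learned content, answers by naive lookup."""

    def __init__(self) -> None:
        self.memory: list[str] = []

    def learn(self, content: str) -> None:
        self.memory.append(content)

    def answer(self, question: str) -> AgentResponse:
        # Toy retrieval: return the first stored snippet that shares a
        # word with the question. A real agent would query an LLM here.
        words = set(question.lower().split())
        for snippet in self.memory:
            if words & set(snippet.lower().split()):
                return AgentResponse(answer=snippet)
        return AgentResponse(answer="I don't know.")

    def reset(self) -> None:
        self.memory.clear()

    def close(self) -> None:
        pass  # No external resources to release
```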
## Optional Properties

| Property | Type | Default | Purpose |
|---|---|---|---|
| `capabilities` | `set[str]` | `{"memory"}` | Declare what the agent can do. Used by the runner to select appropriate eval levels. |
| `name` | `str` | Class name | Human-readable name for reports and logs. |
## AgentResponse

The `answer()` method must return an `AgentResponse`:

```python
from dataclasses import dataclass, field
from typing import Any

from amplihack_eval import ToolCall

@dataclass
class AgentResponse:
    answer: str                    # Required: the agent's answer text
    tool_calls: list[ToolCall] = field(default_factory=list)  # Optional: tool invocations
    reasoning_trace: str = ""      # Optional: chain-of-thought
    confidence: float = 0.0        # Optional: self-reported confidence
    metadata: dict[str, Any] = field(default_factory=dict)    # Optional: arbitrary metadata
```
## ToolCall

If your agent uses tools, capture them for trajectory analysis:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolCall:
    tool_name: str             # Name of the tool invoked
    arguments: dict[str, Any]  # Arguments passed to the tool
    result: str                # String result from the tool
    timestamp: float = 0.0     # Optional: when the call happened
```
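Putting the two together, an answer produced with a single tool call might be constructed like this (the tool name and payload below are illustrative, not a fixed schema):

```python
from amplihack_eval import AgentResponse, ToolCall

response = AgentResponse(
    answer="Paris",
    tool_calls=[
        ToolCall(
            tool_name="wiki_lookup",  # hypothetical tool name
            arguments={"query": "capital of France"},
            result="Paris is the capital of France.",
        )
    ],
    reasoning_trace="Looked up the answer in the wiki index.",
    confidence=0.9,
)
```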
## Built-in Adapters

### HttpAdapter

For agents exposed via a REST API:

```python
from amplihack_eval.adapters.http_adapter import HttpAdapter

adapter = HttpAdapter(
    base_url="http://localhost:8000",
    timeout=30,
)
```

Expected endpoints:

- `POST /learn` with `{"content": "..."}` -> `200 OK`
- `POST /answer` with `{"question": "..."}` -> `{"answer": "...", "tool_calls": [...], ...}`
- `POST /reset` -> `200 OK`
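The server side is up to you. As one possibility, a minimal app satisfying this contract might look like the sketch below. FastAPI is not a dependency of `amplihack_eval`, and `MyAgent` is the hypothetical adapter-style class from above; only the endpoint shapes come from the contract just listed.

```python
# Minimal sketch of an HTTP agent server, assuming FastAPI.
from dataclasses import asdict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
agent = MyAgent()  # hypothetical agent with learn/answer/reset methods

class LearnRequest(BaseModel):
    content: str

class AnswerRequest(BaseModel):
    question: str

@app.post("/learn")
def learn(req: LearnRequest):
    agent.learn(req.content)
    return {}  # 200 OK

@app.post("/answer")
def answer(req: AnswerRequest):
    resp = agent.answer(req.question)
    return {
        "answer": resp.answer,
        "tool_calls": [asdict(tc) for tc in resp.tool_calls],
    }

@app.post("/reset")
def reset():
    agent.reset()
    return {}  # 200 OK
```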
### SubprocessAdapter

For agents invokable via CLI:

```python
from amplihack_eval.adapters.subprocess_adapter import SubprocessAdapter

adapter = SubprocessAdapter(
    command=["python", "my_agent.py"],
    learn_flag="--learn",
    answer_flag="--answer",
)
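```

The exact invocation protocol (how the payload is passed and how the answer is read back) depends on the adapter's implementation; assuming it appends the payload after the flag and reads the answer from stdout, a matching `my_agent.py` could look like the following sketch. Check the adapter source for the real protocol.

```python
# Hypothetical my_agent.py; the flag-plus-argument convention and
# the stdout contract are assumptions, not documented behavior.
import argparse
import json
import pathlib

STATE = pathlib.Path("agent_memory.json")

def load() -> list[str]:
    return json.loads(STATE.read_text()) if STATE.exists() else []

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--learn", help="Content to memorize")
    parser.add_argument("--answer", help="Question to answer")
    args = parser.parse_args()

    memory = load()
    if args.learn is not None:
        memory.append(args.learn)
        STATE.write_text(json.dumps(memory))
    elif args.answer is not None:
        # Toy lookup; a real agent would query a model here.
        hits = [m for m in memory if args.answer.lower() in m.lower()]
        print(hits[0] if hits else "I don't know.")

if __name__ == "__main__":
    main()
```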
### LearningAgentAdapter

For the amplihack `LearningAgent` (requires the `amplihack` package):

```python
from amplihack_eval.adapters.learning_agent import LearningAgentAdapter

adapter = LearningAgentAdapter()
```
## Complete Custom Adapter Example

```python
from amplihack_eval import AgentAdapter, AgentResponse, ToolCall

# VectorDBClient and LLMClient are placeholders for your own
# vector-store and LLM client classes.

class RAGAgent(AgentAdapter):
    """Adapter for a retrieval-augmented generation agent."""

    def __init__(self, db_url: str, model: str = "gpt-4"):
        self.db_url = db_url
        self.model = model
        self.client = VectorDBClient(db_url)
        self.llm = LLMClient(model)

    def learn(self, content: str) -> None:
        # Chunk and embed content into the vector DB
        chunks = self._chunk(content)
        embeddings = self.llm.embed(chunks)
        self.client.upsert(chunks, embeddings)

    def answer(self, question: str) -> AgentResponse:
        # Retrieve relevant chunks
        query_embedding = self.llm.embed([question])[0]
        results = self.client.search(query_embedding, top_k=5)

        # Generate an answer with the retrieved context
        context = "\n".join(r.text for r in results)
        response = self.llm.generate(
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )

        return AgentResponse(
            answer=response.text,
            tool_calls=[
                ToolCall(
                    tool_name="vector_search",
                    arguments={"query": question, "top_k": 5},
                    result=f"Found {len(results)} chunks",
                )
            ],
            reasoning_trace=f"Retrieved {len(results)} chunks, generated answer",
            confidence=response.confidence,
        )

    def reset(self) -> None:
        self.client.clear()

    def close(self) -> None:
        self.client.close()
        self.llm.close()

    def _chunk(self, content: str, size: int = 500) -> list[str]:
        # Simple fixed-size chunking; replace with your own strategy.
        return [content[i : i + size] for i in range(0, len(content), size)]

    @property
    def capabilities(self) -> set[str]:
        return {"memory", "tool_use"}

    @property
    def name(self) -> str:
        return f"RAGAgent({self.model})"
```
## Running Evaluation with a Custom Adapter

```python
from amplihack_eval import EvalRunner

agent = RAGAgent(db_url="http://localhost:6333", model="gpt-4")
runner = EvalRunner(num_turns=100, num_questions=20, grader_votes=3)

report = runner.run(agent)
print(f"Overall: {report.overall_score:.2%}")
for cb in report.category_breakdown:
    print(f"  {cb.category}: {cb.avg_score:.2%}")

agent.close()
```
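Since `close()` should run even when a run raises, it is worth wrapping the call in `try`/`finally`. This is a small defensive pattern on the caller's side, not something the runner requires:

```python
agent = RAGAgent(db_url="http://localhost:6333")
try:
    report = EvalRunner(num_turns=100, num_questions=20).run(agent)
finally:
    agent.close()  # release DB/LLM connections even on failure
```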
## Tips

- Keep `learn()` fast: The runner calls it once per dialogue turn (potentially 1000+ times). Batch operations if possible; see the sketch after this list.
- Capture tool calls: Even if your agent does not use explicit tools, logging internal retrieval as a `ToolCall` enables richer analysis.
- Set confidence: If your agent can estimate confidence, include it. The grader uses confidence calibration in advanced eval levels (L8).
- Reset completely: `reset()` must clear ALL state. Leftover state between runs corrupts multi-seed evaluation.
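One way to keep `learn()` cheap is to buffer incoming content and flush it in one batch just before answering. A sketch of that pattern, reusing the hypothetical clients from the `RAGAgent` example above:

```python
class BufferedRAGAgent(RAGAgent):
    """Defers chunk/embed/upsert work until an answer is needed."""

    def __init__(self, db_url: str, model: str = "gpt-4"):
        super().__init__(db_url, model)
        self._pending: list[str] = []

    def learn(self, content: str) -> None:
        self._pending.append(content)  # O(1) per turn

    def _flush(self) -> None:
        if self._pending:
            chunks = [c for item in self._pending for c in self._chunk(item)]
            self.client.upsert(chunks, self.llm.embed(chunks))
            self._pending.clear()

    def answer(self, question: str) -> AgentResponse:
        self._flush()  # one batched write instead of 1000+ small ones
        return super().answer(question)

    def reset(self) -> None:
        self._pending.clear()
        super().reset()
```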