
Multi-Agent Evaluation

Status

The multi-agent evaluation module (amplihack_eval.multi_agent_eval) is reserved for future development. It currently exists only as a placeholder; the capabilities described on this page are planned, not implemented.

Planned Architecture

Multi-agent evaluation will test scenarios where multiple agents collaborate or compete to accomplish tasks. Unlike single-agent evaluation (which tests memory and reasoning), multi-agent evaluation tests coordination, communication, and role specialization.

Planned Scenarios

Collaborative Knowledge Building

Multiple agents each learn different subsets of information, then must combine their knowledge to answer questions that no single agent could answer alone.

Agent A learns: Articles 1-5
Agent B learns: Articles 6-10
Agent C learns: Articles 11-15

Question: "Compare findings from Article 3 and Article 12"
-> Requires A and C to collaborate
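
One way to check that a question really needs collaboration is to compute which agents hold the cited articles. The sketch below is only an illustration of that idea: the function name agents_required and the assignments mapping are hypothetical and not part of amplihack_eval.

```python
# Hypothetical sketch: none of these names exist in amplihack_eval.


def agents_required(assignments: dict[str, range], cited_articles: list[int]) -> set[str]:
    """Return the agents whose learned articles cover the cited ones."""
    required = set()
    for article in cited_articles:
        owners = [agent for agent, articles in assignments.items() if article in articles]
        if not owners:
            raise ValueError(f"No agent learned article {article}")
        # Assumes each article is assigned to exactly one agent, as in the example above.
        required.add(owners[0])
    return required


assignments = {
    "A": range(1, 6),    # Articles 1-5
    "B": range(6, 11),   # Articles 6-10
    "C": range(11, 16),  # Articles 11-15
}

# The question compares Article 3 and Article 12.
required = agents_required(assignments, [3, 12])
is_collaborative = len(required) > 1
```

A scenario generator could use a check like this to keep only questions for which is_collaborative is true.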

Debate and Consensus

Agents are given ambiguous or contradictory information and must debate to reach a consensus answer. Tests argumentation quality, evidence weighing, and convergence.
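
Convergence is the easiest of these three properties to quantify. The following sketch assumes each debate round can be recorded as a mapping from agent identifier to that agent's current answer; the helper names and the sample answers are made up for illustration. It does not attempt to score argumentation quality or evidence weighing.

```python
from collections import Counter
from typing import Optional


def consensus_ratio(answers: dict[str, str]) -> float:
    """Fraction of agents that hold the most common answer."""
    if not answers:
        raise ValueError("answers must not be empty")
    top_count = Counter(answers.values()).most_common(1)[0][1]
    return top_count / len(answers)


def rounds_to_consensus(rounds: list[dict[str, str]], threshold: float = 1.0) -> Optional[int]:
    """Return the 1-based index of the first round that reaches the threshold, or None."""
    for index, answers in enumerate(rounds, start=1):
        if consensus_ratio(answers) >= threshold:
            return index
    return None


# Illustrative data only: agent A changes its answer in the second round.
debate = [
    {"A": "yes", "B": "no", "C": "no"},
    {"A": "no", "B": "no", "C": "no"},
]
converged_at = rounds_to_consensus(debate)
```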

Task Delegation

A coordinator agent receives a complex task and must delegate subtasks to specialist agents, then synthesize their results. Tests planning, delegation, and integration.
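
The delegate-then-synthesize flow might look roughly like the sketch below. It assumes specialists can be modeled as plain callables keyed by role; Specialist, delegate, and synthesize are hypothetical names, and the lambdas stand in for real agents.

```python
from typing import Callable

# A specialist is modeled as a function from a subtask description to a result.
Specialist = Callable[[str], str]


def delegate(subtasks: dict[str, str], specialists: dict[str, Specialist]) -> dict[str, str]:
    """Send each subtask to the specialist with the matching role and collect results."""
    missing = set(subtasks) - set(specialists)
    if missing:
        raise KeyError(f"No specialist for roles: {sorted(missing)}")
    return {role: specialists[role](task) for role, task in subtasks.items()}


def synthesize(results: dict[str, str]) -> str:
    """Combine specialist results into one report, ordered by role name."""
    return "\n".join(f"{role}: {result}" for role, result in sorted(results.items()))


# Stand-in specialists; a real evaluation would call actual agents here.
specialists = {
    "researcher": lambda task: f"notes on {task}",
    "writer": lambda task: f"draft of {task}",
}
subtasks = {"researcher": "topic X", "writer": "summary"}
report = synthesize(delegate(subtasks, specialists))
```

An evaluation could then grade the subtask split (planning), the role matching (delegation), and the final report (integration) separately.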

Adversarial Robustness

One agent attempts to inject misleading information while others must maintain accuracy. Tests resilience to adversarial inputs.
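
One possible resilience measure is the drop in accuracy among the honest agents once an adversary joins the group. The sketch below assumes final answers are collected per agent; the accuracy helper and the sample answers are invented for illustration, not results from any evaluation.

```python
def accuracy(answers: dict[str, str], expected: str, exclude: frozenset = frozenset()) -> float:
    """Share of non-excluded agents whose answer matches the expected one."""
    scored = [answer for agent, answer in answers.items() if agent not in exclude]
    if not scored:
        raise ValueError("no agents left to score")
    return sum(answer == expected for answer in scored) / len(scored)


# Illustrative data only: agent B is misled once the adversary is present.
baseline = {"A": "Paris", "B": "Paris", "C": "Paris"}
attacked = {"A": "Paris", "B": "Lyon", "C": "Paris", "adversary": "Lyon"}

baseline_acc = accuracy(baseline, "Paris")
# The adversary's own answer is excluded; only the honest agents are scored.
attacked_acc = accuracy(attacked, "Paris", exclude=frozenset({"adversary"}))
accuracy_drop = baseline_acc - attacked_acc
```

A smaller accuracy_drop would indicate a more resilient group under this particular measure.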

Planned Interface

The multi-agent adapter interface will extend AgentAdapter:

from abc import abstractmethod

# AgentAdapter is the existing adapter base class; import it from wherever
# amplihack_eval exposes it.


class MultiAgentAdapter(AgentAdapter):
    """Adapter for a group of agents that can communicate."""

    @abstractmethod
    def send_message(self, from_agent: str, to_agent: str, message: str) -> str:
        """Send a message between agents."""

    @abstractmethod
    def get_agents(self) -> list[str]:
        """List all agent identifiers in the group."""

    @abstractmethod
    def assign_role(self, agent_id: str, role: str) -> None:
        """Assign a role to a specific agent."""

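To make the planned interface more concrete, here is a minimal in-memory sketch of a subclass. Since the real AgentAdapter is not shown on this page, the example defines an empty stand-in for it. InMemoryGroup, its message log, and the acknowledgement reply format are all hypothetical, and the meaning of the string returned by send_message is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AgentAdapter(ABC):
    """Stand-in for the real base class, whose actual methods are not shown here."""


class MultiAgentAdapter(AgentAdapter):
    """Planned interface, reproduced from the listing above."""

    @abstractmethod
    def send_message(self, from_agent: str, to_agent: str, message: str) -> str:
        """Send a message between agents."""

    @abstractmethod
    def get_agents(self) -> list[str]:
        """List all agent identifiers in the group."""

    @abstractmethod
    def assign_role(self, agent_id: str, role: str) -> None:
        """Assign a role to a specific agent."""


class InMemoryGroup(MultiAgentAdapter):
    """Hypothetical toy adapter: agents only acknowledge the messages they receive."""

    def __init__(self, agent_ids: list[str]) -> None:
        self._roles: dict[str, Optional[str]] = {agent_id: None for agent_id in agent_ids}
        self.log: list[tuple[str, str, str]] = []

    def _check(self, agent_id: str) -> None:
        if agent_id not in self._roles:
            raise KeyError(f"Unknown agent: {agent_id}")

    def send_message(self, from_agent: str, to_agent: str, message: str) -> str:
        self._check(from_agent)
        self._check(to_agent)
        self.log.append((from_agent, to_agent, message))
        role = self._roles[to_agent] or "unassigned"
        # Assumption: the return value is the recipient's reply.
        return f"[{to_agent}/{role}] ack from {from_agent}: {message}"

    def get_agents(self) -> list[str]:
        return list(self._roles)

    def assign_role(self, agent_id: str, role: str) -> None:
        self._check(agent_id)
        self._roles[agent_id] = role


group = InMemoryGroup(["coordinator", "researcher"])
group.assign_role("researcher", "specialist")
reply = group.send_message("coordinator", "researcher", "Summarize Article 3")
```

Recording every message in a log, as this sketch does, is one way an evaluator could later inspect how agents communicated.
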
Contributing

If you are interested in contributing to the multi-agent evaluation module, please open an issue on GitHub to discuss your proposed scenario before implementing.