FewShotManager Module Documentation¶

Module: wikigr.agent.enhancements.few_shot_manager

Module Overview¶

FewShotManager loads and injects pack-specific few-shot examples into the synthesis prompt, guiding Claude to follow consistent answer patterns and citation styles.

Accuracy Impact: +5-10% over zero-shot synthesis Citation Quality Impact: +70% (from 20% to 90%) Latency: +20ms per query (semantic search over examples)

Module-Level Docstring¶

"""
Few-shot example management for knowledge pack queries.

This module provides few-shot learning capabilities by injecting pack-specific
examples into the synthesis prompt. Examples demonstrate desired answer format,
citation style, and reasoning patterns, significantly improving answer quality
and consistency.

Algorithm:
    1. Load examples from pack's few_shot_examples.json file
    2. Embed all example questions using sentence-transformers
    3. Given a query, find K most similar examples via cosine similarity
    4. Format examples for prompt injection
    5. Claude synthesizes answer following example patterns

Benefits:
    - Consistent answer format across pack queries
    - Improved citation quality (examples show proper source attribution)
    - Better reasoning structure (examples demonstrate step-by-step logic)
    - Domain-specific answer patterns (physics vs programming style)

Performance:
    - Example loading: O(E) where E = number of examples (one-time cost)
    - Example retrieval: O(E) semantic search (typically E < 20)
    - Typical overhead: +20ms per query

Example:
    >>> from wikigr.agent.enhancements.few_shot_manager import FewShotManager
    >>>
    >>> # Load pack examples
    >>> manager = FewShotManager(
    ...     pack_dir="data/packs/physics-expert",
    ...     num_examples=3
    ... )
    >>>
    >>> # Get relevant examples for a question
    >>> examples = manager.get_examples(
    ...     question="What is quantum entanglement?",
    ...     num_examples=2
    ... )
    >>>
    >>> # Format for prompt
    >>> formatted = manager.format_for_prompt(examples)
    >>> print(formatted[:200])
    === Example 1 ===
    Question: What is the speed of light?
    Context: {...}
    Answer: The speed of light in vacuum is...

Example File Structure:
    data/packs/physics-expert/few_shot_examples.json:
    {
      "examples": [
        {
          "question": "What is quantum entanglement?",
          "context": {
            "articles": ["Quantum_entanglement", "EPR_paradox"],
            "facts": ["Quantum entanglement is...", "EPR paradox..."]
          },
          "answer": "Quantum entanglement is... [Source: Quantum_entanglement]",
          "reasoning": "Answer synthesizes information from both articles..."
        }
      ]
    }

Dependencies:
    - sentence-transformers: For example embedding and similarity
    - json: Example file parsing

See Also:
    - GraphReranker: Graph-based result reranking
    - MultiDocSynthesizer: Multi-document retrieval
"""

Class: FewShotManager¶

class FewShotManager:
    """
    Manages few-shot examples for knowledge pack queries.

    This class loads pack-specific examples from a JSON file, embeds them
    for semantic retrieval, and provides methods to find the most relevant
    examples for a given query.

    Attributes:
        pack_dir (str): Path to knowledge pack directory
        num_examples (int): Default number of examples to retrieve
        cache (bool): Whether to cache loaded examples
        _examples (list[dict] | None): Cached examples
        _embeddings (np.ndarray | None): Cached example embeddings
        _embedding_model: Sentence-transformers model for example retrieval

    Example:
        >>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
        >>> examples = manager.get_examples("What is quantum mechanics?")
        >>> print(len(examples))
        3
    """

Constructor¶

def __init__(
    self,
    pack_dir: str,
    num_examples: int = 3,
    cache: bool = True
) -> None:
    """
    Initialize FewShotManager with pack directory.

    Args:
        pack_dir: Path to knowledge pack directory (must contain few_shot_examples.json)
        num_examples: Default number of examples to retrieve (range: 1-10, default: 3)
        cache: Cache loaded examples in memory (default: True)

    Raises:
        FileNotFoundError: If pack_dir or few_shot_examples.json not found
        json.JSONDecodeError: If few_shot_examples.json is invalid JSON
        ValueError: If num_examples out of range [1, 10]

    Example:
        >>> # Standard initialization
        >>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
        >>>
        >>> # Custom number of examples
        >>> manager = FewShotManager(
        ...     pack_dir="data/packs/physics-expert",
        ...     num_examples=5
        ... )
        >>>
        >>> # Disable caching (for dynamic example updates)
        >>> manager = FewShotManager(
        ...     pack_dir="data/packs/physics-expert",
        ...     cache=False
        ... )
    """

get_examples() Method¶

def get_examples(
    self,
    question: str,
    num_examples: int | None = None
) -> list[dict]:
    """
    Get the most relevant few-shot examples for a question.

    Uses semantic similarity (cosine) to find examples with questions
    most similar to the input question.

    Args:
        question: Question to find relevant examples for
        num_examples: Number of examples to return (defaults to constructor value)

    Returns:
        List of example dictionaries, ranked by relevance (most relevant first)

    Raises:
        ValueError: If question is empty
        RuntimeError: If examples cannot be loaded or embedded

    Example:
        >>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
        >>> examples = manager.get_examples(
        ...     question="What is quantum entanglement?",
        ...     num_examples=2
        ... )
        >>>
        >>> for i, ex in enumerate(examples, 1):
        ...     print(f"Example {i}: {ex['question']}")
        Example 1: What is quantum entanglement?
        Example 2: What is the EPR paradox?

    Response Format:
        [
            {
                "question": str,
                "context": {
                    "articles": list[str],
                    "facts": list[str]
                },
                "answer": str,
                "reasoning": str  # Optional
            }
        ]
    """

load_examples() Method¶

def load_examples(self) -> list[dict]:
    """
    Load all examples from few_shot_examples.json.

    Returns:
        List of all examples from the pack's example file

    Raises:
        FileNotFoundError: If few_shot_examples.json not found
        json.JSONDecodeError: If file contains invalid JSON
        KeyError: If 'examples' key missing from JSON

    Example:
        >>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
        >>> all_examples = manager.load_examples()
        >>> print(f"Loaded {len(all_examples)} examples")
        Loaded 10 examples
        >>>
        >>> # Inspect example structure
        >>> example = all_examples[0]
        >>> print(example.keys())
        dict_keys(['question', 'context', 'answer', 'reasoning'])

    File Format:
        {
          "examples": [
            {
              "question": "What is X?",
              "context": {
                "articles": ["Article_1", "Article_2"],
                "facts": ["Fact 1", "Fact 2"]
              },
              "answer": "X is... [Source: Article_1]",
              "reasoning": "Answer provides..."
            }
          ]
        }
    """

format_for_prompt() Method¶

def format_for_prompt(
    self,
    examples: list[dict]
) -> str:
    """
    Format examples for injection into Claude prompt.

    Formats examples as numbered, structured text blocks that Claude can
    use as reference for answer format and citation style.

    Args:
        examples: List of example dictionaries to format

    Returns:
        Formatted string suitable for prompt injection

    Example:
        >>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
        >>> examples = manager.get_examples("What is X?", num_examples=2)
        >>> formatted = manager.format_for_prompt(examples)
        >>>
        >>> print(formatted)
        === Example 1 ===
        Question: What is quantum entanglement?
        Context:
          Articles: Quantum_entanglement, EPR_paradox
          Facts:
            - Quantum entanglement is a phenomenon where...
            - EPR paradox demonstrates quantum nonlocality...
        Answer: Quantum entanglement is a phenomenon where two or more
                particles become correlated... [Source: Quantum_entanglement,
                EPR_paradox]

        === Example 2 ===
        Question: What is the speed of light?
        Context:
          Articles: Speed_of_light
          Facts:
            - The speed of light in vacuum is 299,792,458 m/s...
        Answer: The speed of light in vacuum is exactly 299,792,458 m/s...
                [Source: Speed_of_light]

    Usage in Prompt:
        >>> prompt = f'''
        ... Here are examples of high-quality answers:
        ...
        ... {formatted}
        ...
        ... Now answer this question following the same pattern:
        ... Question: {question}
        ... Context: {context}
        ... Answer:
        ... '''
    """

_embed_examples() Method¶

def _embed_examples(
    self,
    examples: list[dict]
) -> np.ndarray:
    """
    Embed all example questions for semantic retrieval.

    Args:
        examples: List of examples with 'question' field

    Returns:
        Numpy array of embeddings (shape: [num_examples, embedding_dim])

    Example:
        >>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
        >>> examples = manager.load_examples()
        >>> embeddings = manager._embed_examples(examples)
        >>> print(embeddings.shape)
        (10, 384)  # 10 examples, 384-dim embeddings
    """

_compute_similarity() Method¶

def _compute_similarity(
    self,
    query_embedding: np.ndarray,
    example_embeddings: np.ndarray
) -> np.ndarray:
    """
    Compute cosine similarity between query and examples.

    Args:
        query_embedding: Query embedding (shape: [embedding_dim])
        example_embeddings: Example embeddings (shape: [num_examples, embedding_dim])

    Returns:
        Similarity scores (shape: [num_examples])

    Example:
        >>> query_emb = np.array([0.1, 0.2, 0.3])
        >>> example_embs = np.array([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]])
        >>> similarities = manager._compute_similarity(query_emb, example_embs)
        >>> print(similarities)
        [1.0, 0.714]  # First example is identical, second is similar
    """

Usage Examples¶

Basic Example Retrieval¶

from wikigr.agent.enhancements.few_shot_manager import FewShotManager

# Initialize manager
manager = FewShotManager(
    pack_dir="data/packs/physics-expert",
    num_examples=3
)

# Get relevant examples
examples = manager.get_examples(
    question="What is quantum entanglement?",
    num_examples=2
)

# Inspect examples
for i, example in enumerate(examples, 1):
    print(f"\n=== Example {i} ===")
    print(f"Question: {example['question']}")
    print(f"Answer: {example['answer'][:100]}...")
    print(f"Sources: {', '.join(example['context']['articles'])}")

Output:

=== Example 1 ===
Question: What is quantum entanglement?
Answer: Quantum entanglement is a phenomenon where two or more particles become correlated in such a...
Sources: Quantum_entanglement, EPR_paradox

=== Example 2 ===
Question: What is the EPR paradox?
Answer: The EPR paradox is a thought experiment proposed by Einstein, Podolsky, and Rosen that demon...
Sources: EPR_paradox, Quantum_mechanics

Integration with KG Agent¶

from wikigr.agent.kg_agent import KnowledgeGraphAgent
from wikigr.agent.enhancements.few_shot_manager import FewShotManager

class EnhancedKGAgent(KnowledgeGraphAgent):
    """KG Agent with few-shot example injection."""

    def __init__(self, db_path: str, pack_dir: str, **kwargs):
        super().__init__(db_path, **kwargs)
        self.few_shot = FewShotManager(pack_dir, num_examples=3)

    def _synthesize_answer(self, question: str, context: dict, query_plan: dict) -> str:
        """Override synthesis to include few-shot examples."""
        # Get relevant examples
        examples = self.few_shot.get_examples(question, num_examples=3)
        examples_text = self.few_shot.format_for_prompt(examples)

        # Build prompt with examples
        prompt = f"""
Here are examples of high-quality answers for this knowledge pack:

{examples_text}

Now answer this question following the same pattern:
Question: {question}
Context: {json.dumps(context, indent=2)}
Answer:
"""

        # Call Claude
        response = self.claude.messages.create(
            model="claude-3-5-sonnet-20241022",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024
        )

        return response.content[0].text

# Use enhanced agent
agent = EnhancedKGAgent(
    db_path="data/packs/physics-expert/physics.db",
    pack_dir="data/packs/physics-expert"
)

result = agent.query("What is quantum entanglement?")
print(result["answer"])

Creating Example Files¶

import json
from pathlib import Path

# Define pack examples
examples = {
    "examples": [
        {
            "question": "What is quantum entanglement?",
            "context": {
                "articles": ["Quantum_entanglement", "EPR_paradox"],
                "facts": [
                    "Quantum entanglement is a phenomenon where particles become correlated.",
                    "EPR paradox demonstrates quantum nonlocality.",
                    "Entanglement is a key resource for quantum computing."
                ]
            },
            "answer": (
                "Quantum entanglement is a phenomenon where two or more particles "
                "become correlated in such a way that the quantum state of each "
                "particle cannot be described independently. This correlation "
                "persists regardless of the distance between particles, demonstrating "
                "quantum nonlocality as shown by the EPR paradox. Entanglement is a "
                "fundamental resource for quantum computing and quantum communication. "
                "[Source: Quantum_entanglement, EPR_paradox]"
            ),
            "reasoning": (
                "Answer synthesizes information from both articles, provides clear "
                "definition, explains significance, and properly cites sources."
            )
        },
        {
            "question": "What is the speed of light?",
            "context": {
                "articles": ["Speed_of_light"],
                "facts": [
                    "The speed of light in vacuum is 299,792,458 m/s.",
                    "The speed of light is denoted by the symbol 'c'.",
                    "It is a fundamental constant in physics."
                ]
            },
            "answer": (
                "The speed of light in vacuum is exactly 299,792,458 m/s "
                "(approximately 3 × 10^8 m/s), as defined by the International "
                "System of Units. This fundamental constant, denoted by 'c', is "
                "the maximum speed at which all energy, matter, and information "
                "can travel. [Source: Speed_of_light]"
            ),
            "reasoning": (
                "Answer provides exact value with approximation, explains physical "
                "significance, and includes proper source citation."
            )
        }
    ]
}

# Save to pack directory
pack_dir = Path("data/packs/physics-expert")
examples_file = pack_dir / "few_shot_examples.json"

with open(examples_file, "w") as f:
    json.dump(examples, f, indent=2)

print(f"Created {examples_file} with {len(examples['examples'])} examples")

Dynamic Example Management¶

class DynamicFewShotManager(FewShotManager):
    """FewShotManager with dynamic example updates."""

    def add_example(self, example: dict) -> None:
        """Add a new example to the pack."""
        # Load existing examples
        examples = self.load_examples()
        examples.append(example)

        # Save updated examples
        examples_file = Path(self.pack_dir) / "few_shot_examples.json"
        with open(examples_file, "w") as f:
            json.dump({"examples": examples}, f, indent=2)

        # Clear cache to reload
        if self.cache:
            self._examples = None
            self._embeddings = None

    def remove_example(self, question: str) -> None:
        """Remove an example by question text."""
        examples = self.load_examples()
        examples = [ex for ex in examples if ex["question"] != question]

        # Save updated examples
        examples_file = Path(self.pack_dir) / "few_shot_examples.json"
        with open(examples_file, "w") as f:
            json.dump({"examples": examples}, f, indent=2)

        # Clear cache
        if self.cache:
            self._examples = None
            self._embeddings = None

# Use dynamic manager
manager = DynamicFewShotManager(pack_dir="data/packs/physics-expert")

# Add a new example
new_example = {
    "question": "What is the Heisenberg uncertainty principle?",
    "context": {
        "articles": ["Uncertainty_principle"],
        "facts": ["The uncertainty principle states that..."]
    },
    "answer": "The Heisenberg uncertainty principle... [Source: Uncertainty_principle]",
    "reasoning": "Clear explanation with proper citation."
}
manager.add_example(new_example)

# Remove an outdated example
manager.remove_example("Old question to remove")

Example Quality Analysis¶

from wikigr.agent.enhancements.few_shot_manager import FewShotManager

def analyze_example_quality(manager: FewShotManager) -> dict:
    """Analyze quality metrics of pack examples."""
    examples = manager.load_examples()

    metrics = {
        "total_examples": len(examples),
        "avg_question_length": sum(len(ex["question"]) for ex in examples) / len(examples),
        "avg_answer_length": sum(len(ex["answer"]) for ex in examples) / len(examples),
        "examples_with_reasoning": sum(1 for ex in examples if "reasoning" in ex),
        "avg_sources_per_example": sum(len(ex["context"]["articles"]) for ex in examples) / len(examples),
        "avg_facts_per_example": sum(len(ex["context"]["facts"]) for ex in examples) / len(examples)
    }

    return metrics

# Analyze physics pack examples
manager = FewShotManager(pack_dir="data/packs/physics-expert")
metrics = analyze_example_quality(manager)

print("Example Quality Metrics:")
for key, value in metrics.items():
    print(f"  {key}: {value:.2f}")

Output:

Example Quality Metrics:
  total_examples: 10.00
  avg_question_length: 42.30
  avg_answer_length: 387.50
  examples_with_reasoning: 10.00
  avg_sources_per_example: 2.10
  avg_facts_per_example: 4.80

Performance Tuning¶

Recommended Settings by Pack Size¶

Pack Size	num_examples	Notes
<100 articles	2-3	Small packs benefit from focused examples
100-500 articles	3-5	Standard setting (default: 3)
500+ articles	5-7	Large packs need more coverage

Example Quality Guidelines¶

High-Quality Examples Have: - Clear, specific questions (not too broad or vague) - Comprehensive context (2-5 articles, 3-8 facts) - Well-structured answers (intro, body, conclusion) - Proper source citation (all facts attributed) - Reasoning explanation (why this answer is good)

Example Quality Checklist:

def validate_example(example: dict) -> list[str]:
    """Validate example quality and return issues."""
    issues = []

    # Check required fields
    if "question" not in example:
        issues.append("Missing 'question' field")
    if "context" not in example:
        issues.append("Missing 'context' field")
    if "answer" not in example:
        issues.append("Missing 'answer' field")

    # Check question length
    if len(example.get("question", "")) < 10:
        issues.append("Question too short (<10 chars)")

    # Check context
    context = example.get("context", {})
    if len(context.get("articles", [])) == 0:
        issues.append("No articles in context")
    if len(context.get("facts", [])) < 2:
        issues.append("Too few facts (<2)")

    # Check answer quality
    answer = example.get("answer", "")
    if len(answer) < 100:
        issues.append("Answer too short (<100 chars)")
    if "[Source:" not in answer:
        issues.append("Missing source citation")

    return issues

# Validate all examples
manager = FewShotManager(pack_dir="data/packs/physics-expert")
examples = manager.load_examples()

for i, example in enumerate(examples, 1):
    issues = validate_example(example)
    if issues:
        print(f"Example {i} issues:")
        for issue in issues:
            print(f"  - {issue}")

Caching Strategies¶

# In-memory caching (default)
manager = FewShotManager(pack_dir="data/packs/physics-expert", cache=True)

# Persistent caching (across sessions)
import pickle
from pathlib import Path

class PersistentFewShotManager(FewShotManager):
    """FewShotManager with disk-based caching."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cache_file = Path(self.pack_dir) / ".few_shot_cache.pkl"

    def load_examples(self):
        # Try to load from cache
        if self.cache_file.exists():
            with open(self.cache_file, "rb") as f:
                return pickle.load(f)

        # Load from JSON and cache
        examples = super().load_examples()
        with open(self.cache_file, "wb") as f:
            pickle.dump(examples, f)

        return examples

manager = PersistentFewShotManager(pack_dir="data/packs/physics-expert")

Testing¶

import pytest
from wikigr.agent.enhancements.few_shot_manager import FewShotManager

def test_load_examples():
    """Test that examples are loaded correctly."""
    manager = FewShotManager(pack_dir="data/packs/test-pack")
    examples = manager.load_examples()

    assert len(examples) > 0
    assert all("question" in ex for ex in examples)
    assert all("answer" in ex for ex in examples)

def test_get_examples_relevance():
    """Test that retrieved examples are relevant."""
    manager = FewShotManager(pack_dir="data/packs/test-pack")

    examples = manager.get_examples(
        question="What is quantum entanglement?",
        num_examples=2
    )

    assert len(examples) <= 2
    # First example should be most relevant
    assert "quantum" in examples[0]["question"].lower()

def test_format_for_prompt():
    """Test prompt formatting."""
    manager = FewShotManager(pack_dir="data/packs/test-pack")
    examples = manager.get_examples("What is X?", num_examples=2)

    formatted = manager.format_for_prompt(examples)

    assert "=== Example 1 ===" in formatted
    assert "Question:" in formatted
    assert "Answer:" in formatted
    assert "Context:" in formatted

def test_missing_examples_file():
    """Test error handling for missing examples file."""
    with pytest.raises(FileNotFoundError):
        manager = FewShotManager(pack_dir="nonexistent/pack")
        manager.load_examples()

def test_invalid_num_examples():
    """Test validation of num_examples parameter."""
    with pytest.raises(ValueError):
        manager = FewShotManager(pack_dir="data/packs/test-pack", num_examples=0)

    with pytest.raises(ValueError):
        manager = FewShotManager(pack_dir="data/packs/test-pack", num_examples=11)

Troubleshooting¶

FileNotFoundError: few_shot_examples.json¶

Problem: FileNotFoundError: few_shot_examples.json not found in pack directory

Cause: Pack directory missing examples file.

Solution: Create examples file:

cd data/packs/your-pack
echo '{"examples": []}' > few_shot_examples.json

Empty Examples List¶

Problem: get_examples() returns empty list.

Cause: No examples in file or all filtered out.

Solution: Add examples to few_shot_examples.json:

import json
from pathlib import Path

examples = {"examples": [
    {
        "question": "Example question?",
        "context": {"articles": ["Article"], "facts": ["Fact"]},
        "answer": "Example answer. [Source: Article]"
    }
]}

path = Path("data/packs/your-pack/few_shot_examples.json")
with open(path, "w") as f:
    json.dump(examples, f, indent=2)

Low Citation Quality Despite Examples¶

Problem: Citation quality still low even with few-shot examples.

Cause: Examples don't demonstrate good citation patterns.

Solution: Improve example quality:

# BAD: No citations
"answer": "Quantum mechanics describes atoms and particles."

# GOOD: Proper citations
"answer": "Quantum mechanics is a theory that describes the behavior of matter and energy at atomic scales. [Source: Quantum_mechanics, Atomic_theory]"

Irrelevant Examples Retrieved¶

Problem: Retrieved examples not relevant to question.

Cause: Example questions not diverse enough or poor embedding quality.

Solution: Add more diverse examples covering different query types:

# Ensure examples cover:
# - Factual questions ("What is X?")
# - Explanatory questions ("How does X work?")
# - Comparative questions ("What's the difference between X and Y?")
# - Application questions ("What are the uses of X?")