FewShotManager Module Documentation¶
Module: wikigr.agent.enhancements.few_shot_manager
Module Overview¶
FewShotManager loads and injects pack-specific few-shot examples into the synthesis prompt, guiding Claude to follow consistent answer patterns and citation styles.
Accuracy Impact: +5-10% over zero-shot synthesis Citation Quality Impact: +70% (from 20% to 90%) Latency: +20ms per query (semantic search over examples)
Module-Level Docstring¶
"""
Few-shot example management for knowledge pack queries.
This module provides few-shot learning capabilities by injecting pack-specific
examples into the synthesis prompt. Examples demonstrate desired answer format,
citation style, and reasoning patterns, significantly improving answer quality
and consistency.
Algorithm:
1. Load examples from pack's few_shot_examples.json file
2. Embed all example questions using sentence-transformers
3. Given a query, find K most similar examples via cosine similarity
4. Format examples for prompt injection
5. Claude synthesizes answer following example patterns
Benefits:
- Consistent answer format across pack queries
- Improved citation quality (examples show proper source attribution)
- Better reasoning structure (examples demonstrate step-by-step logic)
- Domain-specific answer patterns (physics vs programming style)
Performance:
- Example loading: O(E) where E = number of examples (one-time cost)
- Example retrieval: O(E) semantic search (typically E < 20)
- Typical overhead: +20ms per query
Example:
>>> from wikigr.agent.enhancements.few_shot_manager import FewShotManager
>>>
>>> # Load pack examples
>>> manager = FewShotManager(
... pack_dir="data/packs/physics-expert",
... num_examples=3
... )
>>>
>>> # Get relevant examples for a question
>>> examples = manager.get_examples(
... question="What is quantum entanglement?",
... num_examples=2
... )
>>>
>>> # Format for prompt
>>> formatted = manager.format_for_prompt(examples)
>>> print(formatted[:200])
=== Example 1 ===
Question: What is the speed of light?
Context: {...}
Answer: The speed of light in vacuum is...
Example File Structure:
data/packs/physics-expert/few_shot_examples.json:
{
"examples": [
{
"question": "What is quantum entanglement?",
"context": {
"articles": ["Quantum_entanglement", "EPR_paradox"],
"facts": ["Quantum entanglement is...", "EPR paradox..."]
},
"answer": "Quantum entanglement is... [Source: Quantum_entanglement]",
"reasoning": "Answer synthesizes information from both articles..."
}
]
}
Dependencies:
- sentence-transformers: For example embedding and similarity
- json: Example file parsing
See Also:
- GraphReranker: Graph-based result reranking
- MultiDocSynthesizer: Multi-document retrieval
"""
Class: FewShotManager¶
class FewShotManager:
"""
Manages few-shot examples for knowledge pack queries.
This class loads pack-specific examples from a JSON file, embeds them
for semantic retrieval, and provides methods to find the most relevant
examples for a given query.
Attributes:
pack_dir (str): Path to knowledge pack directory
num_examples (int): Default number of examples to retrieve
cache (bool): Whether to cache loaded examples
_examples (list[dict] | None): Cached examples
_embeddings (np.ndarray | None): Cached example embeddings
_embedding_model: Sentence-transformers model for example retrieval
Example:
>>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
>>> examples = manager.get_examples("What is quantum mechanics?")
>>> print(len(examples))
3
"""
Constructor¶
def __init__(
self,
pack_dir: str,
num_examples: int = 3,
cache: bool = True
) -> None:
"""
Initialize FewShotManager with pack directory.
Args:
pack_dir: Path to knowledge pack directory (must contain few_shot_examples.json)
num_examples: Default number of examples to retrieve (range: 1-10, default: 3)
cache: Cache loaded examples in memory (default: True)
Raises:
FileNotFoundError: If pack_dir or few_shot_examples.json not found
json.JSONDecodeError: If few_shot_examples.json is invalid JSON
ValueError: If num_examples out of range [1, 10]
Example:
>>> # Standard initialization
>>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
>>>
>>> # Custom number of examples
>>> manager = FewShotManager(
... pack_dir="data/packs/physics-expert",
... num_examples=5
... )
>>>
>>> # Disable caching (for dynamic example updates)
>>> manager = FewShotManager(
... pack_dir="data/packs/physics-expert",
... cache=False
... )
"""
get_examples() Method¶
def get_examples(
self,
question: str,
num_examples: int | None = None
) -> list[dict]:
"""
Get the most relevant few-shot examples for a question.
Uses semantic similarity (cosine) to find examples with questions
most similar to the input question.
Args:
question: Question to find relevant examples for
num_examples: Number of examples to return (defaults to constructor value)
Returns:
List of example dictionaries, ranked by relevance (most relevant first)
Raises:
ValueError: If question is empty
RuntimeError: If examples cannot be loaded or embedded
Example:
>>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
>>> examples = manager.get_examples(
... question="What is quantum entanglement?",
... num_examples=2
... )
>>>
>>> for i, ex in enumerate(examples, 1):
... print(f"Example {i}: {ex['question']}")
Example 1: What is quantum entanglement?
Example 2: What is the EPR paradox?
Response Format:
[
{
"question": str,
"context": {
"articles": list[str],
"facts": list[str]
},
"answer": str,
"reasoning": str # Optional
}
]
"""
load_examples() Method¶
def load_examples(self) -> list[dict]:
"""
Load all examples from few_shot_examples.json.
Returns:
List of all examples from the pack's example file
Raises:
FileNotFoundError: If few_shot_examples.json not found
json.JSONDecodeError: If file contains invalid JSON
KeyError: If 'examples' key missing from JSON
Example:
>>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
>>> all_examples = manager.load_examples()
>>> print(f"Loaded {len(all_examples)} examples")
Loaded 10 examples
>>>
>>> # Inspect example structure
>>> example = all_examples[0]
>>> print(example.keys())
dict_keys(['question', 'context', 'answer', 'reasoning'])
File Format:
{
"examples": [
{
"question": "What is X?",
"context": {
"articles": ["Article_1", "Article_2"],
"facts": ["Fact 1", "Fact 2"]
},
"answer": "X is... [Source: Article_1]",
"reasoning": "Answer provides..."
}
]
}
"""
format_for_prompt() Method¶
def format_for_prompt(
self,
examples: list[dict]
) -> str:
"""
Format examples for injection into Claude prompt.
Formats examples as numbered, structured text blocks that Claude can
use as reference for answer format and citation style.
Args:
examples: List of example dictionaries to format
Returns:
Formatted string suitable for prompt injection
Example:
>>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
>>> examples = manager.get_examples("What is X?", num_examples=2)
>>> formatted = manager.format_for_prompt(examples)
>>>
>>> print(formatted)
=== Example 1 ===
Question: What is quantum entanglement?
Context:
Articles: Quantum_entanglement, EPR_paradox
Facts:
- Quantum entanglement is a phenomenon where...
- EPR paradox demonstrates quantum nonlocality...
Answer: Quantum entanglement is a phenomenon where two or more
particles become correlated... [Source: Quantum_entanglement,
EPR_paradox]
=== Example 2 ===
Question: What is the speed of light?
Context:
Articles: Speed_of_light
Facts:
- The speed of light in vacuum is 299,792,458 m/s...
Answer: The speed of light in vacuum is exactly 299,792,458 m/s...
[Source: Speed_of_light]
Usage in Prompt:
>>> prompt = f'''
... Here are examples of high-quality answers:
...
... {formatted}
...
... Now answer this question following the same pattern:
... Question: {question}
... Context: {context}
... Answer:
... '''
"""
_embed_examples() Method¶
def _embed_examples(
self,
examples: list[dict]
) -> np.ndarray:
"""
Embed all example questions for semantic retrieval.
Args:
examples: List of examples with 'question' field
Returns:
Numpy array of embeddings (shape: [num_examples, embedding_dim])
Example:
>>> manager = FewShotManager(pack_dir="data/packs/physics-expert")
>>> examples = manager.load_examples()
>>> embeddings = manager._embed_examples(examples)
>>> print(embeddings.shape)
(10, 384) # 10 examples, 384-dim embeddings
"""
_compute_similarity() Method¶
def _compute_similarity(
self,
query_embedding: np.ndarray,
example_embeddings: np.ndarray
) -> np.ndarray:
"""
Compute cosine similarity between query and examples.
Args:
query_embedding: Query embedding (shape: [embedding_dim])
example_embeddings: Example embeddings (shape: [num_examples, embedding_dim])
Returns:
Similarity scores (shape: [num_examples])
Example:
>>> query_emb = np.array([0.1, 0.2, 0.3])
>>> example_embs = np.array([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]])
>>> similarities = manager._compute_similarity(query_emb, example_embs)
>>> print(similarities)
[1.0, 0.714] # First example is identical, second is similar
"""
Usage Examples¶
Basic Example Retrieval¶
from wikigr.agent.enhancements.few_shot_manager import FewShotManager
# Initialize manager
manager = FewShotManager(
pack_dir="data/packs/physics-expert",
num_examples=3
)
# Get relevant examples
examples = manager.get_examples(
question="What is quantum entanglement?",
num_examples=2
)
# Inspect examples
for i, example in enumerate(examples, 1):
print(f"\n=== Example {i} ===")
print(f"Question: {example['question']}")
print(f"Answer: {example['answer'][:100]}...")
print(f"Sources: {', '.join(example['context']['articles'])}")
Output:
=== Example 1 ===
Question: What is quantum entanglement?
Answer: Quantum entanglement is a phenomenon where two or more particles become correlated in such a...
Sources: Quantum_entanglement, EPR_paradox
=== Example 2 ===
Question: What is the EPR paradox?
Answer: The EPR paradox is a thought experiment proposed by Einstein, Podolsky, and Rosen that demon...
Sources: EPR_paradox, Quantum_mechanics
Integration with KG Agent¶
from wikigr.agent.kg_agent import KnowledgeGraphAgent
from wikigr.agent.enhancements.few_shot_manager import FewShotManager
class EnhancedKGAgent(KnowledgeGraphAgent):
"""KG Agent with few-shot example injection."""
def __init__(self, db_path: str, pack_dir: str, **kwargs):
super().__init__(db_path, **kwargs)
self.few_shot = FewShotManager(pack_dir, num_examples=3)
def _synthesize_answer(self, question: str, context: dict, query_plan: dict) -> str:
"""Override synthesis to include few-shot examples."""
# Get relevant examples
examples = self.few_shot.get_examples(question, num_examples=3)
examples_text = self.few_shot.format_for_prompt(examples)
# Build prompt with examples
prompt = f"""
Here are examples of high-quality answers for this knowledge pack:
{examples_text}
Now answer this question following the same pattern:
Question: {question}
Context: {json.dumps(context, indent=2)}
Answer:
"""
# Call Claude
response = self.claude.messages.create(
model="claude-3-5-sonnet-20241022",
messages=[{"role": "user", "content": prompt}],
max_tokens=1024
)
return response.content[0].text
# Use enhanced agent
agent = EnhancedKGAgent(
db_path="data/packs/physics-expert/physics.db",
pack_dir="data/packs/physics-expert"
)
result = agent.query("What is quantum entanglement?")
print(result["answer"])
Creating Example Files¶
import json
from pathlib import Path
# Define pack examples
examples = {
"examples": [
{
"question": "What is quantum entanglement?",
"context": {
"articles": ["Quantum_entanglement", "EPR_paradox"],
"facts": [
"Quantum entanglement is a phenomenon where particles become correlated.",
"EPR paradox demonstrates quantum nonlocality.",
"Entanglement is a key resource for quantum computing."
]
},
"answer": (
"Quantum entanglement is a phenomenon where two or more particles "
"become correlated in such a way that the quantum state of each "
"particle cannot be described independently. This correlation "
"persists regardless of the distance between particles, demonstrating "
"quantum nonlocality as shown by the EPR paradox. Entanglement is a "
"fundamental resource for quantum computing and quantum communication. "
"[Source: Quantum_entanglement, EPR_paradox]"
),
"reasoning": (
"Answer synthesizes information from both articles, provides clear "
"definition, explains significance, and properly cites sources."
)
},
{
"question": "What is the speed of light?",
"context": {
"articles": ["Speed_of_light"],
"facts": [
"The speed of light in vacuum is 299,792,458 m/s.",
"The speed of light is denoted by the symbol 'c'.",
"It is a fundamental constant in physics."
]
},
"answer": (
"The speed of light in vacuum is exactly 299,792,458 m/s "
"(approximately 3 × 10^8 m/s), as defined by the International "
"System of Units. This fundamental constant, denoted by 'c', is "
"the maximum speed at which all energy, matter, and information "
"can travel. [Source: Speed_of_light]"
),
"reasoning": (
"Answer provides exact value with approximation, explains physical "
"significance, and includes proper source citation."
)
}
]
}
# Save to pack directory
pack_dir = Path("data/packs/physics-expert")
examples_file = pack_dir / "few_shot_examples.json"
with open(examples_file, "w") as f:
json.dump(examples, f, indent=2)
print(f"Created {examples_file} with {len(examples['examples'])} examples")
Dynamic Example Management¶
class DynamicFewShotManager(FewShotManager):
"""FewShotManager with dynamic example updates."""
def add_example(self, example: dict) -> None:
"""Add a new example to the pack."""
# Load existing examples
examples = self.load_examples()
examples.append(example)
# Save updated examples
examples_file = Path(self.pack_dir) / "few_shot_examples.json"
with open(examples_file, "w") as f:
json.dump({"examples": examples}, f, indent=2)
# Clear cache to reload
if self.cache:
self._examples = None
self._embeddings = None
def remove_example(self, question: str) -> None:
"""Remove an example by question text."""
examples = self.load_examples()
examples = [ex for ex in examples if ex["question"] != question]
# Save updated examples
examples_file = Path(self.pack_dir) / "few_shot_examples.json"
with open(examples_file, "w") as f:
json.dump({"examples": examples}, f, indent=2)
# Clear cache
if self.cache:
self._examples = None
self._embeddings = None
# Use dynamic manager
manager = DynamicFewShotManager(pack_dir="data/packs/physics-expert")
# Add a new example
new_example = {
"question": "What is the Heisenberg uncertainty principle?",
"context": {
"articles": ["Uncertainty_principle"],
"facts": ["The uncertainty principle states that..."]
},
"answer": "The Heisenberg uncertainty principle... [Source: Uncertainty_principle]",
"reasoning": "Clear explanation with proper citation."
}
manager.add_example(new_example)
# Remove an outdated example
manager.remove_example("Old question to remove")
Example Quality Analysis¶
from wikigr.agent.enhancements.few_shot_manager import FewShotManager
def analyze_example_quality(manager: FewShotManager) -> dict:
"""Analyze quality metrics of pack examples."""
examples = manager.load_examples()
metrics = {
"total_examples": len(examples),
"avg_question_length": sum(len(ex["question"]) for ex in examples) / len(examples),
"avg_answer_length": sum(len(ex["answer"]) for ex in examples) / len(examples),
"examples_with_reasoning": sum(1 for ex in examples if "reasoning" in ex),
"avg_sources_per_example": sum(len(ex["context"]["articles"]) for ex in examples) / len(examples),
"avg_facts_per_example": sum(len(ex["context"]["facts"]) for ex in examples) / len(examples)
}
return metrics
# Analyze physics pack examples
manager = FewShotManager(pack_dir="data/packs/physics-expert")
metrics = analyze_example_quality(manager)
print("Example Quality Metrics:")
for key, value in metrics.items():
print(f" {key}: {value:.2f}")
Output:
Example Quality Metrics:
total_examples: 10.00
avg_question_length: 42.30
avg_answer_length: 387.50
examples_with_reasoning: 10.00
avg_sources_per_example: 2.10
avg_facts_per_example: 4.80
Performance Tuning¶
Recommended Settings by Pack Size¶
| Pack Size | num_examples | Notes |
|---|---|---|
| <100 articles | 2-3 | Small packs benefit from focused examples |
| 100-500 articles | 3-5 | Standard setting (default: 3) |
| 500+ articles | 5-7 | Large packs need more coverage |
Example Quality Guidelines¶
High-Quality Examples Have: - Clear, specific questions (not too broad or vague) - Comprehensive context (2-5 articles, 3-8 facts) - Well-structured answers (intro, body, conclusion) - Proper source citation (all facts attributed) - Reasoning explanation (why this answer is good)
Example Quality Checklist:
def validate_example(example: dict) -> list[str]:
"""Validate example quality and return issues."""
issues = []
# Check required fields
if "question" not in example:
issues.append("Missing 'question' field")
if "context" not in example:
issues.append("Missing 'context' field")
if "answer" not in example:
issues.append("Missing 'answer' field")
# Check question length
if len(example.get("question", "")) < 10:
issues.append("Question too short (<10 chars)")
# Check context
context = example.get("context", {})
if len(context.get("articles", [])) == 0:
issues.append("No articles in context")
if len(context.get("facts", [])) < 2:
issues.append("Too few facts (<2)")
# Check answer quality
answer = example.get("answer", "")
if len(answer) < 100:
issues.append("Answer too short (<100 chars)")
if "[Source:" not in answer:
issues.append("Missing source citation")
return issues
# Validate all examples
manager = FewShotManager(pack_dir="data/packs/physics-expert")
examples = manager.load_examples()
for i, example in enumerate(examples, 1):
issues = validate_example(example)
if issues:
print(f"Example {i} issues:")
for issue in issues:
print(f" - {issue}")
Caching Strategies¶
# In-memory caching (default)
manager = FewShotManager(pack_dir="data/packs/physics-expert", cache=True)
# Persistent caching (across sessions)
import pickle
from pathlib import Path
class PersistentFewShotManager(FewShotManager):
"""FewShotManager with disk-based caching."""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.cache_file = Path(self.pack_dir) / ".few_shot_cache.pkl"
def load_examples(self):
# Try to load from cache
if self.cache_file.exists():
with open(self.cache_file, "rb") as f:
return pickle.load(f)
# Load from JSON and cache
examples = super().load_examples()
with open(self.cache_file, "wb") as f:
pickle.dump(examples, f)
return examples
manager = PersistentFewShotManager(pack_dir="data/packs/physics-expert")
Testing¶
import pytest
from wikigr.agent.enhancements.few_shot_manager import FewShotManager
def test_load_examples():
"""Test that examples are loaded correctly."""
manager = FewShotManager(pack_dir="data/packs/test-pack")
examples = manager.load_examples()
assert len(examples) > 0
assert all("question" in ex for ex in examples)
assert all("answer" in ex for ex in examples)
def test_get_examples_relevance():
"""Test that retrieved examples are relevant."""
manager = FewShotManager(pack_dir="data/packs/test-pack")
examples = manager.get_examples(
question="What is quantum entanglement?",
num_examples=2
)
assert len(examples) <= 2
# First example should be most relevant
assert "quantum" in examples[0]["question"].lower()
def test_format_for_prompt():
"""Test prompt formatting."""
manager = FewShotManager(pack_dir="data/packs/test-pack")
examples = manager.get_examples("What is X?", num_examples=2)
formatted = manager.format_for_prompt(examples)
assert "=== Example 1 ===" in formatted
assert "Question:" in formatted
assert "Answer:" in formatted
assert "Context:" in formatted
def test_missing_examples_file():
"""Test error handling for missing examples file."""
with pytest.raises(FileNotFoundError):
manager = FewShotManager(pack_dir="nonexistent/pack")
manager.load_examples()
def test_invalid_num_examples():
"""Test validation of num_examples parameter."""
with pytest.raises(ValueError):
manager = FewShotManager(pack_dir="data/packs/test-pack", num_examples=0)
with pytest.raises(ValueError):
manager = FewShotManager(pack_dir="data/packs/test-pack", num_examples=11)
Troubleshooting¶
FileNotFoundError: few_shot_examples.json¶
Problem: FileNotFoundError: few_shot_examples.json not found in pack directory
Cause: Pack directory missing examples file.
Solution: Create examples file:
cd data/packs/your-pack
echo '{"examples": []}' > few_shot_examples.json
Empty Examples List¶
Problem: get_examples() returns empty list.
Cause: No examples in file or all filtered out.
Solution: Add examples to few_shot_examples.json:
import json
from pathlib import Path
examples = {"examples": [
{
"question": "Example question?",
"context": {"articles": ["Article"], "facts": ["Fact"]},
"answer": "Example answer. [Source: Article]"
}
]}
path = Path("data/packs/your-pack/few_shot_examples.json")
with open(path, "w") as f:
json.dump(examples, f, indent=2)
Low Citation Quality Despite Examples¶
Problem: Citation quality still low even with few-shot examples.
Cause: Examples don't demonstrate good citation patterns.
Solution: Improve example quality:
# BAD: No citations
"answer": "Quantum mechanics describes atoms and particles."
# GOOD: Proper citations
"answer": "Quantum mechanics is a theory that describes the behavior of matter and energy at atomic scales. [Source: Quantum_mechanics, Atomic_theory]"
Irrelevant Examples Retrieved¶
Problem: Retrieved examples not relevant to question.
Cause: Example questions not diverse enough or poor embedding quality.
Solution: Add more diverse examples covering different query types:
# Ensure examples cover:
# - Factual questions ("What is X?")
# - Explanatory questions ("How does X work?")
# - Comparative questions ("What's the difference between X and Y?")
# - Application questions ("What are the uses of X?")
See Also¶
- Phase 1 Enhancements Reference - Complete API reference
- Phase 1 How-To Guide - Usage examples
- GraphReranker Module - Graph-based reranking
- MultiDocSynthesizer Module - Multi-document retrieval