# Overview
Knowledge Packs are self-contained, domain-specific knowledge graph databases that augment LLMs with curated, structured information. Each pack packages a LadybugDB graph database, vector embeddings, retrieval configuration, and evaluation questions into a portable unit.
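For orientation, the sketch below lists the kinds of artifacts a pack bundles together. The file names and keys are hypothetical placeholders for illustration, not the actual on-disk layout.

```python
# Hypothetical summary of what a pack bundles; the file names and keys
# here are illustrative placeholders, not the real on-disk layout.
pack_contents = {
    "graph.lbdb": "embedded LadybugDB database: articles, sections, entities, relations",
    "embeddings/": "768-dim BGE section vectors behind an HNSW index",
    "retrieval.yaml": "retrieval configuration: confidence gate, reranker, multi-doc settings",
    "questions.jsonl": "evaluation questions for scoring pack vs. training-only answers",
}

for path, role in pack_contents.items():
    print(f"{path:<16} {role}")
```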
## What Problems Packs Solve
LLMs have three specific limitations that packs address:
### 1. Training Data Cutoff
Models are trained on data up to a fixed date. APIs change, frameworks release new versions, and documentation evolves. A pack built from current documentation gives the model access to information it has never seen during training.
Example: React 19 introduced useActionState, useOptimistic, and the "use server" directive. A model trained before React 19's release cannot answer questions about these features accurately. The react-expert pack contains the current React documentation and enables correct answers.
### 2. Depth Gaps
Training data covers topics broadly -- models know about most technologies. But they often lack the implementation-level detail needed for expert questions: specific API parameters, edge cases, version-specific behavior, and integration patterns.
Example: Claude knows what Go goroutines are, but may not know the exact behavior of iter.Seq[V any] introduced in Go 1.23. The go-expert pack contains the full Go standard library documentation with section-level detail.
### 3. Grounding and Provenance
When models answer from training data, there is no way to trace the answer back to a specific source. Pack-augmented answers include article titles and section references, making it possible to verify claims against the original documentation.
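As a concrete illustration, a pack-grounded answer might carry its sources along like this. The field names, article titles, and section labels below are hypothetical, not the actual response schema.

```python
# Hypothetical shape of a pack-grounded answer; field names and source
# labels are illustrative, not the actual response schema.
answer = {
    "text": "useActionState returns the current state, a form action, and a pending flag.",
    "sources": [
        {"article": "React: useActionState", "section": "Reference"},
        {"article": "React: useOptimistic", "section": "Usage"},
    ],
}

for src in answer["sources"]:
    print(f'cited: {src["article"]} > {src["section"]}')
```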
## When to Build a Pack
Use this decision framework to determine whether a pack will add value for a given domain:
| Question | If Yes | If No |
|---|---|---|
| Does Claude already answer domain questions correctly? | Pack may not add value -- test first with eval | Good candidate for a pack |
| Is the content changing faster than training updates? | Strong candidate (APIs, framework docs, SDKs) | Lower priority |
| Is implementation-level depth important? | Pack adds section-level retrieval | General knowledge may suffice |
| Do you need source attribution? | Pack provides article-level provenance | Training-based answers are fine |
| Is the domain covered by public documentation? | Build from URLs | May need custom content sources |
Strong pack candidates:
- Framework SDKs with frequent releases (Vercel AI SDK, LangChain, LlamaIndex)
- Cloud platform services with evolving APIs (Azure, AWS)
- Programming languages with recent feature additions (Go 1.23, Zig 0.13)
- Specialized protocols and standards (MCP, OpenCypher)
## When NOT to Build a Pack
Packs add complexity. Do not build one when:
- Claude already knows the topic well. Stable, well-known topics like "what is TCP" or "explain binary search" get excellent answers from training alone. A pack would add latency without improving quality.
- The topic is too broad. A "general computer science" pack would need thousands of articles and still have gaps. Packs work best for focused domains.
- The documentation is behind authentication. Pack URLs must be publicly accessible. Private docs require custom content source implementations.
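The custom content source interface is not defined on this page; the sketch below is an assumption of what a minimal source for token-authenticated docs could look like. The `PrivateDocsSource` class, its `fetch()` method, and the JSON fields it expects are all hypothetical.

```python
# Hypothetical sketch of a custom content source for docs behind authentication.
# The class name, fetch() signature, and response fields are assumptions,
# not the pack toolkit's actual interface.
from dataclasses import dataclass
from typing import Iterator

import requests


@dataclass
class Article:
    url: str
    title: str
    body: str


class PrivateDocsSource:
    """Fetches documentation pages that sit behind a token-authenticated API."""

    def __init__(self, base_url: str, token: str) -> None:
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {token}"

    def fetch(self, paths: list[str]) -> Iterator[Article]:
        for path in paths:
            resp = self.session.get(f"{self.base_url}/{path}")
            resp.raise_for_status()
            # Assume the internal docs API returns JSON with title/body fields.
            data = resp.json()
            yield Article(url=resp.url, title=data["title"], body=data["body"])
```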
## Architecture at a Glance
```text
┌──────────────────────────────────────────────────────────┐
│ Knowledge Pack │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ LadybugDB │ │ BGE Vectors │ │ Enhancement │ │
│ │ Database │ │ (HNSW idx) │ │ Modules │ │
│ │ │ │ │ │ │ │
│ │ Articles │ │ Section │ │ Reranker │ │
│ │ Sections │ │ embeddings │ │ MultiDoc │ │
│ │ Entities │ │ 768-dim │ │ FewShot │ │
│ │ Relations │ │ cosine sim │ │ CrossEncoder │ │
│ └─────────────┘ └──────────────┘ └───────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Retrieval Pipeline │ │
│ │ query -> vector search -> confidence gate -> │ │
│ │ rerank -> multi-doc expand -> synthesize │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Evaluation Framework │ │
│ │ questions.jsonl -> training | pack │ │
│ │ -> judge scoring -> accuracy metrics │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
```
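The retrieval pipeline box above reads as a linear flow. Below is a minimal sketch of that flow in Python; the `pack` and `llm` objects, their method names, and the `min_score` threshold are assumptions for illustration, not the actual runtime API.

```python
# Simplified illustration of the retrieval pipeline shown in the diagram.
# Method names, thresholds, and the fallback behavior are assumptions,
# not the actual pack runtime interface.

def answer_query(query: str, pack, llm, top_k: int = 8, min_score: float = 0.45) -> str:
    # 1. Vector search: embed the query and find the nearest section embeddings.
    hits = pack.vector_search(query, k=top_k)          # hypothetical call

    # 2. Confidence gate: if even the best hit is weak, fall back to the
    #    model's own knowledge rather than forcing poor context on it.
    if not hits or hits[0].score < min_score:
        return llm.complete(query)                      # hypothetical call

    # 3. Rerank: reorder candidates with a cross-encoder or graph-based reranker.
    hits = pack.rerank(query, hits)                     # hypothetical call

    # 4. Multi-doc expand: pull in sibling sections from the same articles
    #    so the model sees coherent context, not isolated fragments.
    context = pack.expand_to_articles(hits)             # hypothetical call

    # 5. Synthesize: have the model answer from the retrieved context,
    #    citing article titles and sections for provenance.
    return llm.complete(query, context=context)         # hypothetical call
```

The fallback at the confidence gate is an assumption; the point of the sketch is that each stage in the diagram maps to one step.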
## Core Components
| Component | Technology | Role |
|---|---|---|
| Graph Database | LadybugDB (embedded) | Stores articles, sections, entities, relationships as graph nodes and edges |
| Vector Embeddings | BAAI/bge-base-en-v1.5 (768-dim) | Enables semantic search over section content via HNSW index |
| Synthesis | Claude (Opus) | Generates natural language answers from retrieved context |
| Query Expansion | Claude (Haiku) | Generates alternative phrasings for multi-query retrieval |
| Enhancement Modules | Python classes | Confidence gating, cross-encoder reranking, graph reranking, multi-doc synthesis, few-shot examples, content quality scoring |
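The embedding model named above is publicly available through the sentence-transformers library, so a minimal sketch of producing and comparing 768-dim section embeddings looks roughly like this. The section texts are made-up examples, and the actual pack build tooling may wire the model up differently.

```python
# One way to produce 768-dim BGE embeddings and compare them by cosine
# similarity using sentence-transformers; the pack build tooling may differ.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

sections = [
    "useActionState lets a component update state based on a form action result.",
    "iter.Seq[V] is a function type over a yield callback, added in Go 1.23.",
]
query = "How do I read form submission state in React 19?"

# Normalized embeddings so cosine similarity matches the HNSW metric above.
section_vecs = model.encode(sections, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = util.cos_sim(query_vec, section_vecs)[0]
for text, score in zip(sections, scores):
    print(f"{float(score):.3f}  {text[:60]}")
```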
## Next Steps
- Quick Start -- Build and query your first pack in 5 minutes
- Tutorial -- Full lifecycle walkthrough from domain selection to deployment
- How Packs Work -- Deep dive into the content and retrieval pipelines