Skip to content

Overview

Knowledge Packs are self-contained, domain-specific knowledge graph databases that augment LLMs with curated, structured information. Each pack packages a LadybugDB graph database, vector embeddings, retrieval configuration, and evaluation questions into a portable unit.

What Problem Packs Solve

LLMs have three specific limitations that packs address:

1. Training Data Cutoff

Models are trained on data up to a fixed date. APIs change, frameworks release new versions, and documentation evolves. A pack built from current documentation gives the model access to information it has never seen during training.

Example: React 19 introduced useActionState, useOptimistic, and the "use server" directive. A model trained before React 19's release cannot answer questions about these features accurately. The react-expert pack contains the current React documentation and enables correct answers.

2. Depth Gaps

Training data covers topics broadly -- models know about most technologies. But they often lack the implementation-level detail needed for expert questions: specific API parameters, edge cases, version-specific behavior, and integration patterns.

Example: Claude knows what Go goroutines are, but may not know the exact behavior of iter.Seq[V any] introduced in Go 1.23. The go-expert pack contains the full Go standard library documentation with section-level detail.

3. Grounding and Provenance

When models answer from training data, there is no way to trace the answer back to a specific source. Pack-augmented answers include article titles and section references, making it possible to verify claims against the original documentation.

When to Build a Pack

Use this decision framework to determine whether a pack will add value for a given domain:

Question If Yes If No
Does Claude already answer domain questions correctly? Pack may not add value -- test first with eval Good candidate for a pack
Is the content changing faster than training updates? Strong candidate (APIs, framework docs, SDKs) Lower priority
Is implementation-level depth important? Pack adds section-level retrieval General knowledge may suffice
Do you need source attribution? Pack provides article-level provenance Training-based answers are fine
Is the domain covered by public documentation? Build from URLs May need custom content sources

Strong pack candidates:

  • Framework SDKs with frequent releases (Vercel AI SDK, LangChain, LlamaIndex)
  • Cloud platform services with evolving APIs (Azure, AWS)
  • Programming languages with recent feature additions (Go 1.23, Zig 0.13)
  • Specialized protocols and standards (MCP, OpenCypher)

When NOT to Build a Pack

Packs add complexity. Do not build one when:

  • Claude already knows the topic well. Stable, well-known topics like "what is TCP" or "explain binary search" get excellent answers from training alone. A pack would add latency without improving quality.
  • The topic is too broad. A "general computer science" pack would need thousands of articles and still have gaps. Packs work best for focused domains.
  • The documentation is behind authentication. Pack URLs must be publicly accessible. Private docs require custom content source implementations.

Architecture at a Glance

┌──────────────────────────────────────────────────────────┐
│                    Knowledge Pack                         │
│                                                          │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │  LadybugDB   │  │ BGE Vectors  │  │  Enhancement  │  │
│  │   Database   │  │  (HNSW idx)  │  │   Modules     │  │
│  │             │  │              │  │               │  │
│  │ Articles    │  │ Section      │  │ Reranker      │  │
│  │ Sections    │  │ embeddings   │  │ MultiDoc      │  │
│  │ Entities    │  │ 768-dim      │  │ FewShot       │  │
│  │ Relations   │  │ cosine sim   │  │ CrossEncoder  │  │
│  └─────────────┘  └──────────────┘  └───────────────┘  │
│                                                          │
│  ┌──────────────────────────────────────────────────┐   │
│  │              Retrieval Pipeline                    │   │
│  │  query -> vector search -> confidence gate ->     │   │
│  │  rerank -> multi-doc expand -> synthesize         │   │
│  └──────────────────────────────────────────────────┘   │
│                                                          │
│  ┌──────────────────────────────────────────────────┐   │
│  │              Evaluation Framework                 │   │
│  │  questions.jsonl -> training | pack               │   │
│  │  -> judge scoring -> accuracy metrics             │   │
│  └──────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────┘

Core Components

Component Technology Role
Graph Database LadybugDB (embedded) Stores articles, sections, entities, relationships as graph nodes and edges
Vector Embeddings BAAI/bge-base-en-v1.5 (768-dim) Enables semantic search over section content via HNSW index
Synthesis Claude (Opus) Generates natural language answers from retrieved context
Query Expansion Claude (Haiku) Generates alternative phrasings for multi-query retrieval
Enhancement Modules Python classes Confidence gating, cross-encoder reranking, graph reranking, multi-doc synthesis, few-shot examples, content quality scoring

Next Steps

  • Quick Start -- Build and query your first pack in 5 minutes
  • Tutorial -- Full lifecycle walkthrough from domain selection to deployment
  • How Packs Work -- Deep dive into the content and retrieval pipelines