LadybugDB Code Graph

The LadybugDB (formerly Kuzu) code-graph is amplihack-rs's native, zero-Python store for structural information about a project's source code — files, classes, functions, and the relationships between them.

Contents

What it is
Why LadybugDB
Schema
Data ingestion pipeline
blarify: consumption vs. generation
SCIP: the primary ingestion path
Why scip-python is not Python delegation
Known gap: native blarify generation
On-disk layout
Security model
Related

What it is

The LadybugDB code-graph answers structural questions about a codebase without reading source files at query time:

"How many functions are in this project?"
"Which functions call run_tui?"
"What classes are defined in fleet.rs?"
"What imports does code_graph.rs have?"

It is a persistent property graph, updated by running amplihack index-scip (or amplihack index-code for blarify JSON imports) and queried via amplihack query-code.
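
As a rough illustration, the example questions above map naturally onto parameterized Cypher. The sketch below assumes the node and relationship names listed in the Schema section. The `CodeQuery` struct and the helper functions are hypothetical names introduced here and are not the actual query-code implementation.

```rust
/// Hypothetical holder for a Cypher statement and its named parameters.
/// Values travel separately from the query text, so they are never
/// spliced into the Cypher string.
#[derive(Debug, PartialEq)]
struct CodeQuery {
    cypher: &'static str,
    params: Vec<(&'static str, String)>,
}

/// "How many functions are in this project?"
fn count_functions() -> CodeQuery {
    CodeQuery {
        cypher: "MATCH (f:CodeFunction) RETURN COUNT(*)",
        params: Vec::new(),
    }
}

/// "Which functions call <function_name>?"
fn callers_of(function_name: &str) -> CodeQuery {
    CodeQuery {
        cypher: "MATCH (caller:CodeFunction)-[:CALLS]->(callee:CodeFunction) \
                 WHERE callee.function_name = $name \
                 RETURN caller.fully_qualified_name",
        params: vec![("name", function_name.to_string())],
    }
}

/// "What classes are defined in <file_path>?"
fn classes_in(file_path: &str) -> CodeQuery {
    CodeQuery {
        cypher: "MATCH (c:CodeClass)-[:CLASS_DEFINED_IN]->(f:CodeFile) \
                 WHERE f.file_path = $path \
                 RETURN c.class_name",
        params: vec![("path", file_path.to_string())],
    }
}
```

Calling `callers_of("run_tui")` yields a statement containing the `$name` placeholder plus a single `("name", "run_tui")` parameter pair, which a database binding could then bind at execution time.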

Historical naming: The graph database engine was previously known as "Kuzu" and has been rebranded to "LadybugDB". The lbug crate (formerly kuzu) provides the Rust FFI bindings. CLI flags like --kuzu-path and env vars like AMPLIHACK_KUZU_DB_PATH remain as backward-compatible aliases.

Why LadybugDB

LadybugDB (formerly Kuzu) is an embeddable property graph database with a C++ core. The lbug Rust crate exposes it through a native C++ FFI binding (cxx-build, pinned to =1.0.138 per the version contract).

LadybugDB has no runtime dependency on Python or any interpreter. The FFI boundary is resolved at build time: cargo build links the LadybugDB C++ library into the amplihack binary, so querying the graph is an in-process call and no subprocess is launched.

Schema

The graph stores 3 node types and 7 relationship types:

Node tables

Table	Primary key	Key fields
`CodeFile`	`file_id` (SHA-256 of path)	`file_path`, `language`, `size_bytes`, `last_modified`
`CodeClass`	`class_id`	`class_name`, `fully_qualified_name`, `file_path`, `line_number`, `is_abstract`
`CodeFunction`	`function_id`	`function_name`, `fully_qualified_name`, `signature`, `file_path`, `line_number`, `is_async`, `cyclomatic_complexity`

All node tables carry a metadata JSON string column for extension fields and a created_at timestamp.

Relationship tables

Relationship	From → To	Key fields
`DEFINED_IN`	`CodeFunction → CodeFile`	`line_number`, `end_line`
`CLASS_DEFINED_IN`	`CodeClass → CodeFile`	`line_number`
`METHOD_OF`	`CodeFunction → CodeClass`	`method_type`, `visibility`
`CALLS`	`CodeFunction → CodeFunction`	`call_count`, `context`
`INHERITS`	`CodeClass → CodeClass`	`inheritance_order`, `inheritance_type`
`REFERENCES_CLASS`	`CodeFunction → CodeClass`	`reference_type`, `context`
`IMPORTS`	`CodeFile → CodeFile`	`import_type`, `alias`

Schema creation is idempotent — all CREATE statements use IF NOT EXISTS. Running index-scip on an already-indexed project upserts records in place without duplicating nodes.
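
A minimal sketch of what idempotent schema creation could look like. The statements cover only a subset of the tables above, the column types (`STRING`, `INT64`, `BOOLEAN`) are assumptions, and `ensure_schema` with its `execute` callback is a hypothetical stand-in for the real database connection rather than the project's actual code.

```rust
/// Illustrative subset of the schema. Column types are assumptions.
/// Every statement uses IF NOT EXISTS, so re-running them is harmless.
const SCHEMA_DDL: &[&str] = &[
    "CREATE NODE TABLE IF NOT EXISTS CodeFile(file_id STRING, file_path STRING, \
     language STRING, size_bytes INT64, PRIMARY KEY (file_id))",
    "CREATE NODE TABLE IF NOT EXISTS CodeFunction(function_id STRING, function_name STRING, \
     file_path STRING, line_number INT64, is_async BOOLEAN, PRIMARY KEY (function_id))",
    "CREATE REL TABLE IF NOT EXISTS DEFINED_IN(FROM CodeFunction TO CodeFile, \
     line_number INT64, end_line INT64)",
    "CREATE REL TABLE IF NOT EXISTS CALLS(FROM CodeFunction TO CodeFunction, \
     call_count INT64, context STRING)",
];

/// Runs every DDL statement through `execute` (a stand-in for the real
/// connection) and returns how many statements were issued.
/// Stops at the first error and returns it unchanged.
fn ensure_schema<F>(mut execute: F) -> Result<usize, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    for stmt in SCHEMA_DDL.iter().copied() {
        execute(stmt)?;
    }
    Ok(SCHEMA_DDL.len())
}
```

Because the statements are static strings with no user input, they need no parameter binding. Calling `ensure_schema` on every startup is safe under this design, since an existing table simply makes the corresponding statement a no-op.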

Data ingestion pipeline

Two ingestion paths populate the graph:

Source code
    │
    ├── path A: SCIP pipeline (primary)
    │       │
    │       ▼
    │   SCIP indexer binary (scip-python, rust-analyzer, scip-go, …)
    │       │  subprocess via std::process::Command — no interpreter
    │       ▼
    │   index.scip  (protobuf binary)
    │       │
    │       ▼
    │   import_scip_file()  — prost decode + SCIP-to-BlarifyOutput conversion
    │       │
    │       ▼
    │   LadybugDB graph  ◄──────────────────────────────┐
    │                                                   │
    └── path B: blarify JSON import                     │
            │                                           │
            ▼                                           │
        blarify.json  (produced externally)             │
            │                                           │
            ▼                                           │
        import_blarify_json()  — serde_json parse       │
            │                                           │
            └───────────────────────────────────────────┘

Path A (index-scip) is the recommended path for new projects. Path B (index-code) is for environments where blarify.json is already available (e.g. produced by a CI job or another tool).

blarify: consumption vs. generation

amplihack-rs consumes blarify JSON but does not generate it.

blarify is a Python tree-sitter-based tool with parsers for 20+ languages that produces a blarify.json call-graph export. A native Rust port of the blarify generator is out of scope for this project.

What amplihack-rs does:

Defines the BlarifyOutput deserialization schema in Rust (serde)
Imports any conforming blarify.json into LadybugDB via import_blarify_json()
Never invokes python blarify or python -m blarify as a subprocess

If blarify.json is absent, index-code logs a WARN and exits cleanly with zero counts. It does not abort the process or fall back to a Python subprocess.
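
The missing-file behavior can be sketched roughly as follows. This is an illustration only: `ImportCounts`, `read_blarify_input`, and `index_code_sketch` are hypothetical names, the warning is written to stderr instead of a real logger, and the 500 MB threshold from the Security model section is assumed here to mean 500 × 1024 × 1024 bytes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical summary of what an import wrote to the graph.
#[derive(Debug, Default, PartialEq)]
struct ImportCounts {
    files: usize,
    classes: usize,
    functions: usize,
}

/// Assumed interpretation of the 500 MB size guard.
const MAX_BLARIFY_BYTES: u64 = 500 * 1024 * 1024;

/// Returns Ok(None) when the file is absent, Ok(Some(bytes)) when it can
/// be read, and an error when it is too large or unreadable.
fn read_blarify_input(path: &Path) -> io::Result<Option<Vec<u8>>> {
    let meta = match fs::metadata(path) {
        Ok(meta) => meta,
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            eprintln!("WARN: {} not found, nothing to import", path.display());
            return Ok(None);
        }
        Err(e) => return Err(e),
    };
    if meta.len() >= MAX_BLARIFY_BYTES {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "blarify.json exceeds the size guard",
        ));
    }
    fs::read(path).map(Some)
}

/// A missing file yields zero counts instead of an error or a fallback.
/// `import` stands in for the real parse-and-write step.
fn index_code_sketch(
    path: &Path,
    import: impl FnOnce(&[u8]) -> ImportCounts,
) -> io::Result<ImportCounts> {
    match read_blarify_input(path)? {
        None => Ok(ImportCounts::default()),
        Some(bytes) => Ok(import(&bytes)),
    }
}
```

Checking the size from file metadata before reading keeps an oversized file from ever being loaded into memory.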

The live path for code-graph indexing in amplihack-rs uses SCIP (path A above), not blarify generation.

SCIP: the primary ingestion path

SCIP (SCIP Code Intelligence Protocol) is a protobuf-based format for precise code intelligence — symbols, occurrences, and relationships across a codebase.

amplihack-rs uses prost to decode SCIP protobuf files and then converts them to the internal BlarifyOutput structure for import into LadybugDB.

The SCIP indexer binaries are external native tools:

Indexer	Language	Type
`scip-python`	Python source	Node binary
`scip-typescript`	TypeScript/JavaScript	Node binary
`scip-go`	Go	Go binary
`rust-analyzer`	Rust	Rust binary
`scip-dotnet`	C#	.NET binary
`scip-clang`	C/C++	Clang-based binary

These are invoked via std::process::Command with arguments as discrete Vec<String> elements: no shell string interpolation, no interpreter.

Why scip-python is not Python delegation

scip-python is distributed by Sourcegraph as an npm package (@sourcegraph/scip-python) built on Pyright and written in TypeScript. The name is misleading: it indexes Python source code but is not itself a Python program. Installing it via npm install -g @sourcegraph/scip-python places a scip-python executable on PATH that runs on Node.js.

Invoking scip-python index from Rust is functionally identical to invoking scip-go or rust-analyzer scip. It does not launch a Python interpreter. This satisfies the no-Python-subprocess constraint.

The constraint that is enforced: no python3 -c ..., no python script.py, no PyO3 embedding. Language-specific SCIP indexer binaries are valid external tool use.
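
The invocation style described in this section can be sketched with the standard library alone. `scip_command` and `arg_strings` are hypothetical helper names, and the arguments passed in are whatever the caller supplies. The sketch only builds the command and does not spawn it.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) a command for an external SCIP indexer.
/// Each argument is passed as its own element, so no shell ever parses
/// the values and metacharacters in them have no special meaning.
fn scip_command(indexer: &str, args: &[String], project_dir: &Path) -> Command {
    let mut cmd = Command::new(indexer);
    cmd.args(args).current_dir(project_dir);
    cmd
}

/// Returns the arguments exactly as they would be handed to the process.
fn arg_strings(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect()
}
```

An argument such as `demo; rm -rf /` stays a single literal element of the argument list, because `Command` passes it to the child process directly instead of through `sh -c`.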

Known gap: native blarify generation

Scope boundary for issue #77:

Generating blarify.json natively in Rust (replacing the Python tree-sitter blarify tool) is not part of this project's scope. Doing so would require porting 20+ tree-sitter language parsers — a multi-month effort tracked separately as issue #78.

The current position:

Capability	Status
Consume `blarify.json` from any source	✅ Implemented
Index via SCIP (no blarify needed)	✅ Implemented
No `python blarify` subprocess on the live path	✅ Verified by probe
Invoke `blarify` binary (if installed) as external tool	Acceptable, not yet needed
Generate blarify JSON natively in Rust	⏳ Issue #78

On-disk layout

After running amplihack index-scip in a project:

<project>/
└── .amplihack/
    ├── graph_db/         ← Graph database directory (0700)
    │   ├── data.kz       ← graph data (0600)
    │   └── ...
    └── indexes/
        ├── python.scip   ← SCIP artifact for Python
        ├── rust.scip     ← SCIP artifact for Rust
        └── ...

The graph_db directory and its contents are created with restrictive permissions (0700 / 0600) to prevent other users on a shared system from reading graph data that may include sensitive symbol names or docstrings.
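
One way such permissions could be applied on Unix is sketched below. `restrict_graph_db` is a hypothetical helper, not the project's actual function, and it only tightens files directly inside the directory.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Creates the graph directory if needed, then restricts it to the owner:
/// 0700 on the directory and 0600 on each regular file directly inside it.
/// Unix only. Nested directories are left untouched in this sketch.
fn restrict_graph_db(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    fs::set_permissions(dir, fs::Permissions::from_mode(0o700))?;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            fs::set_permissions(entry.path(), fs::Permissions::from_mode(0o600))?;
        }
    }
    Ok(())
}
```

Setting the mode explicitly after creation, instead of relying on the process umask, gives the same result regardless of how the user's shell is configured.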

Security model

Property	Implementation
No interpreter subprocess	All SCIP indexers are binary executables; `python3` is never launched
Parameterized queries	All LadybugDB Cypher statements use parameter binding; no string interpolation
Path canonicalization	`--project-path` and `--db-path` are canonicalized; symlinks emit `WARN`
Blocked prefixes	`/proc`, `/sys`, `/dev` are rejected immediately
DB file permissions	`0600` file / `0700` directory enforced after first open (Unix only)
JSON size guard	`blarify.json` ≥ 500 MB is rejected before parsing
Argument injection prevention	External tool commands use `Vec<String>` argument lists
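
The path canonicalization and blocked-prefix rows could be combined along these lines. This is a sketch under assumptions: `validate_path` and `is_blocked` are hypothetical names, the warning goes to stderr instead of a logger, and the order of checks (raw path first, canonical path second) is a design guess rather than documented behavior.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Prefixes that are never valid project or database locations.
const BLOCKED_PREFIXES: [&str; 3] = ["/proc", "/sys", "/dev"];

/// Component-wise prefix check: "/proc/self" matches, "/process" does not.
fn is_blocked(path: &Path) -> bool {
    BLOCKED_PREFIXES.iter().any(|prefix| path.starts_with(prefix))
}

fn blocked_error(path: &Path) -> io::Error {
    io::Error::new(
        io::ErrorKind::PermissionDenied,
        format!("{} is under a blocked prefix", path.display()),
    )
}

/// Rejects blocked prefixes, warns on symlinks, and returns the canonical path.
fn validate_path(raw: &Path) -> io::Result<PathBuf> {
    // Reject obviously bad input before touching the filesystem.
    if is_blocked(raw) {
        return Err(blocked_error(raw));
    }
    if fs::symlink_metadata(raw)?.file_type().is_symlink() {
        eprintln!("WARN: {} is a symlink", raw.display());
    }
    // Check again after resolution, in case a symlink pointed somewhere blocked.
    let canonical = fs::canonicalize(raw)?;
    if is_blocked(&canonical) {
        return Err(blocked_error(&canonical));
    }
    Ok(canonical)
}
```

Checking both the raw and the canonical form means a symlink such as `./data -> /proc/self` is rejected even though its raw path looks harmless.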

Related

amplihack index-scip and index-code reference
amplihack query-code reference
Index a project end-to-end
The cxx/cxx-build Version Contract — why LadybugDB requires a pinned cxx version