Kuzu Code Graph Schema¶

Overview¶

The Kuzu Code Graph Schema extends the 5-type memory system with code structure modeling, enabling memory-code linking for intelligent codebase understanding. This schema adds 3 code node types, 7 code relationship types, and 10 memory-code link types to the existing Kuzu backend.

Key Capabilities:

Map code structure (files, classes, functions)
Track code relationships (inheritance, calls, imports)
Link memories to code artifacts
Query code-memory connections
Enable blarify integration

Performance: Schema initialization <1000ms, idempotent operation.

Quick Start¶

from amplihack.memory.backends import KuzuBackend

# Initialize backend (creates memory + code schema)
backend = KuzuBackend()
backend.initialize()

# Query code graph
result = backend.connection.execute("""
    MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
    WHERE file.file_path = 'src/main.py'
    RETURN f.function_name, f.line_start, f.docstring
""")

# Link memory to code
backend.connection.execute("""
    MATCH (m:SemanticMemory {memory_id: $memory_id}),
          (f:Function {function_id: $function_id})
    CREATE (m)-[:RELATES_TO_FUNCTION_SEMANTIC]->(f)
""", {"memory_id": mem_id, "function_id": func_id})

Node Types¶

CodeFile¶

Represents a source code file with metadata.

CREATE NODE TABLE CodeFile(
    file_id STRING PRIMARY KEY,
    file_path STRING,
    language STRING,
    size_bytes INT64,
    line_count INT64,
    last_modified TIMESTAMP,
    git_hash STRING,
    module_name STRING,
    is_test BOOL,
    metadata STRING
)

Properties:

file_id: Unique identifier (hash of file_path)
file_path: Absolute or relative path to file
language: Programming language (python, typescript, etc.)
size_bytes: File size in bytes
line_count: Total lines of code
last_modified: Last modification timestamp
git_hash: Git commit hash when file was indexed
module_name: Python module or package name
is_test: Whether file contains tests
metadata: JSON string with additional properties

Example:

backend.connection.execute("""
    CREATE (f:CodeFile {
        file_id: $file_id,
        file_path: 'src/amplihack/memory/backends/kuzu_backend.py',
        language: 'python',
        size_bytes: 15234,
        line_count: 450,
        last_modified: $timestamp,
        git_hash: 'abc123def',
        module_name: 'amplihack.memory.backends.kuzu_backend',
        is_test: false,
        metadata: '{}'
    })
""", {"file_id": file_id, "timestamp": datetime.now()})

Class¶

Represents a class or interface definition.

CREATE NODE TABLE Class(
    class_id STRING PRIMARY KEY,
    class_name STRING,
    fully_qualified_name STRING,
    line_start INT64,
    line_end INT64,
    docstring STRING,
    is_abstract BOOL,
    is_interface BOOL,
    access_modifier STRING,
    decorators STRING,
    metadata STRING
)

Properties:

class_id: Unique identifier (hash of fully_qualified_name)
class_name: Simple class name
fully_qualified_name: Full module path + class name
line_start: Starting line number in file
line_end: Ending line number in file
docstring: Class documentation string
is_abstract: Whether class is abstract
is_interface: Whether class is an interface
access_modifier: public, private, protected
decorators: JSON array of decorator names
metadata: JSON string with additional properties

Example:

backend.connection.execute("""
    CREATE (c:Class {
        class_id: $class_id,
        class_name: 'KuzuBackend',
        fully_qualified_name: 'amplihack.memory.backends.kuzu_backend.KuzuBackend',
        line_start: 40,
        line_end: 850,
        docstring: 'Kùzu graph database backend.',
        is_abstract: false,
        is_interface: false,
        access_modifier: 'public',
        decorators: '[]',
        metadata: '{}'
    })
""", {"class_id": class_id})

Function¶

Represents a function or method definition.

CREATE NODE TABLE Function(
    function_id STRING PRIMARY KEY,
    function_name STRING,
    fully_qualified_name STRING,
    line_start INT64,
    line_end INT64,
    docstring STRING,
    signature STRING,
    return_type STRING,
    is_async BOOL,
    is_method BOOL,
    is_static BOOL,
    access_modifier STRING,
    decorators STRING,
    complexity_score DOUBLE,
    metadata STRING
)

Properties:

function_id: Unique identifier (hash of fully_qualified_name)
function_name: Simple function/method name
fully_qualified_name: Full module path + class + function
line_start: Starting line number in file
line_end: Ending line number in file
docstring: Function documentation string
signature: Complete function signature with parameters
return_type: Return type annotation (if available)
is_async: Whether function is async/await
is_method: Whether function is a class method
is_static: Whether method is static
access_modifier: public, private, protected
decorators: JSON array of decorator names
complexity_score: Cyclomatic complexity (0.0-100.0)
metadata: JSON string with additional properties

Example:

backend.connection.execute("""
    CREATE (f:Function {
        function_id: $function_id,
        function_name: 'store_memory',
        fully_qualified_name: 'amplihack.memory.backends.kuzu_backend.KuzuBackend.store_memory',
        line_start: 325,
        line_end: 450,
        docstring: 'Store a memory entry in appropriate node type.',
        signature: 'def store_memory(self, memory: MemoryEntry) -> bool',
        return_type: 'bool',
        is_async: false,
        is_method: true,
        is_static: false,
        access_modifier: 'public',
        decorators: '[]',
        complexity_score: 12.5,
        metadata: '{}'
    })
""", {"function_id": function_id})

Code Relationships¶

DEFINED_IN¶

Links classes and functions to their containing file.

CREATE REL TABLE DEFINED_IN(
    FROM Class TO CodeFile,
    line_offset INT64
)

CREATE REL TABLE DEFINED_IN(
    FROM Function TO CodeFile,
    line_offset INT64
)

Properties:

line_offset: Line number where definition starts in file

Example:

MATCH (c:Class {class_id: $class_id}),
      (f:CodeFile {file_id: $file_id})
CREATE (c)-[:DEFINED_IN {line_offset: 40}]->(f)

METHOD_OF¶

Links methods to their containing class.

CREATE REL TABLE METHOD_OF(
    FROM Function TO Class,
    is_constructor BOOL,
    is_property BOOL
)

Properties:

is_constructor: Whether method is __init__ or constructor
is_property: Whether method is a property getter/setter

Example:

MATCH (m:Function {function_name: 'store_memory'}),
      (c:Class {class_name: 'KuzuBackend'})
CREATE (m)-[:METHOD_OF {is_constructor: false, is_property: false}]->(c)

CALLS¶

Tracks function call relationships.

CREATE REL TABLE CALLS(
    FROM Function TO Function,
    call_count INT64,
    line_numbers STRING
)

Properties:

call_count: Number of times function is called
line_numbers: JSON array of line numbers where calls occur

Example:

MATCH (caller:Function {function_id: $caller_id}),
      (callee:Function {function_id: $callee_id})
CREATE (caller)-[:CALLS {call_count: 3, line_numbers: '[145, 210, 389]'}]->(callee)

INHERITS¶

Links child classes to parent classes.

CREATE REL TABLE INHERITS(
    FROM Class TO Class,
    inheritance_order INT64
)

Properties:

inheritance_order: Position in inheritance list (0 for first parent)

Example:

MATCH (child:Class {class_name: 'KuzuBackend'}),
      (parent:Class {class_name: 'MemoryBackend'})
CREATE (child)-[:INHERITS {inheritance_order: 0}]->(parent)

IMPORTS¶

Tracks file import dependencies.

CREATE REL TABLE IMPORTS(
    FROM CodeFile TO CodeFile,
    import_type STRING,
    imported_symbols STRING
)

Properties:

import_type: 'module', 'from_import', 'relative'
imported_symbols: JSON array of imported names

Example:

MATCH (importer:CodeFile {file_path: 'src/main.py'}),
      (imported:CodeFile {file_path: 'src/utils.py'})
CREATE (importer)-[:IMPORTS {
    import_type: 'from_import',
    imported_symbols: '["helper", "formatter"]'
}]->(imported)

REFERENCES¶

Links functions to classes they reference (not inherit).

CREATE REL TABLE REFERENCES(
    FROM Function TO Class,
    reference_type STRING,
    line_numbers STRING
)

Properties:

reference_type: 'instantiation', 'type_annotation', 'usage'
line_numbers: JSON array of line numbers where references occur

Example:

MATCH (f:Function {function_name: 'create_backend'}),
      (c:Class {class_name: 'MemoryEntry'})
CREATE (f)-[:REFERENCES {
    reference_type: 'type_annotation',
    line_numbers: '[12]'
}]->(c)

CONTAINS¶

Represents file containment (for nested modules).

CREATE REL TABLE CONTAINS(
    FROM CodeFile TO CodeFile,
    relationship_type STRING
)

Properties:

relationship_type: 'package', 'submodule'

Example:

MATCH (pkg:CodeFile {file_path: 'src/amplihack/__init__.py'}),
      (module:CodeFile {file_path: 'src/amplihack/memory/__init__.py'})
CREATE (pkg)-[:CONTAINS {relationship_type: 'package'}]->(module)

Memory-Code Links¶

Connect the 5 memory types to code artifacts for context-aware memory retrieval.

Memory → CodeFile Links¶

CREATE REL TABLE RELATES_TO_FILE_EPISODIC(
    FROM EpisodicMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FILE_SEMANTIC(
    FROM SemanticMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FILE_PROCEDURAL(
    FROM ProceduralMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FILE_PROSPECTIVE(
    FROM ProspectiveMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FILE_WORKING(
    FROM WorkingMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

Properties:

relevance_score: 0.0-1.0 indicating strength of relationship
context: Why memory relates to this file

Example:

MATCH (m:SemanticMemory {concept: 'kuzu_backend_refactoring'}),
      (f:CodeFile {file_path: 'src/amplihack/memory/backends/kuzu_backend.py'})
CREATE (m)-[:RELATES_TO_FILE_SEMANTIC {
    relevance_score: 0.95,
    context: 'Design decisions for 5-type memory schema'
}]->(f)

Memory → Function Links¶

CREATE REL TABLE RELATES_TO_FUNCTION_EPISODIC(
    FROM EpisodicMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FUNCTION_SEMANTIC(
    FROM SemanticMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FUNCTION_PROCEDURAL(
    FROM ProceduralMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FUNCTION_PROSPECTIVE(
    FROM ProspectiveMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FUNCTION_WORKING(
    FROM WorkingMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

Properties:

relevance_score: 0.0-1.0 indicating strength of relationship
context: Why memory relates to this function

Example:

MATCH (m:ProceduralMemory {procedure_name: 'debugging_kuzu_queries'}),
      (f:Function {function_name: 'retrieve_memories'})
CREATE (m)-[:RELATES_TO_FUNCTION_PROCEDURAL {
    relevance_score: 0.85,
    context: 'Learned debugging technique while fixing query performance'
}]->(f)

Usage Examples¶

Query 1: Find All Functions in a File¶

MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
WHERE file.file_path = 'src/amplihack/memory/backends/kuzu_backend.py'
RETURN f.function_name, f.line_start, f.complexity_score
ORDER BY f.line_start

Output:

function_name       | line_start | complexity_score
--------------------|------------|------------------
__init__            | 49         | 2.0
initialize          | 80         | 15.5
store_memory        | 325        | 12.5
retrieve_memories   | 500        | 18.0

Query 2: Find Class Hierarchy¶

MATCH path = (child:Class)-[:INHERITS*1..3]->(ancestor:Class)
WHERE child.class_name = 'KuzuBackend'
RETURN child.class_name, ancestor.class_name, length(path) AS depth
ORDER BY depth

Output:

child_name  | ancestor_name  | depth
------------|----------------|------
KuzuBackend | MemoryBackend  | 1
KuzuBackend | BaseBackend    | 2

Query 3: Find Function Call Graph¶

MATCH (caller:Function)-[c:CALLS]->(callee:Function)
WHERE caller.function_name = 'store_memory'
RETURN callee.function_name, c.call_count, c.line_numbers
ORDER BY c.call_count DESC

Output:

callee_name          | call_count | line_numbers
---------------------|------------|-------------
_create_session_node | 5          | [390, 420, ...]
_validate_memory     | 1          | [330]

MATCH (m:SemanticMemory)-[r:RELATES_TO_FILE_SEMANTIC]->(f:CodeFile)
WHERE f.file_path CONTAINS 'kuzu_backend'
RETURN m.concept, m.content, r.relevance_score, r.context
ORDER BY r.relevance_score DESC
LIMIT 5

Output:

concept                    | content                 | relevance | context
---------------------------|-------------------------|-----------|------------------
kuzu_performance_tuning    | Use parameterized...   | 0.95      | Query optimization
5type_memory_migration     | Migration from flat... | 0.90      | Schema design
graph_traversal_patterns   | Cypher patterns for... | 0.85      | Query examples

Query 5: Find Complex Functions¶

MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
WHERE f.complexity_score > 15.0
  AND file.language = 'python'
RETURN f.fully_qualified_name, f.complexity_score, f.line_start, f.line_end
ORDER BY f.complexity_score DESC
LIMIT 10

Output:

fully_qualified_name                                    | complexity | line_start | line_end
-------------------------------------------------------|------------|------------|----------
amplihack.memory.backends.kuzu_backend.retrieve_memories | 18.0       | 500        | 650
amplihack.memory.backends.kuzu_backend.initialize        | 15.5       | 80         | 320

Query 6: Find Memories for Active Work¶

// Get all memories linked to functions in current file
MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
WHERE file.file_path = $current_file
OPTIONAL MATCH (m)-[r:RELATES_TO_FUNCTION_SEMANTIC|RELATES_TO_FUNCTION_PROCEDURAL]->(f)
RETURN f.function_name,
       collect({memory: m, relevance: r.relevance_score}) AS related_memories
ORDER BY f.line_start

Query 7: Find Import Dependencies¶

MATCH (f:CodeFile)-[i:IMPORTS]->(dep:CodeFile)
WHERE f.file_path = 'src/amplihack/memory/backends/kuzu_backend.py'
RETURN dep.file_path, i.import_type, i.imported_symbols

Output:

file_path                      | import_type  | imported_symbols
-------------------------------|--------------|---------------------------
src/amplihack/memory/models.py | from_import  | ["MemoryEntry", "MemoryQuery"]
src/amplihack/memory/base.py   | from_import  | ["BackendCapabilities"]

Integration with Memory System¶

The code graph schema integrates seamlessly with the existing 5-type memory system.

Automatic Code-Memory Linking¶

When memories are stored, the system can automatically link them to relevant code:

from amplihack.memory import MemoryService
from amplihack.memory.backends import KuzuBackend

backend = KuzuBackend()
backend.initialize()
service = MemoryService(backend)

# Store memory with code context
memory = service.store_memory(
    memory_type=MemoryType.SEMANTIC,
    content="Discovered performance issue in retrieve_memories function",
    metadata={
        "code_file": "src/amplihack/memory/backends/kuzu_backend.py",
        "function_name": "retrieve_memories",
        "line_number": 520
    }
)

# System automatically creates RELATES_TO_FUNCTION_SEMANTIC relationship

Context-Aware Memory Retrieval¶

Retrieve memories relevant to current code context:

# Working on kuzu_backend.py, need relevant memories
memories = backend.connection.execute("""
    MATCH (f:CodeFile {file_path: $file_path})
    OPTIONAL MATCH (m:SemanticMemory)-[r:RELATES_TO_FILE_SEMANTIC]->(f)
    WHERE r.relevance_score > 0.7
    RETURN m.concept, m.content, r.relevance_score
    ORDER BY r.relevance_score DESC
""", {"file_path": "src/amplihack/memory/backends/kuzu_backend.py"})

Navigate code structure with memory awareness:

# Find all classes and their methods with related memories
result = backend.connection.execute("""
    MATCH (c:Class)<-[:METHOD_OF]-(m:Function)
    OPTIONAL MATCH (mem:SemanticMemory)-[:RELATES_TO_FUNCTION_SEMANTIC]->(m)
    RETURN c.class_name,
           collect({method: m.function_name, memories: count(mem)}) AS methods
""")

Performance Characteristics¶

Schema Initialization¶

Time: <1000ms for all 20 tables (3 node types + 7 code relationships + 10 memory-code links)
Idempotent: Safe to call initialize() multiple times
Database size: ~50KB overhead for empty schema

Benchmark:

import time
backend = KuzuBackend()
start = time.time()
backend.initialize()
elapsed = time.time() - start
print(f"Schema initialization: {elapsed*1000:.1f}ms")
# Output: Schema initialization: 850.2ms

Query Performance¶

Simple traversal (single hop): <10ms
Complex traversal (3+ hops): <100ms
Memory-code linking: <50ms per relationship
File indexing: ~500ms per 1000 LOC

Benchmark:

# Find all functions in a file
start = time.time()
result = backend.connection.execute("""
    MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
    WHERE file.file_path = $path
    RETURN f
""", {"path": "src/amplihack/memory/backends/kuzu_backend.py"})
elapsed = time.time() - start
print(f"Query time: {elapsed*1000:.1f}ms")
# Output: Query time: 8.5ms

Memory Overhead¶

CodeFile node: ~200 bytes
Class node: ~300 bytes
Function node: ~400 bytes
Relationship: ~100 bytes

Typical codebase (10,000 LOC):

50 files × 200 bytes = 10KB
100 classes × 300 bytes = 30KB
500 functions × 400 bytes = 200KB
2000 relationships × 100 bytes = 200KB
Total: ~440KB

Schema Reference¶

Complete Node Count¶

Memory nodes: 5 types (Episodic, Semantic, Procedural, Prospective, Working)
Code nodes: 3 types (CodeFile, Class, Function)
Infrastructure: 2 types (Session, Agent)
Total: 10 node types

Complete Relationship Count¶

Memory relationships: 11 types
Code relationships: 7 types
Memory-code links: 10 types (5 to files + 5 to functions)
Total: 28 relationship types

Schema Evolution¶

The schema follows semantic versioning:

v1.0: Initial 5-type memory system
v1.1: Added code graph schema (this document)
Future: Vector embeddings, code change tracking

Migration Path¶

Existing memory databases automatically upgrade:

backend = KuzuBackend()  # Existing database
backend.initialize()      # Adds code schema, preserves memory data

Migration guarantees:

Zero downtime
No data loss
Backward compatible queries
Forward compatible storage

Next Steps¶

Week 2: Blarify import integration
Import blarify knowledge into code graph
Map blarify entities to code nodes
Preserve blarify relationships
Week 3: Memory-code linking
Automatic linking based on context
Link existing memories to code
Query optimization
Future enhancements:
Vector embeddings for semantic code search
Code change tracking (git history)
Test coverage integration
Documentation linking

5-Type Memory Schema - Core memory system
Blarify Integration - Week 2 migration plan
Memory Architecture - System design
Kuzu Backend API - Implementation

Implementation: src/amplihack/memory/backends/kuzu_backend.py Schema Version: 1.1 Status: Complete and deployed Performance: <1000ms initialization, <100ms queries

Kuzu Code Graph Schema¶

Overview¶

Quick Start¶

Contents¶

Node Types¶

CodeFile¶

Class¶

Function¶

Code Relationships¶

DEFINED_IN¶

METHOD_OF¶

CALLS¶

INHERITS¶

IMPORTS¶

REFERENCES¶

CONTAINS¶

Memory-Code Links¶

Memory → CodeFile Links¶

Memory → Function Links¶

Usage Examples¶

Query 1: Find All Functions in a File¶

Query 2: Find Class Hierarchy¶

Query 3: Find Function Call Graph¶

Query 4: Find Memories Related to Code¶

Query 5: Find Complex Functions¶

Query 6: Find Memories for Active Work¶

Query 7: Find Import Dependencies¶

Integration with Memory System¶

Automatic Code-Memory Linking¶

Context-Aware Memory Retrieval¶

Code Structure Navigation¶

Performance Characteristics¶

Schema Initialization¶

Query Performance¶

Memory Overhead¶

Schema Reference¶

Complete Node Count¶

Complete Relationship Count¶

Schema Evolution¶

Migration Path¶

Next Steps¶

Related Documentation¶