Skip to content

Kuzu Code Graph Schema

Overview

The Kuzu Code Graph Schema extends the 5-type memory system with code structure modeling, enabling memory-code linking for intelligent codebase understanding. This schema adds 3 code node types, 7 code relationship types, and 10 memory-code link types to the existing Kuzu backend.

Key Capabilities:

  • Map code structure (files, classes, functions)
  • Track code relationships (inheritance, calls, imports)
  • Link memories to code artifacts
  • Query code-memory connections
  • Enable blarify integration

Performance: Schema initialization <1000ms, idempotent operation.

Quick Start

from amplihack.memory.backends import KuzuBackend

# Initialize backend (creates memory + code schema)
backend = KuzuBackend()
backend.initialize()

# Query code graph
result = backend.connection.execute("""
    MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
    WHERE file.file_path = 'src/main.py'
    RETURN f.function_name, f.line_start, f.docstring
""")

# Link memory to code
backend.connection.execute("""
    MATCH (m:SemanticMemory {memory_id: $memory_id}),
          (f:Function {function_id: $function_id})
    CREATE (m)-[:RELATES_TO_FUNCTION_SEMANTIC]->(f)
""", {"memory_id": mem_id, "function_id": func_id})

Contents

Node Types

CodeFile

Represents a source code file with metadata.

CREATE NODE TABLE CodeFile(
    file_id STRING PRIMARY KEY,
    file_path STRING,
    language STRING,
    size_bytes INT64,
    line_count INT64,
    last_modified TIMESTAMP,
    git_hash STRING,
    module_name STRING,
    is_test BOOL,
    metadata STRING
)

Properties:

  • file_id: Unique identifier (hash of file_path)
  • file_path: Absolute or relative path to file
  • language: Programming language (python, typescript, etc.)
  • size_bytes: File size in bytes
  • line_count: Total lines of code
  • last_modified: Last modification timestamp
  • git_hash: Git commit hash when file was indexed
  • module_name: Python module or package name
  • is_test: Whether file contains tests
  • metadata: JSON string with additional properties

Example:

backend.connection.execute("""
    CREATE (f:CodeFile {
        file_id: $file_id,
        file_path: 'src/amplihack/memory/backends/kuzu_backend.py',
        language: 'python',
        size_bytes: 15234,
        line_count: 450,
        last_modified: $timestamp,
        git_hash: 'abc123def',
        module_name: 'amplihack.memory.backends.kuzu_backend',
        is_test: false,
        metadata: '{}'
    })
""", {"file_id": file_id, "timestamp": datetime.now()})

Class

Represents a class or interface definition.

CREATE NODE TABLE Class(
    class_id STRING PRIMARY KEY,
    class_name STRING,
    fully_qualified_name STRING,
    line_start INT64,
    line_end INT64,
    docstring STRING,
    is_abstract BOOL,
    is_interface BOOL,
    access_modifier STRING,
    decorators STRING,
    metadata STRING
)

Properties:

  • class_id: Unique identifier (hash of fully_qualified_name)
  • class_name: Simple class name
  • fully_qualified_name: Full module path + class name
  • line_start: Starting line number in file
  • line_end: Ending line number in file
  • docstring: Class documentation string
  • is_abstract: Whether class is abstract
  • is_interface: Whether class is an interface
  • access_modifier: public, private, protected
  • decorators: JSON array of decorator names
  • metadata: JSON string with additional properties

Example:

backend.connection.execute("""
    CREATE (c:Class {
        class_id: $class_id,
        class_name: 'KuzuBackend',
        fully_qualified_name: 'amplihack.memory.backends.kuzu_backend.KuzuBackend',
        line_start: 40,
        line_end: 850,
        docstring: 'Kùzu graph database backend.',
        is_abstract: false,
        is_interface: false,
        access_modifier: 'public',
        decorators: '[]',
        metadata: '{}'
    })
""", {"class_id": class_id})

Function

Represents a function or method definition.

CREATE NODE TABLE Function(
    function_id STRING PRIMARY KEY,
    function_name STRING,
    fully_qualified_name STRING,
    line_start INT64,
    line_end INT64,
    docstring STRING,
    signature STRING,
    return_type STRING,
    is_async BOOL,
    is_method BOOL,
    is_static BOOL,
    access_modifier STRING,
    decorators STRING,
    complexity_score DOUBLE,
    metadata STRING
)

Properties:

  • function_id: Unique identifier (hash of fully_qualified_name)
  • function_name: Simple function/method name
  • fully_qualified_name: Full module path + class + function
  • line_start: Starting line number in file
  • line_end: Ending line number in file
  • docstring: Function documentation string
  • signature: Complete function signature with parameters
  • return_type: Return type annotation (if available)
  • is_async: Whether function is async/await
  • is_method: Whether function is a class method
  • is_static: Whether method is static
  • access_modifier: public, private, protected
  • decorators: JSON array of decorator names
  • complexity_score: Cyclomatic complexity (0.0-100.0)
  • metadata: JSON string with additional properties

Example:

backend.connection.execute("""
    CREATE (f:Function {
        function_id: $function_id,
        function_name: 'store_memory',
        fully_qualified_name: 'amplihack.memory.backends.kuzu_backend.KuzuBackend.store_memory',
        line_start: 325,
        line_end: 450,
        docstring: 'Store a memory entry in appropriate node type.',
        signature: 'def store_memory(self, memory: MemoryEntry) -> bool',
        return_type: 'bool',
        is_async: false,
        is_method: true,
        is_static: false,
        access_modifier: 'public',
        decorators: '[]',
        complexity_score: 12.5,
        metadata: '{}'
    })
""", {"function_id": function_id})

Code Relationships

DEFINED_IN

Links classes and functions to their containing file.

CREATE REL TABLE DEFINED_IN(
    FROM Class TO CodeFile,
    line_offset INT64
)

CREATE REL TABLE DEFINED_IN(
    FROM Function TO CodeFile,
    line_offset INT64
)

Properties:

  • line_offset: Line number where definition starts in file

Example:

MATCH (c:Class {class_id: $class_id}),
      (f:CodeFile {file_id: $file_id})
CREATE (c)-[:DEFINED_IN {line_offset: 40}]->(f)

METHOD_OF

Links methods to their containing class.

CREATE REL TABLE METHOD_OF(
    FROM Function TO Class,
    is_constructor BOOL,
    is_property BOOL
)

Properties:

  • is_constructor: Whether method is __init__ or constructor
  • is_property: Whether method is a property getter/setter

Example:

MATCH (m:Function {function_name: 'store_memory'}),
      (c:Class {class_name: 'KuzuBackend'})
CREATE (m)-[:METHOD_OF {is_constructor: false, is_property: false}]->(c)

CALLS

Tracks function call relationships.

CREATE REL TABLE CALLS(
    FROM Function TO Function,
    call_count INT64,
    line_numbers STRING
)

Properties:

  • call_count: Number of times function is called
  • line_numbers: JSON array of line numbers where calls occur

Example:

MATCH (caller:Function {function_id: $caller_id}),
      (callee:Function {function_id: $callee_id})
CREATE (caller)-[:CALLS {call_count: 3, line_numbers: '[145, 210, 389]'}]->(callee)

INHERITS

Links child classes to parent classes.

CREATE REL TABLE INHERITS(
    FROM Class TO Class,
    inheritance_order INT64
)

Properties:

  • inheritance_order: Position in inheritance list (0 for first parent)

Example:

MATCH (child:Class {class_name: 'KuzuBackend'}),
      (parent:Class {class_name: 'MemoryBackend'})
CREATE (child)-[:INHERITS {inheritance_order: 0}]->(parent)

IMPORTS

Tracks file import dependencies.

CREATE REL TABLE IMPORTS(
    FROM CodeFile TO CodeFile,
    import_type STRING,
    imported_symbols STRING
)

Properties:

  • import_type: 'module', 'from_import', 'relative'
  • imported_symbols: JSON array of imported names

Example:

MATCH (importer:CodeFile {file_path: 'src/main.py'}),
      (imported:CodeFile {file_path: 'src/utils.py'})
CREATE (importer)-[:IMPORTS {
    import_type: 'from_import',
    imported_symbols: '["helper", "formatter"]'
}]->(imported)

REFERENCES

Links functions to classes they reference (not inherit).

CREATE REL TABLE REFERENCES(
    FROM Function TO Class,
    reference_type STRING,
    line_numbers STRING
)

Properties:

  • reference_type: 'instantiation', 'type_annotation', 'usage'
  • line_numbers: JSON array of line numbers where references occur

Example:

MATCH (f:Function {function_name: 'create_backend'}),
      (c:Class {class_name: 'MemoryEntry'})
CREATE (f)-[:REFERENCES {
    reference_type: 'type_annotation',
    line_numbers: '[12]'
}]->(c)

CONTAINS

Represents file containment (for nested modules).

CREATE REL TABLE CONTAINS(
    FROM CodeFile TO CodeFile,
    relationship_type STRING
)

Properties:

  • relationship_type: 'package', 'submodule'

Example:

MATCH (pkg:CodeFile {file_path: 'src/amplihack/__init__.py'}),
      (module:CodeFile {file_path: 'src/amplihack/memory/__init__.py'})
CREATE (pkg)-[:CONTAINS {relationship_type: 'package'}]->(module)

Connect the 5 memory types to code artifacts for context-aware memory retrieval.

CREATE REL TABLE RELATES_TO_FILE_EPISODIC(
    FROM EpisodicMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FILE_SEMANTIC(
    FROM SemanticMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FILE_PROCEDURAL(
    FROM ProceduralMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FILE_PROSPECTIVE(
    FROM ProspectiveMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FILE_WORKING(
    FROM WorkingMemory TO CodeFile,
    relevance_score DOUBLE,
    context STRING
)

Properties:

  • relevance_score: 0.0-1.0 indicating strength of relationship
  • context: Why memory relates to this file

Example:

MATCH (m:SemanticMemory {concept: 'kuzu_backend_refactoring'}),
      (f:CodeFile {file_path: 'src/amplihack/memory/backends/kuzu_backend.py'})
CREATE (m)-[:RELATES_TO_FILE_SEMANTIC {
    relevance_score: 0.95,
    context: 'Design decisions for 5-type memory schema'
}]->(f)
CREATE REL TABLE RELATES_TO_FUNCTION_EPISODIC(
    FROM EpisodicMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FUNCTION_SEMANTIC(
    FROM SemanticMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FUNCTION_PROCEDURAL(
    FROM ProceduralMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FUNCTION_PROSPECTIVE(
    FROM ProspectiveMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

CREATE REL TABLE RELATES_TO_FUNCTION_WORKING(
    FROM WorkingMemory TO Function,
    relevance_score DOUBLE,
    context STRING
)

Properties:

  • relevance_score: 0.0-1.0 indicating strength of relationship
  • context: Why memory relates to this function

Example:

MATCH (m:ProceduralMemory {procedure_name: 'debugging_kuzu_queries'}),
      (f:Function {function_name: 'retrieve_memories'})
CREATE (m)-[:RELATES_TO_FUNCTION_PROCEDURAL {
    relevance_score: 0.85,
    context: 'Learned debugging technique while fixing query performance'
}]->(f)

Usage Examples

Query 1: Find All Functions in a File

MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
WHERE file.file_path = 'src/amplihack/memory/backends/kuzu_backend.py'
RETURN f.function_name, f.line_start, f.complexity_score
ORDER BY f.line_start

Output:

function_name       | line_start | complexity_score
--------------------|------------|------------------
__init__            | 49         | 2.0
initialize          | 80         | 15.5
store_memory        | 325        | 12.5
retrieve_memories   | 500        | 18.0

Query 2: Find Class Hierarchy

MATCH path = (child:Class)-[:INHERITS*1..3]->(ancestor:Class)
WHERE child.class_name = 'KuzuBackend'
RETURN child.class_name, ancestor.class_name, length(path) AS depth
ORDER BY depth

Output:

child_name  | ancestor_name  | depth
------------|----------------|------
KuzuBackend | MemoryBackend  | 1
KuzuBackend | BaseBackend    | 2

Query 3: Find Function Call Graph

MATCH (caller:Function)-[c:CALLS]->(callee:Function)
WHERE caller.function_name = 'store_memory'
RETURN callee.function_name, c.call_count, c.line_numbers
ORDER BY c.call_count DESC

Output:

callee_name          | call_count | line_numbers
---------------------|------------|-------------
_create_session_node | 5          | [390, 420, ...]
_validate_memory     | 1          | [330]
MATCH (m:SemanticMemory)-[r:RELATES_TO_FILE_SEMANTIC]->(f:CodeFile)
WHERE f.file_path CONTAINS 'kuzu_backend'
RETURN m.concept, m.content, r.relevance_score, r.context
ORDER BY r.relevance_score DESC
LIMIT 5

Output:

concept                    | content                 | relevance | context
---------------------------|-------------------------|-----------|------------------
kuzu_performance_tuning    | Use parameterized...   | 0.95      | Query optimization
5type_memory_migration     | Migration from flat... | 0.90      | Schema design
graph_traversal_patterns   | Cypher patterns for... | 0.85      | Query examples

Query 5: Find Complex Functions

MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
WHERE f.complexity_score > 15.0
  AND file.language = 'python'
RETURN f.fully_qualified_name, f.complexity_score, f.line_start, f.line_end
ORDER BY f.complexity_score DESC
LIMIT 10

Output:

fully_qualified_name                                    | complexity | line_start | line_end
-------------------------------------------------------|------------|------------|----------
amplihack.memory.backends.kuzu_backend.retrieve_memories | 18.0       | 500        | 650
amplihack.memory.backends.kuzu_backend.initialize        | 15.5       | 80         | 320

Query 6: Find Memories for Active Work

// Get all memories linked to functions in current file
MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
WHERE file.file_path = $current_file
OPTIONAL MATCH (m)-[r:RELATES_TO_FUNCTION_SEMANTIC|RELATES_TO_FUNCTION_PROCEDURAL]->(f)
RETURN f.function_name,
       collect({memory: m, relevance: r.relevance_score}) AS related_memories
ORDER BY f.line_start

Query 7: Find Import Dependencies

MATCH (f:CodeFile)-[i:IMPORTS]->(dep:CodeFile)
WHERE f.file_path = 'src/amplihack/memory/backends/kuzu_backend.py'
RETURN dep.file_path, i.import_type, i.imported_symbols

Output:

file_path                      | import_type  | imported_symbols
-------------------------------|--------------|---------------------------
src/amplihack/memory/models.py | from_import  | ["MemoryEntry", "MemoryQuery"]
src/amplihack/memory/base.py   | from_import  | ["BackendCapabilities"]

Integration with Memory System

The code graph schema integrates seamlessly with the existing 5-type memory system.

Automatic Code-Memory Linking

When memories are stored, the system can automatically link them to relevant code:

from amplihack.memory import MemoryService
from amplihack.memory.backends import KuzuBackend

backend = KuzuBackend()
backend.initialize()
service = MemoryService(backend)

# Store memory with code context
memory = service.store_memory(
    memory_type=MemoryType.SEMANTIC,
    content="Discovered performance issue in retrieve_memories function",
    metadata={
        "code_file": "src/amplihack/memory/backends/kuzu_backend.py",
        "function_name": "retrieve_memories",
        "line_number": 520
    }
)

# System automatically creates RELATES_TO_FUNCTION_SEMANTIC relationship

Context-Aware Memory Retrieval

Retrieve memories relevant to current code context:

# Working on kuzu_backend.py, need relevant memories
memories = backend.connection.execute("""
    MATCH (f:CodeFile {file_path: $file_path})
    OPTIONAL MATCH (m:SemanticMemory)-[r:RELATES_TO_FILE_SEMANTIC]->(f)
    WHERE r.relevance_score > 0.7
    RETURN m.concept, m.content, r.relevance_score
    ORDER BY r.relevance_score DESC
""", {"file_path": "src/amplihack/memory/backends/kuzu_backend.py"})

Code Structure Navigation

Navigate code structure with memory awareness:

# Find all classes and their methods with related memories
result = backend.connection.execute("""
    MATCH (c:Class)<-[:METHOD_OF]-(m:Function)
    OPTIONAL MATCH (mem:SemanticMemory)-[:RELATES_TO_FUNCTION_SEMANTIC]->(m)
    RETURN c.class_name,
           collect({method: m.function_name, memories: count(mem)}) AS methods
""")

Performance Characteristics

Schema Initialization

  • Time: <1000ms for all 20 tables (3 node types + 7 code relationships + 10 memory-code links)
  • Idempotent: Safe to call initialize() multiple times
  • Database size: ~50KB overhead for empty schema

Benchmark:

import time
backend = KuzuBackend()
start = time.time()
backend.initialize()
elapsed = time.time() - start
print(f"Schema initialization: {elapsed*1000:.1f}ms")
# Output: Schema initialization: 850.2ms

Query Performance

  • Simple traversal (single hop): <10ms
  • Complex traversal (3+ hops): <100ms
  • Memory-code linking: <50ms per relationship
  • File indexing: ~500ms per 1000 LOC

Benchmark:

# Find all functions in a file
start = time.time()
result = backend.connection.execute("""
    MATCH (f:Function)-[:DEFINED_IN]->(file:CodeFile)
    WHERE file.file_path = $path
    RETURN f
""", {"path": "src/amplihack/memory/backends/kuzu_backend.py"})
elapsed = time.time() - start
print(f"Query time: {elapsed*1000:.1f}ms")
# Output: Query time: 8.5ms

Memory Overhead

  • CodeFile node: ~200 bytes
  • Class node: ~300 bytes
  • Function node: ~400 bytes
  • Relationship: ~100 bytes

Typical codebase (10,000 LOC):

  • 50 files × 200 bytes = 10KB
  • 100 classes × 300 bytes = 30KB
  • 500 functions × 400 bytes = 200KB
  • 2000 relationships × 100 bytes = 200KB
  • Total: ~440KB

Schema Reference

Complete Node Count

  • Memory nodes: 5 types (Episodic, Semantic, Procedural, Prospective, Working)
  • Code nodes: 3 types (CodeFile, Class, Function)
  • Infrastructure: 2 types (Session, Agent)
  • Total: 10 node types

Complete Relationship Count

  • Memory relationships: 11 types
  • Code relationships: 7 types
  • Memory-code links: 10 types (5 to files + 5 to functions)
  • Total: 28 relationship types

Schema Evolution

The schema follows semantic versioning:

  • v1.0: Initial 5-type memory system
  • v1.1: Added code graph schema (this document)
  • Future: Vector embeddings, code change tracking

Migration Path

Existing memory databases automatically upgrade:

backend = KuzuBackend()  # Existing database
backend.initialize()      # Adds code schema, preserves memory data

Migration guarantees:

  • Zero downtime
  • No data loss
  • Backward compatible queries
  • Forward compatible storage

Next Steps

  1. Week 2: Blarify import integration
  2. Import blarify knowledge into code graph
  3. Map blarify entities to code nodes
  4. Preserve blarify relationships

  5. Week 3: Memory-code linking

  6. Automatic linking based on context
  7. Link existing memories to code
  8. Query optimization

  9. Future enhancements:

  10. Vector embeddings for semantic code search
  11. Code change tracking (git history)
  12. Test coverage integration
  13. Documentation linking

Implementation: src/amplihack/memory/backends/kuzu_backend.py Schema Version: 1.1 Status: Complete and deployed Performance: <1000ms initialization, <100ms queries