KG Agent Test Suite Reference¶
Reference for the unit and integration tests covering KnowledgeGraphAgent in tests/agent/.
Quick Start¶
# Run all agent tests (no database required)
pytest tests/agent/ -v
# Expected output (without data/wikigr_30k.db present):
# 44 passed, 27 skipped
All unit tests run offline — no LadybugDB database, no Anthropic API calls. The 27 skipped tests require an optional integration database (data/wikigr_30k.db) and are silently skipped when it is absent.
Test Files¶
test_kg_agent_core.py — Core method unit tests¶
Tests five KnowledgeGraphAgent methods using fully mocked LadybugDB connections and Claude API. No real database or network I/O.
Note: The module docstring at the top of this file still lists
_execute_queryand_execute_fallback_queryas targets — this is stale comment-only technical debt from a dead code cleanup. Those methods no longer exist in production; see Removed test targets below.
Test classes:
| Class | Method Under Test | Coverage |
|---|---|---|
TestSafeJsonLoads |
_safe_json_loads (module-level) |
Valid JSON, invalid JSON, dict passthrough, non-string types, empty string |
TestFetchSourceText |
_fetch_source_text |
Returns section text, empty titles, article content fallback, DB errors, truncation at 3000 chars, max_articles slicing |
TestBuildSynthesisContext |
_build_synthesis_context |
Sources/entities/facts inclusion, few-shot example injection, empty results |
TestHybridRetrieve |
_hybrid_retrieve |
Combined vector + keyword signals, vector failure fallback, all-DB-failure graceful return, keyword_weight affects ranking |
TestConfidenceGatedContextInjection |
query(), _synthesize_answer_minimal() |
Low similarity triggers confidence gate, high similarity runs normal pipeline, minimal synthesis calls Claude without KG context, API error returns fallback string, exact threshold boundary (0.5 is NOT below threshold), empty Claude response returns dedicated fallback string |
Fixture pattern:
from unittest.mock import MagicMock, patch
from wikigr.agent.kg_agent import KnowledgeGraphAgent, _safe_json_loads
def _make_agent() -> KnowledgeGraphAgent:
"""Build a KnowledgeGraphAgent with fully mocked internals."""
agent = KnowledgeGraphAgent.__new__(KnowledgeGraphAgent)
agent.db = None
agent.conn = MagicMock()
agent.claude = MagicMock()
# ... other required attrs
return agent
The _make_agent() helper bypasses __init__ using __new__, then manually assigns mock attributes. This avoids LadybugDB import-time side effects and keeps tests deterministic.
test_kg_agent_semantic.py — Semantic search and retrieval pipeline tests¶
Tests semantic_search, _vector_primary_retrieve, and the query() retrieval pipeline. Uses the pytest agent fixture which patches real_ladybug and Anthropic at import time.
Test classes:
| Class | Focus |
|---|---|
TestSemanticSearchFastPath |
Article title matches an existing node — reuses stored embedding, no EmbeddingGenerator instantiated |
TestSemanticSearchFreeText |
Non-title query generates an on-the-fly embedding, lazy initialization, deduplication of multiple sections from the same article |
TestSemanticSearchValidation |
Rejects invalid top_k (0, >500, non-int), raises RuntimeError after close(), close() clears _embedding_generator |
TestVectorPrimaryRetrieval |
High confidence returns results, empty vector returns (None, 0.0), DB exception returns (None, 0.0), low distance returns results with max_sim < 0.6 |
TestABTestingFlags |
enable_reranker/multidoc/fewshot constructor flags, from_connection() defaults |
Behavioral contract assertions:
The two tests test_query_skips_llm_when_high_confidence and test_query_never_calls_llm_cypher assert on the observable output field result["query_type"], not on whether internal private methods were or were not called:
def test_query_skips_llm_when_high_confidence(self, agent):
# ... set up high-similarity vector results ...
result = agent.query("physics question")
assert result["query_type"] == "vector_search" # output contract
def test_query_never_calls_llm_cypher(self, agent):
# ... set up low-similarity vector results (distance=0.9, similarity=0.1) ...
result = agent.query("obscure query")
assert result["query_type"] == "confidence_gated_fallback" # output contract
This approach tests the behavior contract (what the API returns) rather than the implementation detail (which internal method ran), making the tests robust to future refactoring.
test_kg_agent_queries.py — Integration tests (DB-dependent)¶
End-to-end tests that query a live KnowledgeGraphAgent against the 30K Wikipedia article database. These tests verify the full pipeline from question to synthesized answer.
Skip guard:
pytestmark = pytest.mark.skipif(
not os.path.exists("data/wikigr_30k.db"),
reason="data/wikigr_30k.db not found — skipping DB-dependent tests",
)
All tests in this file skip automatically when data/wikigr_30k.db is absent. This prevents kuzu.Error (via real_ladybug) and FileNotFoundError in CI environments where the database is not available.
Test levels:
| Level | Class | Focus |
|---|---|---|
| 1 | TestLevel1_SimpleRetrieval |
Entity lookup, article existence, fact retrieval |
| 2 | TestLevel2_SemanticSearch |
Concept proximity, category search |
| 3 | TestLevel3_RelationshipTraversal |
Entity paths, founder queries, indirect connections |
| 4 | TestLevel4_MultiHopReasoning |
Transitive relationships, concept comparison |
| 5 | TestLevel5_TemporalAndCausal |
Cause/effect, prerequisite knowledge |
| 6 | TestLevel6_ConstraintSatisfaction |
Multi-constraint search, negation |
| 7 | TestLevel7_Reasoning |
Contradiction detection, knowledge gap identification, analogical reasoning |
| 8 | TestLevel8_Compositional |
Filter+rank chaining, aggregate+compare, nested sub-queries |
| — | TestPerformance |
Simple query < 5s, complex query < 15s, semantic search < 3s |
Four tests within this file are individually marked @pytest.mark.skip(reason=...) — unconditionally, not conditionally — because they require data not present even in the full 30K DB (e.g., vector index, temporal/historical data). These are separate from the 27-skip count above: the 27 skips come entirely from the module-level pytestmark guard when the database is absent; these 4 would additionally skip even when the database is present.
Other test files¶
| File | Focus |
|---|---|
test_cypher_rag.py |
Cypher RAG pack retrieval augmentation |
test_cross_encoder.py |
CrossEncoderReranker joint query-document scoring |
test_retrieval_enhancements.py |
GraphReranker, MultiDocSynthesizer, FewShotManager |
test_kg_agent_benchmark.py |
Benchmark/eval harness |
test_kg_agent_gold.py |
Gold standard accuracy tests |
test_seed_agent.py |
LLM seed researcher agent |
Expected Test Output¶
Without data/wikigr_30k.db (CI / local dev)¶
pytest tests/agent/ -q
...
44 passed, 27 skipped in X.XXs
- 44 passed: All unit tests and mock-based tests (24 from
test_kg_agent_core.py, 20 fromtest_kg_agent_semantic.py) - 27 skipped: All 27 tests in
test_kg_agent_queries.py— the module-levelpytestmarkguard skips the entire file whendata/wikigr_30k.dbis absent - 0 failed: No test should fail in a clean environment without the DB
With data/wikigr_30k.db present¶
Integration tests in test_kg_agent_queries.py execute against the live database. Results depend on database content and Anthropic API availability.
Design Notes¶
Removed test targets¶
The following methods were removed from production code during a dead code cleanup and are no longer tested:
| Removed method | Former test location |
|---|---|
_validate_cypher |
test_validate_cypher.py (deleted) |
_execute_query |
TestExecuteQuery in test_kg_agent_core.py (removed) |
_execute_fallback_query |
TestExecuteFallbackQuery in test_kg_agent_core.py (removed) |
_plan_query |
Previously asserted as "not called" in test_kg_agent_semantic.py (assertions replaced with output contracts) |
These test classes and files were removed when their production targets were deleted. Re-introducing these methods would require writing new tests.
Mock strategy¶
Unit tests in test_kg_agent_core.py use KnowledgeGraphAgent.__new__ to bypass __init__ and assign mock attributes directly. This is intentional:
- Avoids LadybugDB database file requirements
- Eliminates Anthropic API key requirements
- Keeps tests fast (no I/O)
- Tests individual methods in full isolation
Tests in test_kg_agent_semantic.py use the agent pytest fixture, which patches real_ladybug and Anthropic at module level during construction. This tests the constructor path and the interaction between __init__ and the mocked dependencies.
Security¶
- All test data uses synthetic/anonymized article titles and content (e.g., "Quantum Mechanics", "Deep Learning")
- No real API keys are used in any test
data/wikigr_30k.dbis covered bydata/**/*.dbin.gitignore— no accidental commit of the database file
See Also¶
- KG Agent API Reference — full
KnowledgeGraphAgentAPI - Confidence-Gated Context Injection — how
CONTEXT_CONFIDENCE_THRESHOLDworks - Vector Search as Primary Retrieval — full retrieval pipeline
- Phase 1 Enhancements — GraphReranker, MultiDocSynthesizer, FewShotManager