Chat API Reference¶
The Chat API exposes the KnowledgeGraphAgent to HTTP clients.
Two endpoints are available: a blocking JSON endpoint for simple integrations and
a streaming SSE endpoint for browser UIs that want to show progressive results.
Both endpoints share the same rate limit (5 requests/minute per IP).
POST /api/v1/chat¶
Ask a natural language question and receive a single JSON response.
Request¶
POST /api/v1/chat
Content-Type: application/json
Body fields:
| Field | Type | Required | Constraints | Description |
|---|---|---|---|---|
| `question` | string | Yes | 1–500 chars | Natural language question |
| `pack` | string or null | No | Pattern `^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$` | Knowledge pack to query. Uses the default graph when omitted |
| `max_results` | integer | No | 1–50, default 10 | Maximum number of vector-search results |
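The constraints in the Body fields table can be checked on the client before a request is sent. The following is a minimal sketch, not the server's actual validation code; `validate_chat_request` is a hypothetical helper name, and only the limits and the pattern come from the table.

```python
import re

# Pattern copied from the Body fields table.
PACK_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$")


def validate_chat_request(body):
    """Hypothetical helper: return a list of problems (empty list means valid)."""
    problems = []

    question = body.get("question")
    if not isinstance(question, str) or not 1 <= len(question) <= 500:
        problems.append("question must be a string of 1-500 characters")

    pack = body.get("pack")  # optional; None selects the default graph
    if pack is not None and (
        not isinstance(pack, str) or not PACK_NAME_RE.fullmatch(pack)
    ):
        problems.append("pack does not match the allowed pattern")

    max_results = body.get("max_results", 10)  # documented default
    # bool is a subclass of int in Python, so reject it explicitly.
    if (
        isinstance(max_results, bool)
        or not isinstance(max_results, int)
        or not 1 <= max_results <= 50
    ):
        problems.append("max_results must be an integer from 1 to 50")

    return problems
```

Checking locally avoids spending one of the rate-limited requests on a body the server would reject anyway.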
Response¶
HTTP 200 OK
Content-Type: application/json
{
"answer": "Go uses an M:N scheduling model...",
"sources": ["runtime_scheduling", "goroutines"],
"query_type": "vector_search",
"execution_time_ms": 1240.3
}
Response fields:
| Field | Type | Description |
|---|---|---|
| `answer` | string | Synthesized natural-language answer |
| `sources` | string[] | Article titles used as evidence by the agent |
| `query_type` | string | How the query was resolved. See query_type values |
| `execution_time_ms` | float | Total wall-clock time for this request, in milliseconds |
query_type values¶
| Value | Meaning |
|---|---|
| `vector_search` | Normal path: pack content retrieved and used for synthesis |
| `confidence_gated_fallback` | Confidence gate fired: Claude answered without pack context (similarity below threshold) |
| `vector_fallback` | Vector search returned no results at all |
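A client may want to flag answers that were not backed by retrieved pack content. The sketch below assumes that both fallback values mean the answer was produced without pack context; the table states this for `confidence_gated_fallback`, while for `vector_fallback` it is an inference from "no results at all". `is_grounded` is a hypothetical helper name.

```python
# Values taken from the query_type table.
GROUNDED = {"vector_search"}
# Assumption: both fallbacks mean no pack content reached the synthesis step.
UNGROUNDED = {"confidence_gated_fallback", "vector_fallback"}


def is_grounded(response):
    """Hypothetical helper: True when the answer used retrieved pack content."""
    query_type = response.get("query_type")
    if query_type in GROUNDED:
        return True
    if query_type in UNGROUNDED:
        return False
    # Fail loudly on values this client does not know about.
    raise ValueError(f"unexpected query_type: {query_type!r}")
```

A UI could use the result to show a "not based on pack content" notice next to the answer.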
Error responses¶
| HTTP status | Error code | When |
|---|---|---|
| 400 | `INVALID_PACK_NAME` | `pack` does not match the allowed pattern (illegal characters or more than 64 characters) |
| 404 | `PACK_NOT_FOUND` | Named pack does not exist on this server |
| 429 | (none) | Rate limit exceeded (slowapi default body) |
| 500 | `AGENT_ERROR` | The agent raised an unhandled exception |
| 503 | `AGENT_UNAVAILABLE` | `ANTHROPIC_API_KEY` is not set |
Error body format (all 4xx/5xx responses except 429, which uses the slowapi default body):
{
"error": {
"code": "PACK_NOT_FOUND",
"message": "Requested pack was not found"
}
}
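Because a 429 response uses the slowapi default body rather than this envelope, client code should not assume that every error body has an `error.code` field. The following is one possible way to normalize both cases; `ChatApiError` and `parse_error` are hypothetical names, not part of the API.

```python
import json


class ChatApiError(Exception):
    """Hypothetical client-side exception carrying the parsed error details."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code or 'UNKNOWN'}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status, body_text):
    """Build a ChatApiError from an HTTP status and the raw response body."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = None

    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict):
        # Documented envelope: {"error": {"code": ..., "message": ...}}
        return ChatApiError(status, error.get("code"), error.get("message", ""))

    # Any other shape (for example a 429 body): keep the raw text as the message.
    return ChatApiError(status, None, body_text)
```

Callers can then branch on `status` and `code` without re-parsing the body.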
Examples¶
Default graph:
curl -X POST http://localhost:8000/api/v1/chat \
-H "Content-Type: application/json" \
-d '{"question": "What is quantum entanglement?"}'
Specific pack:
curl -X POST http://localhost:8000/api/v1/chat \
-H "Content-Type: application/json" \
-d '{"question": "How do channels work?", "pack": "go-expert", "max_results": 5}'
Python:
import httpx

response = httpx.post(
    "http://localhost:8000/api/v1/chat",
    json={
        "question": "How does garbage collection work in Go?",
        "pack": "go-expert",
        "max_results": 10,
    },
    # httpx times out after 5 s by default; allow longer for LLM-backed answers.
    timeout=30.0,
)
response.raise_for_status()

data = response.json()
print(data["answer"])
print("Sources:", data["sources"])
print(f"Answered in {data['execution_time_ms']:.0f}ms via {data['query_type']}")
GET /api/v1/chat/stream¶
Stream a chat response using Server-Sent Events (SSE).
The endpoint opens a persistent HTTP connection and delivers events in this fixed order:
- `sources`: list of article titles used as evidence
- `token`: the complete answer text
- `done`: timing and query metadata
- `error`: emitted instead of `token`/`done` if the agent raises an exception
Request¶
GET /api/v1/chat/stream?question=<text>&max_results=<n>
Accept: text/event-stream
Query parameters:
| Parameter | Type | Required | Constraints | Description |
|---|---|---|---|---|
| `question` | string | Yes | 1–500 chars | Natural language question |
| `max_results` | integer | No | 1–50, default 10 | Maximum vector-search results |
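Since the question travels in the query string, it has to be URL-encoded. The sketch below builds the request URL with the standard library and applies the documented limits first. `build_stream_url` is a hypothetical helper, and the host and port are simply the ones used in this document's examples.

```python
from urllib.parse import urlencode

# Assumption: local development server, as in this document's examples.
BASE_URL = "http://localhost:8000/api/v1/chat/stream"


def build_stream_url(question, max_results=10):
    """Hypothetical helper: return the streaming URL with encoded parameters."""
    if not 1 <= len(question) <= 500:
        raise ValueError("question must be 1-500 characters")
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")
    query = urlencode({"question": question, "max_results": max_results})
    return f"{BASE_URL}?{query}"


print(build_stream_url("What is goroutine scheduling?", 5))
# http://localhost:8000/api/v1/chat/stream?question=What+is+goroutine+scheduling%3F&max_results=5
```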
SSE Events¶
sources¶
Emitted first. Contains the list of article titles the agent used.
event: sources
data: ["runtime_scheduling","goroutines","channel_internals"]
data is a JSON-encoded string[].
token¶
The complete synthesized answer as a single plain-text string.
event: token
data: Go uses an M:N scheduling model where many goroutines are multiplexed onto a smaller number of OS threads...
data is a plain string (not JSON-encoded).
done¶
Signals that the stream is complete and carries timing metadata.
event: done
data: {"query_type": "vector_search", "execution_time_ms": 1240.3}
data is a JSON object with the same query_type and execution_time_ms fields
as the blocking POST /chat response.
error¶
Emitted if the agent raises an unhandled exception. The done event is not sent.
event: error
data: AttributeError
data is the Python exception class name (e.g. ValueError, RuntimeError).
Complete event sequence¶
event: sources
data: ["article_a","article_b"]
event: token
data: The answer text here...
event: done
data: {"query_type": "vector_search", "execution_time_ms": 1240.3}
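On the wire, the SSE format terminates each event with a blank line. For clients that cannot use an SSE library, the following minimal parser sketches how the sequence above could be read from a complete payload. `parse_sse` and `decode` are hypothetical helpers; the sketch handles only the `event` and `data` fields and ignores `id` and `retry`.

```python
import json


def parse_sse(text):
    """Yield (event, data) pairs from SSE text, dispatching on each blank line."""
    event, data_lines = "message", []  # "message" is the SSE default event name
    for line in text.splitlines():
        if line == "":
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            # The SSE format strips a single leading space after the colon.
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)


def decode(event, data):
    """Decode a payload according to the event types documented above."""
    if event in ("sources", "done"):
        return json.loads(data)  # JSON-encoded payloads
    return data  # token and error carry plain strings
```

A production client would feed lines to the parser incrementally as they arrive instead of buffering the whole response.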
Error responses¶
HTTP-level errors (before the stream opens):
| HTTP status | When |
|---|---|
| 400 | Query parameter validation failed (e.g. max_results out of 1–50 range, or empty question) |
| 429 | Rate limit exceeded |
| 503 | ANTHROPIC_API_KEY is not set |
Errors that occur after the stream opens (agent or DB errors) are delivered
as error events rather than HTTP status codes.
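A client therefore has two failure paths to handle: a non-2xx HTTP status before the stream opens, and an `error` event afterwards. The sketch below covers the second path by folding already-parsed `(event, data)` pairs into one result object. `StreamResult`, `StreamAgentError`, and `collect` are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


class StreamAgentError(Exception):
    """Raised for an in-stream error event or a stream that ends early."""


@dataclass
class StreamResult:
    sources: list = field(default_factory=list)
    answer: str = ""
    query_type: Optional[str] = None
    execution_time_ms: Optional[float] = None


def collect(events):
    """Fold (event, data) pairs into a StreamResult, raising on an error event."""
    result = StreamResult()
    for name, data in events:
        if name == "sources":
            result.sources = json.loads(data)
        elif name == "token":
            result.answer += data
        elif name == "done":
            meta = json.loads(data)
            result.query_type = meta["query_type"]
            result.execution_time_ms = meta["execution_time_ms"]
            return result
        elif name == "error":
            # data is the Python exception class name reported by the server.
            raise StreamAgentError(data)
    raise StreamAgentError("stream ended without a done event")
```

Treating a missing `done` event as a failure is a design choice of this sketch; it guards against connections that drop mid-stream.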
Examples¶
curl:
curl -N "http://localhost:8000/api/v1/chat/stream?question=What+is+goroutine+scheduling%3F"
JavaScript (browser EventSource):
const url = new URL('http://localhost:8000/api/v1/chat/stream');
url.searchParams.set('question', 'What is goroutine scheduling?');
url.searchParams.set('max_results', '5');

const es = new EventSource(url);

es.addEventListener('sources', e => {
  const sources = JSON.parse(e.data);
  renderSources(sources);
});

es.addEventListener('token', e => {
  renderAnswer(e.data);
});

es.addEventListener('done', e => {
  const { query_type, execution_time_ms } = JSON.parse(e.data);
  renderMeta(query_type, execution_time_ms);
  es.close();
});

// Fires for the server's "error" event and for connection failures;
// only the former carries e.data.
es.addEventListener('error', e => {
  showError(e.data ?? 'Stream error');
  es.close(); // also stops EventSource from reconnecting automatically
});
Python (requests + sseclient):
import json

import requests
import sseclient  # events() is the sseclient-py API (pip install sseclient-py)

url = 'http://localhost:8000/api/v1/chat/stream'
params = {'question': 'What is goroutine scheduling?', 'max_results': 5}

response = requests.get(url, params=params, stream=True)
response.raise_for_status()

for event in sseclient.SSEClient(response).events():
    if event.event == 'sources':
        print('Sources:', json.loads(event.data))
    elif event.event == 'token':
        print('Answer:', event.data)
    elif event.event == 'done':
        meta = json.loads(event.data)
        print(f"Done ({meta['query_type']}, {meta['execution_time_ms']:.0f}ms)")
        break
    elif event.event == 'error':
        print('Agent error:', event.data)
        break
Choosing between POST and GET/stream¶
| Consideration | POST /chat | GET /chat/stream |
|---|---|---|
| Response format | Single JSON object | Server-Sent Events |
| Latency to first byte | Full round-trip | Faster: sources arrive before the answer |
| Browser compatibility | `fetch` + `await` | Native `EventSource` API |
| Pack selection | `pack` field in body | Not supported (uses default graph only) |
| Suitable for | CLI tools, server-to-server calls | Browser chat UIs |
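The table's main hard constraint is pack selection: a request that names a pack has to use the POST endpoint. A client that supports both endpoints could encode that rule as follows. `choose_endpoint` and its `progressive` flag are hypothetical, shown only to illustrate the decision.

```python
def choose_endpoint(pack=None, progressive=False):
    """Hypothetical helper: return (method, path) for a chat request."""
    if pack is not None:
        # The streaming endpoint only queries the default graph.
        return ("POST", "/api/v1/chat")
    if progressive:
        return ("GET", "/api/v1/chat/stream")
    return ("POST", "/api/v1/chat")
```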
Configuration¶
The chat endpoints read the following environment variables at startup:
| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | Yes | Anthropic API key. Both endpoints return 503 if absent |
| `WIKIGR_DATABASE_PATH` | No | Override the default LadybugDB database path |
| `WIKIGR_CHAT_RATE_LIMIT` | No | Override the per-IP rate limit (default 5/minute) |
Pack databases are resolved at data/packs/<pack_name>/pack.db relative to the
server's working directory. Set WIKIGR_DATABASE_PATH to an absolute path to
prevent pack lookups from depending on the server's current working directory.
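To illustrate why a relative pack path depends on the working directory, the sketch below maps a pack name to the documented `data/packs/<pack_name>/pack.db` layout. It is not the server's code: `resolve_pack_db` and its `base_dir` parameter are hypothetical, and the name is validated before any path is built, mirroring the documented pattern.

```python
import re
from pathlib import Path

# Pattern documented for the pack field.
PACK_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$")


def resolve_pack_db(pack_name, base_dir=None):
    """Hypothetical helper: return the expected location of a pack database."""
    if not PACK_NAME_RE.fullmatch(pack_name):
        # Reject names such as "../secrets" before touching the filesystem.
        raise ValueError("INVALID_PACK_NAME")
    # Without an explicit base, the result follows the current working directory.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / "data" / "packs" / pack_name / "pack.db"
```

Calling `resolve_pack_db("go-expert")` from two different directories yields two different paths, which is the behavior an absolute base path avoids.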
Security notes¶
- Pack name validation: The `pack` field is validated against `^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$` before any filesystem access. Requests with names that do not match this pattern return `400 INVALID_PACK_NAME`.
- Empty question (streaming): The GET `/chat/stream` endpoint enforces `min_length=1` on the `question` parameter (matching the POST endpoint). Empty strings are rejected with HTTP 400 before reaching the LLM.
- Rate limiting: Both endpoints are rate-limited to 5 requests/minute per IP via slowapi. When deployed behind a reverse proxy, configure `forwarded_allow_ips` to match the proxy's CIDR so that the real client IP is used rather than the proxy IP.
- Authentication: There is no authentication on these endpoints. All callers with network access can invoke the Anthropic API. Deploy behind an API gateway or add bearer-token middleware for untrusted networks.
- SSE timeout: The streaming endpoint enforces a per-connection timeout (default 60 s, overridable via `WIKIGR_STREAM_TIMEOUT_S`). When the agent does not respond within the timeout, an `error` event is emitted and the stream is closed, releasing the database connection. For additional protection, front the service with a timeout-aware proxy in production.
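Given the rate limit described above, a well-behaved client can pace its own requests instead of relying on 429 responses. The following is a client-side sketch only: `SlidingWindowLimiter` is a hypothetical class, and its sliding-window accounting is an approximation that may not match how slowapi counts requests on the server.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most max_calls per period_s seconds."""

    def __init__(self, max_calls=5, period_s=60.0, clock=time.monotonic):
        self._max_calls = max_calls
        self._period_s = period_s
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = deque()

    def wait_time(self):
        """Return the seconds to wait before the next call (0.0 if allowed now)."""
        now = self._clock()
        # Drop calls that have left the window.
        while self._calls and now - self._calls[0] >= self._period_s:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            return 0.0
        return self._period_s - (now - self._calls[0])

    def record(self):
        """Record that a request was just sent."""
        self._calls.append(self._clock())
```

A caller would sleep for `wait_time()` seconds, send the request, and then call `record()`.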