Token Sanitizer Reference¶

The token sanitizer (amplihack.utils.token_sanitizer) replaces API keys and secrets in request and response bodies before they are logged, stored in memory, or forwarded to external services.

Contents¶

What Gets Sanitized
Pattern Ordering
Usage
API
Adding Custom Patterns
Redacted Tokens

What Gets Sanitized¶

The sanitizer detects and replaces the following token types:

Token Type	Pattern	Redacted Form
OpenAI project key	`sk-proj-…`	`sk-proj-***`
OpenAI standard key	`sk-…`	`sk-***`
Azure subscription ID	UUID format in Azure paths	`[REDACTED:azure-subscription-id]`
GitHub PAT	`ghp_…` or `github_pat_…`	`ghp_*` / `github_pat_*`
Generic bearer token	`Bearer <token>` in headers	`[REDACTED:bearer-token]`

Pattern Ordering¶

Patterns are matched in order from most specific to least specific. This is critical for correct redaction.

The sk-proj- pattern is placed before the sk- pattern. Without this ordering, every project key would be mis-redacted as a generic OpenAI key:

Input:    sk-proj-abc123…
Correct:  sk-proj-***   (sk-proj- matches first)
Incorrect: sk-***        (if sk- matched first)

The redacted form matters: downstream systems use the prefix to decide which key management vault was the source and to alert on unexpected key types appearing in logs.

Usage¶

from amplihack.utils.token_sanitizer import sanitize

raw_body = '{"api_key": "sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"}'  # pragma: allowlist secret
clean_body = sanitize(raw_body)
print(clean_body)
# {"api_key": "sk-proj-***"}  # pragma: allowlist secret

The function is idempotent — the replacement strings (e.g. sk-***) contain only three asterisks after the prefix, which is shorter than the {6,} minimum required by any pattern. A second call produces the same output.

# Idempotent: safe to call twice
twice = sanitize(sanitize(raw_body))
assert twice == sanitize(raw_body)

API¶

`sanitize(text: str) -> str`¶

Scans text for known secret patterns and replaces each match with its redacted form. Returns the sanitized string.

Parameters

Name	Type	Description
`text`	`str`	Input text, typically a JSON request or response body.

Returns str — A copy of text with all recognised secrets replaced using the *** format (e.g. sk-***, sk-proj-***).

Raises Nothing. Errors in regex compilation surface at import time, not at call time.

`sanitize_dict(data: dict | None) -> dict`¶

Recursively sanitizes a dictionary. Returns a new dict with secrets replaced.

Keys in FULLY_REDACT_KEYS (includes Authorization, X-Api-Key) have their entire value replaced with [REDACTED:bearer-token].
All other string values are passed through sanitize().
Nested dicts and lists are processed recursively.
None input returns an empty dict.

from amplihack.utils.token_sanitizer import sanitize_dict

raw = {"Authorization": "Bearer sk-ant-abc123", "Content-Type": "application/json"}
clean = sanitize_dict(raw)
# {"Authorization": "[REDACTED:bearer-token]", "Content-Type": "application/json"}

Adding Custom Patterns¶

Patterns are defined as a list of (raw_string_regex, replacement) tuples in token_sanitizer.py. Insert more-specific patterns before less-specific ones:

# In token_sanitizer.py — maintain specificity order
PATTERNS = [
    (r"sk-proj-[a-zA-Z0-9_-]{6,}", "sk-proj-***"),  # More specific
    (r"sk-[a-zA-Z0-9_-]{6,}",       "sk-***"),       # Less specific — MUST follow
    # Add project-specific patterns here, most specific first
]

Redacted Tokens¶

Redacted values appear in:

The amplihack proxy access log (~/.amplihack/.claude/runtime/logs/proxy.log)
Memory records created from sanitized request content
Trace logs when AMPLIHACK_LOG_LEVEL=DEBUG

Each redacted form preserves the token prefix for routing context (e.g. sk-proj-*** vs sk-***) while ensuring the secret value never appears in any log or stored record.