Skip to content

CrossEncoderReranker Module Documentation

Module: wikigr.agent.cross_encoder

Module Overview

CrossEncoderReranker reranks vector search candidates by jointly scoring each query-document pair through a cross-encoder model. Unlike bi-encoder embeddings (which encode query and document independently), a cross-encoder sees both texts simultaneously and produces a much more accurate relevance score.

Accuracy Impact: +10-15% retrieval precision over bi-encoder vector search alone Latency: ~50ms per rerank call (CPU, 33MB model) — negligible versus 10-15s Opus synthesis Model: cross-encoder/ms-marco-MiniLM-L-12-v2 (default, no GPU required) Dependency: sentence-transformers (already a project dependency — no new installs needed)

Background: Why Cross-Encoders Beat Bi-Encoders

WikiGR's vector retrieval uses BGE bge-base-en-v1.5, a bi-encoder that maps each text to a fixed embedding vector. Similarity is a cosine dot product — efficient for large-scale search, but the query and document never see each other during scoring.

A cross-encoder concatenates the query and document as a single input and runs a full attention pass over both. This captures precise semantic interactions (negations, comparisons, qualifications) that bi-encoder dot products miss. Cross-encoders are unsuitable as the primary retrieval stage (they require O(N) forward passes), but they are ideal as a reranking stage over a small candidate pool.

The pipeline becomes:

bi-encoder fast search  →  top-2K candidates
cross-encoder rerank    →  top-K precise results
synthesis               →  answer

Module-Level Docstring

"""Cross-encoder reranking for improved retrieval precision.

This module provides CrossEncoderReranker which uses a cross-encoder model to
jointly score query-document pairs, providing much higher relevance precision
than bi-encoder vector search alone.

API Contract:
    CrossEncoderReranker(model_name: str) -> instance
    rerank(
        query: str,
        results: list[dict],
        top_k: int = 5
    ) -> list[dict]

Design Philosophy:
    - CPU-only inference (no GPU required)
    - Graceful degradation: __init__ failure sets _model = None, rerank() returns
      results unchanged rather than raising
    - Shallow copies of result dicts with ce_score added (does not mutate caller's list)
    - Sorted by ce_score descending
"""

Class: CrossEncoderReranker

class CrossEncoderReranker:
    """Reranks retrieval results using a cross-encoder model.

    Cross-encoders jointly process query and document text, producing more
    accurate relevance scores than bi-encoders at the cost of ~50ms latency.

    Attributes:
        _model: Loaded CrossEncoder instance, or None if load failed.

    Example:
        >>> from wikigr.agent.cross_encoder import CrossEncoderReranker
        >>> reranker = CrossEncoderReranker()
        >>> results = [
        ...     {"title": "Quantum mechanics", "content": "The study of matter at atomic scale."},
        ...     {"title": "Classical mechanics", "content": "Newton's laws of motion."},
        ... ]
        >>> reranked = reranker.rerank("What governs subatomic particles?", results, top_k=2)
        >>> reranked[0]["title"]
        'Quantum mechanics'
        >>> reranked[0]["ce_score"]
        9.231  # Raw cross-encoder logit
    """

Constructor

def __init__(self, model_name: str = DEFAULT_MODEL) -> None:
    """Load the cross-encoder model.

    Args:
        model_name: HuggingFace model identifier. Defaults to
            'cross-encoder/ms-marco-MiniLM-L-12-v2' (33MB, CPU-only).

    Side effects:
        On first call: downloads ~33MB model weights to HuggingFace cache
            (~/.cache/huggingface/). Subsequent calls load from cache.
        On any exception: logs a WARNING and sets self._model = None.
            rerank() will then return results unchanged (passthrough mode).

    Example:
        >>> reranker = CrossEncoderReranker()  # loads default model
        >>> reranker = CrossEncoderReranker("cross-encoder/ms-marco-MiniLM-L-6-v2")  # faster, smaller
    """

Constant:

Name Value Notes
DEFAULT_MODEL "cross-encoder/ms-marco-MiniLM-L-12-v2" 33MB, 12-layer MiniLM trained on MS MARCO

rerank() Method

def rerank(
    self,
    query: str,
    results: list[dict[str, Any]],
    top_k: int = 5,
) -> list[dict[str, Any]]:
    """Rerank results using cross-encoder scores.

    Args:
        query: The search query string. Used as the left side of each
            (query, document) pair fed to the cross-encoder.
        results: List of result dicts. Each dict must contain a 'content'
            key (preferred) or a 'title' key used as the document text.
            Dicts without either key are scored against an empty string.
        top_k: Maximum number of results to return. Results beyond top_k
            are discarded.

    Returns:
        Normal mode (_model is not None):
            List of up to top_k dicts sorted by 'ce_score' descending.
            Each dict is a shallow copy of the corresponding input dict
            with 'ce_score' (float) added. Original dicts are not mutated.

        Passthrough mode (_model is None):
            list(results) — full input list, original order, no ce_score,
            no truncation. Callers should not rely on top_k being applied
            in this mode.

    Raises:
        Does not raise. All exceptions from model.predict() propagate
        naturally; callers should wrap in try/except if required.

    Examples:
        >>> results = [
        ...     {"title": "Entanglement", "content": "Quantum correlation between particles."},
        ...     {"title": "Superposition", "content": "A system exists in multiple states."},
        ... ]
        >>> reranked = reranker.rerank("How do qubits store information?", results, top_k=1)
        >>> len(reranked)
        1
        >>> "ce_score" in reranked[0]
        True

        # Passthrough when model unavailable (simulate by patching load failure):
        # CrossEncoderReranker("nonexistent/model") raises ValueError (not in ALLOWED_MODELS).
        # Passthrough mode is triggered by network errors or missing sentence_transformers,
        # not by supplying an arbitrary model name.
        >>> import unittest.mock
        >>> with unittest.mock.patch("sentence_transformers.CrossEncoder", side_effect=Exception("offline")):
        ...     broken = CrossEncoderReranker()
        >>> broken._model is None
        True
        >>> returned = broken.rerank("query", results, top_k=1)
        >>> len(returned)  # top_k NOT applied in passthrough mode
        2
    """

Document text selection:

The method uses the first non-empty value from: 1. result["content"] 2. result["title"] 3. "" (empty string, scored near 0)

ce_score Field

Every dict returned in normal mode gains a "ce_score" key:

Property Value
Type float
Range Unbounded; MS MARCO model outputs raw logits, typically −10 to +10
Interpretation Higher is more relevant to the query
Presence Only added in normal mode; absent in passthrough mode

Integration with KnowledgeGraphAgent

Constructor Parameters

Cross-encoder reranking is controlled by two constructor parameters on KnowledgeGraphAgent:

Parameter Type Default Effect
use_enhancements bool True Master switch for all Phase 1 enhancements. Must be True for cross-encoder to activate.
enable_cross_encoder bool False Opt-in flag for cross-encoder reranking. Default-off because the first invocation downloads a 33MB model.

Cross-encoder is opt-in by design. The first download takes a few seconds; subsequent startups load from the local HuggingFace cache.

Retrieval Pipeline with Cross-Encoder Active

When enable_cross_encoder=True, _vector_primary_retrieve() doubles its candidate pool and then reranks:

semantic_search(query, k = max_results * 2)   # e.g. 10 candidates for max_results=5
    ↓
cross_encoder.rerank(query, candidates, top_k=max_results)
    ↓
top-max_results results sorted by ce_score

Without cross-encoder, semantic search fetches exactly max_results candidates and returns them in embedding-distance order.

Attribute

After __init__, the agent exposes self.cross_encoder:

Value Meaning
CrossEncoderReranker instance Cross-encoder active
None Disabled (enable_cross_encoder=False) or use_enhancements=False

Note: KnowledgeGraphAgent.from_connection() always sets cross_encoder = None regardless of flags. Cross-encoder activation is only available through the standard __init__ constructor.

Graceful Degradation

CrossEncoderReranker is designed to never crash the agent:

Failure Mode Behaviour
Network error on first model download __init__ logs WARNING; _model = None; rerank() returns passthrough
sentence_transformers not importable Same as above
model.predict() raises at runtime Exception propagates to _vector_primary_retrieve() caller

The agent does not currently catch predict() errors; if the model load succeeds but inference fails the exception surfaces to the caller. Wrap queries in try/except when operating in untrusted environments.

Usage Examples

Minimal — use with KnowledgeGraphAgent

from wikigr.agent.kg_agent import KnowledgeGraphAgent

agent = KnowledgeGraphAgent(
    db_path="data/packs/physics-expert/physics.db",
    use_enhancements=True,
    enable_cross_encoder=True,   # opt-in
)

result = agent.query("What is the photoelectric effect?")
print(result["answer"])

The first time this runs the 33MB model downloads automatically. Subsequent starts load from ~/.cache/huggingface/.

Standalone reranking

from wikigr.agent.cross_encoder import CrossEncoderReranker

reranker = CrossEncoderReranker()

candidates = [
    {"title": "Photoelectric effect", "content": "Emission of electrons by light."},
    {"title": "Compton scattering",   "content": "Scattering of photons by electrons."},
    {"title": "Wave–particle duality","content": "Matter exhibits wave and particle properties."},
    {"title": "Black-body radiation", "content": "Thermal radiation emitted by a body in equilibrium."},
    {"title": "Planck's law",         "content": "Distribution of radiation from a black body."},
]

reranked = reranker.rerank(
    query="How did Einstein explain light quanta?",
    results=candidates,
    top_k=3,
)

for r in reranked:
    print(f"{r['ce_score']:+.2f}  {r['title']}")

Example output:

+9.14  Photoelectric effect
+2.83  Wave–particle duality
-1.07  Planck's law

Inspecting scores before filtering

# Get all scores (set top_k=len(candidates) to disable truncation)
all_scored = reranker.rerank(query, candidates, top_k=len(candidates))

print("Score distribution:")
for r in all_scored:
    bar = "#" * max(0, int((r["ce_score"] + 10) * 2))
    print(f"  {r['ce_score']:+6.2f} {bar}  {r['title']}")

Selective enhancement flags

# Enable cross-encoder but disable graph reranker (faster startup, no LadybugDB PageRank)
agent = KnowledgeGraphAgent(
    db_path="physics.db",
    use_enhancements=True,
    enable_reranker=False,
    enable_cross_encoder=True,
)

Checking whether cross-encoder is active

agent = KnowledgeGraphAgent(db_path="physics.db", use_enhancements=True)

if agent.cross_encoder is None:
    print("Cross-encoder not active")
elif agent.cross_encoder._model is None:
    print("Cross-encoder active but model failed to load (passthrough mode)")
else:
    print("Cross-encoder fully operational")

Model Selection

CrossEncoderReranker enforces an ALLOWED_MODELS allowlist in cross_encoder.py to prevent path-traversal attacks or malicious HuggingFace repo injection.

Currently allowed models:

Model Size Latency Notes
cross-encoder/ms-marco-MiniLM-L-12-v2 33MB ~50ms Default and only allowed model

Passing any other model_name raises ValueError at construction time:

# OK
reranker = CrossEncoderReranker()  # uses DEFAULT_MODEL

# Raises ValueError: model_name 'other/model' is not in allowed models
reranker = CrossEncoderReranker("other/model")

To add a new model, add its identifier to ALLOWED_MODELS in cross_encoder.py after a security review confirming the model origin and weights integrity.

Performance

Latency

Reranking 10 candidates on a typical CPU:

Operation Time
Model load (first time, from disk) 0.5–1s
Model load (warm, already in memory) 0
predict() on 10 (query, doc) pairs ~50ms
predict() on 20 (query, doc) pairs ~95ms

Total overhead versus a 10-15 second Opus synthesis: negligible.

Memory

Component Size
Model weights (ms-marco-MiniLM-L-12-v2) ~33MB on disk; ~120MB in RAM
Activations per batch (10 pairs) ~5MB (freed after predict)

Scaling

Reranking time scales linearly with len(results). With candidate_k = max_results * 2:

max_results candidate_k Latency
5 10 ~50ms
10 20 ~95ms
20 40 ~180ms

At max_results=20 the total still fits comfortably within a 500ms budget on commodity hardware.

Testing

Tests live in tests/agent/test_cross_encoder.py. All tests mock sentence_transformers.CrossEncoder so no model download is needed during CI.

pytest tests/agent/test_cross_encoder.py -v

Expected output:

tests/agent/test_cross_encoder.py::TestCrossEncoderReranker::test_reranking_reorders_by_cross_encoder_score PASSED
tests/agent/test_cross_encoder.py::TestCrossEncoderReranker::test_empty_results_returns_empty_list PASSED
tests/agent/test_cross_encoder.py::TestCrossEncoderReranker::test_top_k_filtering_limits_output PASSED
tests/agent/test_cross_encoder.py::TestCrossEncoderReranker::test_ce_score_added_to_each_result PASSED
tests/agent/test_cross_encoder.py::TestCrossEncoderReranker::test_graceful_init_failure_returns_results_unchanged PASSED

5 passed in 0.12s

Writing additional tests

from unittest.mock import MagicMock, patch

def test_content_preferred_over_title():
    """rerank() uses 'content' field when both content and title are present."""
    mock_ce = MagicMock()
    mock_ce.predict.return_value = [0.5]

    with patch("sentence_transformers.CrossEncoder", return_value=mock_ce):
        from wikigr.agent.cross_encoder import CrossEncoderReranker
        reranker = CrossEncoderReranker()

    results = [{"title": "Title text", "content": "Content text"}]
    reranker.rerank("query", results, top_k=1)

    call_args = mock_ce.predict.call_args[0][0]
    assert call_args[0] == ("query", "Content text")


def test_passthrough_preserves_all_results():
    """Passthrough mode returns all results regardless of top_k."""
    with patch("sentence_transformers.CrossEncoder", side_effect=Exception("offline")):
        from wikigr.agent.cross_encoder import CrossEncoderReranker
        reranker = CrossEncoderReranker()

    results = [{"title": f"Article {i}"} for i in range(10)]
    returned = reranker.rerank("query", results, top_k=2)
    assert len(returned) == 10  # top_k not applied

Troubleshooting

Model does not download

Symptom: WARNING in logs: CrossEncoderReranker failed to load model '...': ...; reranker silently in passthrough mode.

Cause: No internet access at init time, or HuggingFace CDN blocked.

Fix: Pre-download the model on a networked machine and copy to the cache directory:

python -c "from sentence_transformers import CrossEncoder; CrossEncoder('cross-encoder/ms-marco-MiniLM-L-12-v2')"
# Weights are now in ~/.cache/huggingface/

Or set the TRANSFORMERS_OFFLINE=1 environment variable and point HF_HOME to a shared cache volume.

Reranker is in passthrough mode unexpectedly

Symptom: Returned results lack ce_score; order unchanged.

Diagnosis:

import logging
logging.basicConfig(level=logging.DEBUG)

from wikigr.agent.cross_encoder import CrossEncoderReranker
r = CrossEncoderReranker()
print(r._model)  # None means load failed

Cross-encoder not activated despite enable_cross_encoder=True

Symptom: agent.cross_encoder is None even though enable_cross_encoder=True was passed.

Cause: use_enhancements=False overrides all enhancement flags.

Fix:

agent = KnowledgeGraphAgent(
    db_path="...",
    use_enhancements=True,        # required
    enable_cross_encoder=True,
)

Scores look unexpectedly low

The cross-encoder produces raw MS MARCO logits, not probabilities. Scores near 0 mean uncertain; highly positive scores (>5) are strong matches; negative scores are likely irrelevant. Do not compare scores across different queries — use them only for relative ranking within a single query's results.

Security Notes

  • model_name is a server-side configuration value. Do not allow users to pass arbitrary model identifiers — it could cause the server to download and execute untrusted model weights.
  • Query strings are passed to the tokenizer but never logged. Avoid logging query content to prevent sensitive data leaking into log aggregators.
  • ce_score values should be stripped from any user-facing API responses to avoid exposing model scoring thresholds to adversaries.

See Also