Adaptive Relevance Thresholds Explained
Retrieval scores determine whether an answer is safe to deliver. Set the threshold too low and the assistant generates from weak context and hallucinates; set it too high and it refuses legitimate questions. Adaptive thresholds resolve this trade-off by tuning the cutoff per tenant and per corpus.
What is the threshold?
It is the minimum retrieval score required before context is passed to the LLM. CrawlBot uses hybrid scoring (vector + lexical fusion), normalizes the fused scores, and compares the top result's score against the threshold.
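For illustration, here is a minimal sketch of such a gate. The class, function names, and fixed fusion weight are assumptions for the example, not CrawlBot's actual API, and both component scores are assumed to be pre-normalized to [0, 1].

```python
# Minimal sketch of a relevance gate; hypothetical names, not CrawlBot's API.
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    text: str
    vector_score: float   # cosine similarity from the vector index, in [0, 1]
    lexical_score: float  # e.g. BM25 score already scaled to [0, 1]

def fused_score(chunk: RetrievedChunk, alpha: float = 0.7) -> float:
    """Assumed fusion rule: weighted sum of vector and lexical scores."""
    return alpha * chunk.vector_score + (1 - alpha) * chunk.lexical_score

def passes_threshold(chunks: list[RetrievedChunk], threshold: float) -> bool:
    """Compare the best fused score against the tenant's threshold."""
    if not chunks:
        return False
    return max(fused_score(c) for c in chunks) >= threshold
```

If the gate returns False, the assistant serves a fallback response and logs fallback_reason=low_score instead of generating from weak context.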
Adaptive approach
- Seed: Initialize with the tenant's historical P95 score (e.g., 0.82) or a global default.
- Collect: Log scores for every answered query along with fallback reasons.
- Calculate: Maintain rolling windows (e.g., last 1,000 chats) and compute percentiles.
- Adjust: If fallback_reason=low_score spikes, lower the threshold slightly; if hallucination feedback rises, raise it (a sketch of the full loop follows this list).
- Audit: Record each adjustment in a policy log with timestamp, old value, new value, and reason.
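A minimal sketch of that loop, assuming an in-memory rolling window and percentile-based recalculation. The class name, record fields, window size, and audit-log shape are illustrative, not CrawlBot's schema, and which percentile to target is a policy choice.

```python
# Sketch of the seed -> collect -> calculate -> adjust -> audit loop.
# All names and defaults are assumptions for illustration only.
from collections import deque
from datetime import datetime, timezone
import statistics

WINDOW = 1_000          # rolling window of recent chats
SEED_THRESHOLD = 0.82   # e.g. historical P95 or a global default

class AdaptiveThreshold:
    def __init__(self, seed: float = SEED_THRESHOLD):
        self.value = seed
        self.scores = deque(maxlen=WINDOW)   # top retrieval score per answered query
        self.policy_log = []                 # audit trail of every adjustment

    def collect(self, top_score: float) -> None:
        self.scores.append(top_score)

    def recalculate(self, percentile: int = 95) -> float:
        """Candidate threshold from the rolling window; percentile is a policy choice."""
        if len(self.scores) < WINDOW:
            return self.value                # not enough data yet, keep current value
        cuts = statistics.quantiles(self.scores, n=100)
        return cuts[percentile - 1]

    def adjust(self, candidate: float, reason: str, band: float = 0.05) -> None:
        """Move toward the candidate, capped to +/- band, and record the change."""
        new_value = min(max(candidate, self.value - band), self.value + band)
        self.policy_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "old": self.value,
            "new": new_value,
            "reason": reason,               # e.g. "low_score spike" or "hallucination feedback"
        })
        self.value = new_value
```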
Signals to monitor
- Containment rate versus the share of fallbacks logged with fallback_reason=low_score.
- Negative feedback flagged as “incorrect” on answers whose scores were near the threshold (a sketch of both checks follows this list).
- Corpus changes (big crawl, new language) that shift score distributions.
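As a rough example of how the first two signals might be computed from the same query log, assuming each record carries a top_score, a fallback_reason, and a user feedback label (all illustrative field names):

```python
# Illustrative signal computation over a window of query-log records.
# Field names and the near-threshold margin are assumptions.
from typing import Optional, TypedDict

class QueryRecord(TypedDict):
    top_score: float
    fallback_reason: Optional[str]   # e.g. "low_score" or None
    feedback: Optional[str]          # e.g. "incorrect", "helpful", or None

def low_score_fallback_rate(records: list[QueryRecord]) -> float:
    """Share of queries that fell back because of a low retrieval score."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if r["fallback_reason"] == "low_score")
    return hits / len(records)

def near_threshold_incorrect_rate(records: list[QueryRecord],
                                  threshold: float,
                                  margin: float = 0.03) -> float:
    """Share of near-threshold answers that users flagged as incorrect."""
    near = [r for r in records
            if r["fallback_reason"] is None
            and abs(r["top_score"] - threshold) <= margin]
    if not near:
        return 0.0
    bad = sum(1 for r in near if r["feedback"] == "incorrect")
    return bad / len(near)
```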
Implementation tips
- Use exponential moving averages so a single noisy window does not move the threshold abruptly (see the sketch after these tips).
- Cap adjustments within a safe band (e.g., ±0.05) unless manual overrides apply.
- Provide an admin override per tenant for regulated industries needing stricter refusals.
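A minimal sketch of the smoothing, capping, and override tips combined, assuming the EMA is applied to the candidate threshold each window; the smoothing factor and band are illustrative defaults.

```python
# Illustrative EMA smoothing plus a safety band around the current threshold.
def smooth_and_clamp(current: float,
                     candidate: float,
                     ema_alpha: float = 0.2,
                     band: float = 0.05,
                     override: float | None = None) -> float:
    """Blend the candidate into the current value, then cap the move to +/- band."""
    if override is not None:
        return override                      # per-tenant admin override wins outright
    smoothed = ema_alpha * candidate + (1 - ema_alpha) * current
    return min(max(smoothed, current - band), current + band)
```

For example, with current = 0.82 and candidate = 0.74, the EMA gives 0.804, which stays inside the ±0.05 band and becomes the new threshold.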
CrawlBot automation
CrawlBot’s config-profiles service stores adaptive policies per tenant, emits change events, and exposes them in the admin UI. Ops can simulate new thresholds using historic logs before applying them. Bring the same rigor to your stack to keep assistants confident and safe.
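CrawlBot's simulation tooling is not shown here, but the idea can be approximated by replaying logged top scores against a candidate threshold; this sketch and its names are assumptions, not the config-profiles API.

```python
# Hypothetical replay of historic logs against a candidate threshold.
def simulate_threshold(top_scores: list[float], candidate: float) -> dict:
    """Estimate how often answers would have been served vs. refused."""
    served = sum(1 for s in top_scores if s >= candidate)
    total = len(top_scores) or 1             # guard against an empty log
    return {
        "candidate": candidate,
        "served_rate": served / total,
        "refusal_rate": 1 - served / total,
    }

# Example: compare the current and proposed thresholds before rollout, e.g.
# simulate_threshold(logged_scores, 0.82) vs simulate_threshold(logged_scores, 0.78)
```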