LLM Fallback Reason Triage

llm • observability • fallback • rag

Grounded assistants rely on retrieval quality and reliable LLM providers. When something goes wrong, the model should refuse and log why. Fallback reasons turn a vague “I don’t know” into actionable telemetry. Here is how to triage them systematically.

Core categories

| Fallback reason | Meaning | Next action |
| --- | --- | --- |
| low_score | Retrieval confidence below threshold | Expand crawl scope, tune chunking, adjust thresholds cautiously. |
| no_context | Nothing retrieved due to empty corpus or filters | Verify tenant allowlists, language filters, or embed tokens. |
| provider_error | Upstream LLM rejected the call (5xx, throttling) | Check LLM gateway retries, provider status, failover to alternate model. |
| timeout | Provider or network exceeded SLA | Tune timeout budget, add jitter, or reduce payload size. |
| context_overflow | Prompt + excerpts exceed context window | Trim preamble, reduce number of chunks, or upgrade model window. |

CrawlBot stores these reasons per message with timestamps, tenant IDs, embed IDs, and retrieval metadata.
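
To make that concrete, here is a minimal sketch of what such a per-message record might look like. The `FallbackReason` and `FallbackEvent` names and fields are illustrative assumptions, not CrawlBot's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class FallbackReason(str, Enum):
    """The five core categories from the table above."""
    LOW_SCORE = "low_score"
    NO_CONTEXT = "no_context"
    PROVIDER_ERROR = "provider_error"
    TIMEOUT = "timeout"
    CONTEXT_OVERFLOW = "context_overflow"


@dataclass
class FallbackEvent:
    """One fallback occurrence, logged per message (hypothetical shape)."""
    reason: FallbackReason
    tenant_id: str
    embed_id: str
    message_id: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Retrieval metadata: chunk IDs and similarity scores, useful when
    # triaging low_score / no_context cases.
    retrieved_chunks: list[str] = field(default_factory=list)
    retrieval_scores: list[float] = field(default_factory=list)


# Example: log a low-confidence retrieval as structured telemetry.
event = FallbackEvent(
    reason=FallbackReason.LOW_SCORE,
    tenant_id="acme-corp",
    embed_id="docs-widget",
    message_id="msg-123",
    retrieval_scores=[0.41, 0.38],
)
```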

Triage workflow

  1. Monitor dashboards: Plot fallback_reason counts per tenant and embed. Set alerts when any category exceeds 5 percent of chats (see the rate-check sketch after this list).
  2. Drill into transcripts: For low_score/no_context, review retrieved chunks, scores, and citations to spot crawl gaps.
  3. Check LLM gateway logs: Correlate provider_error/timeout spikes with upstream provider incidents; ensure failover to the OpenAI or Gemini fallback models stays healthy.
  4. Validate prompts: context_overflow often signals prompts with too much boilerplate; keep instructions tight and reference citations succinctly.
  5. Feed results to ops: Tag fallback events with severity and route them to the AI ops Chat space. Include direct links to CrawlBot’s analytics view.
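
A minimal sketch of the step-1 rate check referenced above. The 5 percent threshold comes straight from the workflow; the `check_fallback_rates` helper, its `Counter` input shape, and the alert printout are assumptions.

```python
from collections import Counter

ALERT_THRESHOLD = 0.05  # 5 percent of chats, per the workflow above


def check_fallback_rates(fallback_counts: Counter, total_chats: int) -> list[str]:
    """Return the fallback reasons whose share of chats exceeds the threshold."""
    if total_chats == 0:
        return []
    return [
        reason
        for reason, count in fallback_counts.items()
        if count / total_chats > ALERT_THRESHOLD
    ]


# Example: 420 provider_error fallbacks out of 5,000 chats is 8.4 percent,
# so that category should trigger an alert.
counts = Counter({"provider_error": 420, "low_score": 180, "timeout": 35})
for reason in check_fallback_rates(counts, total_chats=5_000):
    print(f"ALERT: {reason} exceeds {ALERT_THRESHOLD:.0%} of chats")
```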

Prevention tactics

  • Keep crawl runs fresh (IndexNow + scheduled crawls) to avoid low_score from stale content.
  • Use adaptive thresholds seeded from the P95 of historical scores, per the AGENTS spec (one interpretation is sketched after this list).
  • Trim citations to the minimum required per policy; long legal disclaimers eat context budget.
  • Store the fallback reason alongside user feedback. If thumbs-down ratings correlate with provider_error, escalate to the LLM vendor.
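
One plausible reading of "seeded from P95 historical scores" is sketched below: compute the 95th percentile of recent retrieval scores per tenant and set the low_score cutoff as a fraction of it. The `adaptive_threshold` helper and the `fraction` multiplier are hypothetical; the AGENTS spec may define the seeding differently.

```python
import statistics


def adaptive_threshold(historical_scores: list[float], fraction: float = 0.6) -> float:
    """Seed a low_score cutoff from the P95 of recent retrieval scores.

    The fraction is a tunable assumption: scores below 60 percent of a
    tenant's P95 are treated as low-confidence for that tenant.
    """
    if not historical_scores:
        return 0.0  # No history yet: disable the cutoff rather than guess.
    p95 = statistics.quantiles(historical_scores, n=20)[18]  # 95th percentile
    return p95 * fraction


scores = [0.72, 0.81, 0.65, 0.77, 0.88, 0.70, 0.84, 0.79, 0.91, 0.68]
print(f"low_score threshold: {adaptive_threshold(scores):.2f}")
```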

Reporting cadence

  • Daily: Quick scan of fallback charts for enterprise tenants and demo environments.
  • Weekly: Generate a summary with counts per reason, top affected tenants, and remediation status (see the rollup sketch below).
  • Quarterly: Review threshold settings, prompt boilerplate, and crawler coverage using aggregated fallback trends.
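
A sketch of the weekly rollup described above, assuming fallback events arrive as (reason, tenant) pairs; the `weekly_summary` helper and its output format are assumptions, not an existing report.

```python
from collections import Counter
from typing import Iterable


def weekly_summary(events: Iterable[tuple[str, str]]) -> str:
    """Summarize (reason, tenant_id) pairs: counts per reason plus the
    most-affected tenants, matching the weekly report described above."""
    reason_counts: Counter[str] = Counter()
    tenant_counts: Counter[str] = Counter()
    for reason, tenant_id in events:
        reason_counts[reason] += 1
        tenant_counts[tenant_id] += 1

    lines = ["Fallbacks by reason:"]
    lines += [f"  {r}: {n}" for r, n in reason_counts.most_common()]
    lines.append("Top affected tenants:")
    lines += [f"  {t}: {n}" for t, n in tenant_counts.most_common(5)]
    return "\n".join(lines)


events = [
    ("provider_error", "acme-corp"),
    ("low_score", "acme-corp"),
    ("provider_error", "globex"),
    ("timeout", "initech"),
]
print(weekly_summary(events))
```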

Fallback reason triage keeps AI quality measurable. Instead of shipping band-aid prompts, use these signals to fix the right layer: crawl scope, retrieval thresholds, or provider reliability.