diagram.mmd — flowchart
Search Result Caching flowchart diagram

Search result caching stores the serialized output of expensive query executions so that identical or near-identical queries can be served directly from memory without touching the index, reducing latency and compute cost for popular queries.

How search result caching works

Query normalization is the first step before any cache interaction. The raw query string is lowercased, whitespace-collapsed, and stop words may be stripped to increase the chance of a cache key match across minor variations. For example, "Best Running Shoes" and "best running shoes" should resolve to the same cache key.
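A minimal normalization step might look like the following sketch. The stop-word set here is an illustrative subset, not a canonical list; production systems usually use a language-specific list tied to the analyzer used at index time.

```python
import re

STOP_WORDS = {"the", "a", "an", "for", "of"}  # illustrative subset, not a canonical list

def normalize_query(raw: str, strip_stop_words: bool = True) -> str:
    """Lowercase, collapse whitespace, and optionally drop stop words."""
    tokens = re.sub(r"\s+", " ", raw.strip().lower()).split(" ")
    if strip_stop_words:
        # Fall back to the original tokens if everything was a stop word.
        tokens = [t for t in tokens if t not in STOP_WORDS] or tokens
    return " ".join(tokens)
```

With this, `normalize_query("Best   Running Shoes")` and `normalize_query("best running shoes")` both yield `"best running shoes"`, so the two variants map to the same cache key.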

Cache key construction combines the normalized query string with any parameters that affect the result set: page number, language, geographic region, safe-search flag, and sort order. Personalization parameters are typically excluded from the key to allow sharing a cached result across users.
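One way to build such a key is to concatenate the result-affecting parameters deterministically and hash them; the parameter names and default values below are assumptions for illustration, not a fixed schema.

```python
import hashlib

def build_cache_key(normalized_query: str, page: int = 1, lang: str = "en",
                    region: str = "us", safe_search: bool = True,
                    sort: str = "relevance") -> str:
    """Combine the normalized query with every parameter that changes the
    result set. Personalization fields (e.g. user ID) are deliberately
    excluded so one cached entry can be shared across users."""
    parts = (f"{normalized_query}|p={page}|l={lang}|r={region}"
             f"|ss={safe_search}|s={sort}")
    return "search:" + hashlib.sha256(parts.encode("utf-8")).hexdigest()
```

Hashing keeps the key fixed-length regardless of query length; the `search:` prefix is a common convention for namespacing keys in a shared store.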

Cache lookup checks a fast in-memory store — Redis or Memcached are common choices — using the cache key. If a valid, non-expired entry exists, the cached result is returned immediately. This is a read path that bypasses the entire Search Query Processing scatter-gather phase.
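The lookup path can be sketched as below; a plain dict with expiry timestamps stands in for Redis or Memcached (which handle expiry server-side), so the example stays self-contained.

```python
import time

# key -> (expires_at, result); a stand-in for a Redis/Memcached GET
cache: dict[str, tuple[float, object]] = {}

def lookup(key: str):
    """Return the cached result if present and unexpired, else None."""
    entry = cache.get(key)
    if entry is None:
        return None  # cache miss: caller runs the full query pipeline
    expires_at, result = entry
    if time.monotonic() >= expires_at:
        del cache[key]  # lazily drop the expired entry
        return None
    return result
```

On a hit the caller returns `result` immediately; on `None` it falls through to the full scatter-gather execution described next.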

Cache miss handling sends the query through the full execution pipeline: query parsing, rewriting, shard fan-out, scoring, and ranking. This is the expensive path, often taking 50–200ms depending on corpus size.

Cache write stores the result under the cache key with a TTL. TTLs are tuned by query category: highly dynamic results (breaking news, stock prices) get short TTLs of a few seconds; stable informational queries can be cached for hours. Invalidation on index updates is an alternative but operationally complex approach.
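The write path with per-category TTLs might be sketched like this; the category names and TTL values are illustrative placeholders consistent with the ranges above, not recommended settings.

```python
import time

TTL_BY_CATEGORY = {            # illustrative values only
    "news": 10,                # highly dynamic: a few seconds
    "finance": 5,
    "informational": 6 * 3600, # stable: hours
}

def write_to_cache(cache: dict, key: str, result, category: str) -> None:
    """Store the result under the key with a category-tuned TTL."""
    ttl = TTL_BY_CATEGORY.get(category, 300)  # assumed 5-minute default
    cache[key] = (time.monotonic() + ttl, result)
```

With a real Redis client this collapses to a single `SET key value EX ttl`; the category-to-TTL mapping is the part worth tuning.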

Cache eviction uses an LRU (Least Recently Used) or LFU (Least Frequently Used) policy to reclaim memory when the cache is full, retaining the most valuable entries. Cache hit rate and latency percentiles are tracked through the Search Analytics Pipeline to guide capacity and TTL tuning decisions.
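A minimal LRU policy can be sketched with an ordered map; note that in practice Redis provides this via its `maxmemory-policy allkeys-lru` setting rather than application code, so this is for illustration only.

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

An LFU variant would track access counts instead of recency; LRU is simpler and usually a good default when popular queries repeat frequently.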


Frequently asked questions

What is search result caching?

Search result caching stores the serialized output of completed query executions in a fast in-memory store such as Redis or Memcached. When an identical or equivalent query arrives, the cached result is returned directly, bypassing the expensive scatter-gather execution across index shards and reducing both latency and compute cost.

How does search result caching work?

The raw query string is normalized and combined with result-affecting parameters — page number, language, region, sort order — to form a cache key. A lookup checks the in-memory store; on a hit the result is returned immediately. On a miss the full query pipeline executes, and the result is written to the cache with a TTL before being returned to the caller.

Should I use TTL expiry or invalidation on index updates?

TTL-based expiry is operationally simpler and appropriate for most use cases. Index-update invalidation provides lower staleness but is complex to implement correctly at scale — every document write must determine which cached query results it could affect and purge them. A hybrid approach uses short TTLs for dynamic result sets and invalidation only for high-traffic queries with known update patterns.

What are common mistakes when implementing search result caching?

The most common mistake is including personalization parameters in the cache key, which makes results user-specific and eliminates the sharing that gives caching its cost benefit. Other mistakes are setting TTLs too long for result sets that change frequently (serving stale results) and not normalizing the query before key construction (reducing the hit rate due to trivial variations like capitalization).
```mermaid
flowchart TD
    Query[Incoming search query] --> NormKey[Normalize query\nlowercase, strip stop words]
    NormKey --> BuildKey[Build cache key\nquery plus filter params]
    BuildKey --> Lookup{Cache hit?}
    Lookup -->|Hit, not expired| ServeCache[Serve cached results\nreturn immediately]
    Lookup -->|Miss or expired| Execute[Execute full query\nagainst index shards]
    Execute --> Results[Ranked result set]
    Results --> TTL{Determine TTL\nbased on query type}
    TTL -->|Dynamic content| ShortTTL[Short TTL\n5-30 seconds]
    TTL -->|Stable content| LongTTL[Long TTL\n1-24 hours]
    ShortTTL --> WriteCache[Write result to cache\nRedis or Memcached]
    LongTTL --> WriteCache
    WriteCache --> Eviction{Cache full?}
    Eviction -->|Yes| LRU[Evict LRU entries\nfree memory]
    Eviction -->|No| ReturnResult[Return results to client]
    LRU --> ReturnResult
    ServeCache --> ReturnResult
```