diagram.mmd — flowchart
Search Query Processing flowchart diagram

Search query processing is the real-time path that takes a user's raw input string and produces a ranked list of relevant results, typically within 100 milliseconds.

How query processing works

Query parsing is the first transformation. The raw string is tokenized using the same pipeline applied at index time, ensuring that the terms the user types match the tokens stored in the inverted index. The parser also identifies any query syntax: quoted phrases, boolean operators (AND, OR, NOT), field-scoped terms like title:kubernetes, and range filters.
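The parsing step above can be sketched in a few lines. This is a hypothetical toy parser, not a production grammar: it pulls out quoted phrases, field-scoped terms, and boolean operators, leaving plain tokens behind. A real parser would share its tokenizer with the indexing pipeline so query terms align with index terms.

```python
import re

def parse_query(raw: str) -> dict:
    """Split a raw query into phrases, field-scoped terms, and plain tokens."""
    phrases = re.findall(r'"([^"]+)"', raw)        # quoted phrases first
    rest = re.sub(r'"[^"]+"', " ", raw)            # strip them from the remainder
    fields, terms = {}, []
    for tok in rest.lower().split():
        if ":" in tok:                             # field scope, e.g. title:kubernetes
            field, value = tok.split(":", 1)
            fields[field] = value
        elif tok in ("and", "or", "not"):
            terms.append(tok.upper())              # boolean operator
        else:
            terms.append(tok)
    return {"phrases": phrases, "fields": fields, "terms": terms}

parse_query('title:kubernetes "rolling update" AND deploy')
```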

Query rewriting improves recall and precision before execution begins. Spell correction replaces obvious misspellings with dictionary-matched alternatives. Synonym expansion adds equivalent terms — a search for "sofa" might be rewritten to match "couch" and "settee" as well. Query relaxation drops low-value terms from over-constrained queries that would otherwise return zero results.
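A minimal sketch of the three rewriting techniques, assuming tiny illustrative correction and synonym tables (a real system would back these with dictionaries or learned models). Each output group is a set of terms the executor would OR together:

```python
SPELLFIX = {"kubernets": "kubernetes"}           # illustrative correction table
SYNONYMS = {"sofa": ["couch", "settee"]}         # illustrative synonym table
STOPWORDS = frozenset({"the", "a", "of"})        # low-value terms to relax away

def rewrite(tokens):
    corrected = [SPELLFIX.get(t, t) for t in tokens]        # spell correction
    expanded = [[t] + SYNONYMS.get(t, []) for t in corrected]  # synonym groups
    return [g for g in expanded if g[0] not in STOPWORDS]   # query relaxation

rewrite(["sofa", "of", "kubernets"])
# -> [["sofa", "couch", "settee"], ["kubernetes"]]
```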

Cache lookup checks whether an identical or semantically equivalent query was recently executed and whether its results are still fresh. Search result caches dramatically reduce load on the index for popular queries. The Search Result Caching diagram covers this layer in detail.
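One sketch of a cache key, assuming the key is built from the normalized query rather than the raw string, so trivially different inputs ("Sofa" vs "sofa ") and reordered terms share one cache entry:

```python
import hashlib
import json

def cache_key(tokens, filters=None):
    """Hash the canonical form of the normalized query for cache lookup."""
    canonical = json.dumps(
        {"tokens": sorted(tokens),
         "filters": sorted((filters or {}).items())},
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

cache_key(["couch", "sofa"]) == cache_key(["sofa", "couch"])  # -> True
```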

Scatter-gather execution fans the query out to all relevant shards in parallel. Each shard searches its local segment of the inverted index, scores matching documents using a ranking function such as BM25, and returns its top-K local results with scores. Because each shard sees only its subset of the corpus, the global top-K must be assembled at the coordinator.
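The fan-out can be sketched with a thread pool. Here each shard is a hypothetical dict of doc_id to term-frequency map, and the toy score is summed term frequency standing in for BM25 over real postings lists:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, terms, k):
    """Score this shard's local documents and return its top-K (score, doc)."""
    scores = {doc: sum(tf.get(t, 0) for t in terms)
              for doc, tf in shard.items()}
    hits = [(score, doc) for doc, score in scores.items() if score > 0]
    return heapq.nlargest(k, hits)

def scatter(shards, terms, k=3):
    # Fan the parsed query out to every shard in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: search_shard(s, terms, k), shards))

shards = [{"d1": {"sofa": 3}, "d2": {"couch": 1}}, {"d3": {"sofa": 2}}]
scatter(shards, ["sofa"], k=2)  # one local top-K list per shard
```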

Merge and re-rank collects the per-shard top-K lists, merges them, and applies global ranking signals that require visibility across the full result set: personalization, query-specific freshness boosts, diversity penalties, and machine-learned ranking models. The Ranking Algorithm Pipeline explains this phase in depth.
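A minimal sketch of the coordinator-side merge, assuming per-shard lists of (score, doc_id) pairs; the `boost` map is a hypothetical stand-in for a global signal such as freshness or personalization:

```python
import heapq

def merge_and_rerank(per_shard, k, boost=None):
    """Merge per-shard top-K lists globally, then apply a global boost."""
    merged = heapq.merge(*[sorted(lst, reverse=True) for lst in per_shard],
                         reverse=True)
    top = list(merged)[: 4 * k]              # small over-fetch before re-ranking
    boost = boost or {}
    reranked = sorted(((score * boost.get(doc, 1.0), doc) for score, doc in top),
                      reverse=True)
    return reranked[:k]

merge_and_rerank([[(3, "d1")], [(2, "d3")]], k=2, boost={"d3": 2.0})
```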

Result formatting serializes the final ranked list into the response format the client expects — JSON fields, snippets with highlighted matching terms, facet counts, spelling suggestions, and pagination cursors. The formatted response is written back to the cache before being returned to the caller.
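The formatting step might look like the sketch below; the snippet store, highlight markup, and cursor scheme are all illustrative placeholders, not a fixed API:

```python
import re

SNIPPETS = {"d1": "A plush sofa for small rooms"}  # illustrative snippet store

def format_response(ranked, terms, page_size=10, cursor=0):
    """Serialize the global ranked list into a JSON-ready response dict."""
    def highlight(text):
        # Wrap matching terms so the client can render them highlighted.
        pattern = "|".join(re.escape(t) for t in terms)
        return re.sub(f"({pattern})", r"<em>\1</em>", text, flags=re.I)

    page = ranked[cursor:cursor + page_size]
    more = cursor + len(page) < len(ranked)
    return {
        "results": [{"id": doc, "score": score,
                     "snippet": highlight(SNIPPETS.get(doc, ""))}
                    for score, doc in page],
        "next_cursor": cursor + len(page) if more else None,  # pagination cursor
    }

format_response([(3, "d1")], ["sofa"])
```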


Frequently asked questions

What is search query processing?

Search query processing is the real-time pipeline that takes a user's raw input string and returns a ranked result list. It encompasses query parsing, rewriting (spell correction and synonym expansion), cache lookup, scatter-gather execution across shards, global merge and re-ranking, and result serialization.

How does a search engine process a query?

The raw query string is first tokenized using the same pipeline applied at index time to ensure term alignment. The parser identifies boolean operators, field scopes, and phrase quotes. The parsed query is then rewritten for better recall, checked against the result cache, and, if uncached, fanned out to index shards in parallel for local scoring before results are merged globally.

When is query rewriting worth adding?

Query rewriting is worth adding when the zero-result rate or poor-recall complaints are high. Spell correction addresses typos; synonym expansion bridges vocabulary gaps between user language and indexed content; query relaxation rescues over-constrained queries. Even simple rewriting rules deliver measurable recall gains before a full learning-based approach is needed.

What are common mistakes in query processing?

Common mistakes include tokenizing queries differently from documents (term mismatch drops recall), not normalizing the query before cache key construction (reducing hit rate), returning too many candidates from each shard (increasing merge latency), and applying personalization before global diversity enforcement (producing filter-bubble results).

What is the difference between query rewriting and query expansion?

Query rewriting replaces the original query or parts of it with alternative terms — for example, correcting a misspelling or substituting a synonym. Query expansion adds new terms alongside the originals, broadening the match set without removing existing terms. Rewriting changes intent interpretation; expansion widens recall while keeping the original intent intact.
```mermaid
flowchart TD
    Input[User query string] --> Parse[Parse query\ntokenize and identify syntax]
    Parse --> Rewrite[Query rewriting\nspell correction, synonyms]
    Rewrite --> Cache{Cache hit?}
    Cache -->|Yes| CachedResult[Return cached results]
    Cache -->|No| Scatter[Scatter to shards in parallel]
    Scatter --> Shard1[Shard 1\nlocal BM25 score]
    Scatter --> Shard2[Shard 2\nlocal BM25 score]
    Scatter --> Shard3[Shard N\nlocal BM25 score]
    Shard1 --> Merge[Merge top-K results]
    Shard2 --> Merge
    Shard3 --> Merge
    Merge --> Rerank[Re-rank with global signals\nfreshness, personalization]
    Rerank --> Format[Format response\nsnippets, facets, pagination]
    Format --> WriteCache[Write to cache]
    WriteCache --> Response[Return results to client]
```