Search Sharding Architecture: Mermaid Flowchart

Search Sharding Architecture flowchart diagram

About Source

Search sharding distributes a large inverted index across multiple physical nodes so that no single machine must store or search the entire document corpus, enabling horizontal scaling of both index capacity and query throughput.

How search sharding works

Index coordinator is the entry point for both indexing writes and search queries. It maintains the shard map — a table that records which shard ID owns which subset of the document space — and routes operations to the correct shards.

Shard allocation determines how documents are assigned to shards. The most common strategy is hash-based routing: a hash function is applied to the document ID (or a nominated routing field like tenant ID), and the result modulo the shard count yields a shard ID. Range-based partitioning is used when locality matters — for example, assigning documents by publication date to allow efficient time-range pruning.

Primary shards hold the authoritative copy of the index for their partition. Each primary shard is an independent Lucene index instance. During indexing, the coordinator sends the document to the correct primary, which writes it to a local segment and acknowledges once durable.

Replica shards are copies of primaries that serve read (search) traffic and provide failover. In Elasticsearch, each primary has N configurable replicas. The coordinator can route search requests to any replica, spreading query load across the replica set. If a primary fails, a replica is promoted — a process analogous to Database Replication failover.

Scatter-gather query execution fans a search request out to all relevant shards simultaneously (or a subset for filtered queries). Each shard executes the query locally and returns a top-K list. The coordinator merges the per-shard results, re-scores globally, and returns the final list to the Search Query Processing layer. Adding shards scales the corpus size linearly while keeping per-shard query latency roughly constant — the main lever for search infrastructure growth.

Frequently asked questions

Search index sharding is the practice of partitioning a large inverted index across multiple physical nodes so that no single machine must store or search the entire document corpus. Each shard is an independent index instance, and a coordinator routes indexing writes and search queries to the correct shards.

The coordinator fans the query out to all relevant shards in parallel. Each shard searches its local segment, scores matching documents, and returns its top-K results with scores. The coordinator merges the per-shard lists, applies global re-scoring, and returns the final ranked result. Because all shards execute simultaneously, query latency scales with the slowest shard, not the total shard count.

Add more shards (increase shard count) when the index size grows beyond what individual shards can hold or search efficiently — this scales storage and indexing throughput. Add more replicas when query throughput is the bottleneck but index size per shard is healthy — replicas serve read traffic without changing the shard layout.

Common mistakes include choosing a shard count that is too low to allow future growth without a full re-index, using a routing key that creates hot shards (uneven document distribution), and not accounting for the coordinator merge overhead when setting per-shard top-K — returning too few candidates per shard causes the global top-K to be inaccurate.

mermaid

flowchart TD
    Client[Client application] --> Coordinator[Index coordinator\nshard map and routing]
    Coordinator --> HashRoute[Hash document ID\nto shard number]
    HashRoute --> Primary1[(Primary Shard 1\ndocs 0-999k)]
    HashRoute --> Primary2[(Primary Shard 2\ndocs 1M-1.9M)]
    HashRoute --> Primary3[(Primary Shard 3\ndocs 2M+)]
    Primary1 --> Replica1A[(Replica 1A)]
    Primary1 --> Replica1B[(Replica 1B)]
    Primary2 --> Replica2A[(Replica 2A)]
    Primary3 --> Replica3A[(Replica 3A)]
    QueryIn[Search query] --> Coordinator
    Coordinator --> Scatter[Scatter query\nto all shards]
    Scatter --> Primary1
    Scatter --> Primary2
    Scatter --> Primary3
    Primary1 --> Gather[Gather top-K\nper shard results]
    Primary2 --> Gather
    Primary3 --> Gather
    Gather --> Merge[Merge and re-rank\nglobal result list]