Database Index Lookup: Mermaid Flowchart Diagram

About Source

A database index lookup is the process by which the query engine uses an auxiliary data structure — typically a B-tree — to find the physical location of rows matching a WHERE clause predicate without scanning the entire table.

This diagram shows the decision point between an index scan and a sequential scan. When a query arrives at the planner, it checks whether a suitable index exists for the filter columns. If no index is available, the engine performs a sequential scan — it reads every page in the table heap from disk, evaluating the predicate row-by-row. For small tables this is fine; for millions of rows it is catastrophically slow.

When an index is present, the engine traverses the B-tree from the root node down through branch nodes until it reaches a leaf. Each leaf entry stores the indexed column value alongside a tuple identifier (TID) — a (page, offset) pair pointing to the actual row in the heap. The engine then performs a heap fetch using the TID, reading only the specific heap pages that contain matching rows.

For queries that request only columns present in the index itself — the indexed column plus any columns added as INCLUDE columns — the engine can satisfy the query from the index alone without heap fetches. This is called an index-only scan and is significantly faster for large tables.

Understanding index lookups directly informs how you write queries and design schemas. Composite indexes are only used when the leftmost prefix of the index matches the query predicates. An index on (last_name, first_name) accelerates WHERE last_name = 'Smith' but not WHERE first_name = 'Alice'. The Query Execution Plan diagram shows how the planner chooses between these strategies using cost estimates.

Frequently asked questions

A database index lookup is the process of using an auxiliary data structure — typically a B-tree — to locate specific rows in a table without scanning every page. The index stores column values alongside tuple identifiers (TIDs) that point to the row's physical location in the heap, enabling the engine to jump directly to matching rows.

The query planner checks whether a suitable index exists for the WHERE clause columns. If one exists, the engine traverses the B-tree from the root through branch nodes to a leaf node containing the target value. The leaf holds a TID (page number and offset) pointing to the actual row in the heap. The engine fetches only those heap pages, rather than reading the entire table.

The planner chooses a sequential scan when no suitable index exists, when the query's selectivity is low (matching a large fraction of the table), or when the table is small enough that a full scan is cheaper than traversing the index and fetching scattered heap pages. Stale table statistics can also cause the planner to choose the wrong strategy — running ANALYZE refreshes them.

mermaid

flowchart TD
    Query[SELECT query arrives] --> Planner[Query Planner]
    Planner --> IndexExists{Index exists\nfor filter column?}

    IndexExists -->|No| SeqScan[Sequential Scan]
    SeqScan --> ReadAllPages[Read all heap pages]
    ReadAllPages --> EvalPredicate[Evaluate predicate row-by-row]
    EvalPredicate --> SlowResult[Return matching rows\nhigh I/O cost]

    IndexExists -->|Yes| BTreeRoot[B-Tree Root Node]
    BTreeRoot --> BTreeBranch[Traverse branch nodes]
    BTreeBranch --> BTreeLeaf[Reach leaf node]
    BTreeLeaf --> TID[Extract row TID\npage + offset pointer]

    TID --> IndexOnlyCheck{All needed columns\nin index?}
    IndexOnlyCheck -->|Yes| IndexOnly[Index-Only Scan\nno heap fetch needed]
    IndexOnlyCheck -->|No| HeapFetch[Heap fetch\nread specific page]
    HeapFetch --> FastResult[Return matching rows\nlow I/O cost]
    IndexOnly --> FastResult