diagram.mmd — flowchart
Vector Database Query flowchart diagram

A vector database query is the process of finding the most semantically similar stored vectors to a query vector using approximate nearest-neighbor (ANN) search, enabling fast retrieval across millions of embeddings.

What the diagram shows

This flowchart details the internal execution path of a vector database query from input to ranked results:

1. Query vector: a dense float vector (typically 768–3072 dimensions) arrives, generated by an embedding model from the raw user query (see Embedding Generation Flow).
2. Metadata filter pre-check: if the query includes metadata filters (e.g., source = "docs" or date > 2024-01-01), the database narrows the candidate set before the ANN search.
3. ANN index lookup: the query vector is compared against the indexed vectors using an ANN algorithm such as HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index). This trades exact results for orders-of-magnitude faster search.
4. Candidate retrieval: the ANN pass returns an over-sampled set of candidates (more than the final k requested) to allow re-ranking.
5. Re-ranking: candidates are optionally re-ranked using a more precise similarity metric (e.g., exact cosine similarity or a cross-encoder model) to improve result quality.
6. Fetch payloads: the database retrieves the stored payload (original text chunks and metadata) for each final result.
7. Return top-k results: the results, ordered by similarity score, are returned to the calling application.
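The seven steps above can be sketched end to end in plain Python. This is a minimal illustration, not any particular database's API: a brute-force dot-product pass stands in for the real ANN index (HNSW/IVF), and all names and parameters here are assumptions made for the example.

```python
import numpy as np

def query(index_vecs, payloads, metadata, q, k=3, oversample=4, filter_fn=None):
    """Sketch of the query path: pre-filter -> ANN -> over-sample -> re-rank -> payloads."""
    # 1-2. Metadata pre-filter: narrow the candidate set before any vector math.
    ids = np.arange(len(index_vecs))
    if filter_fn is not None:
        mask = np.array([filter_fn(metadata[i]) for i in ids])
        ids = ids[mask]

    # 3. ANN lookup stand-in: a cheap approximate score (raw dot product).
    #    A real index would walk an HNSW graph or IVF cells here.
    approx = index_vecs[ids] @ q

    # 4. Over-sampled candidate retrieval: keep more than the final k.
    n = min(k * oversample, len(ids))
    cand = ids[np.argsort(-approx)[:n]]

    # 5. Re-rank the candidates with exact cosine similarity.
    cv = index_vecs[cand]
    exact = (cv @ q) / (np.linalg.norm(cv, axis=1) * np.linalg.norm(q))
    order = np.argsort(-exact)[:k]

    # 6-7. Fetch payloads and return (payload, score) pairs, best first.
    return [(payloads[cand[i]], float(exact[i])) for i in order]
```

Swapping the dot-product pass for a real ANN library changes steps 3–4 only; the filter, re-rank, and payload-fetch stages are index-agnostic.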

Why this matters

ANN indexes make it possible to search millions of vectors in milliseconds. Understanding the query execution path helps engineers tune index parameters, filter strategies, and re-ranking tradeoffs that directly impact retrieval quality and latency. For the full RAG context see RAG Architecture.


Frequently asked questions

What is a vector database query?

A vector database query is the process of finding stored vectors that are most similar to a query vector using approximate nearest-neighbor (ANN) search. Unlike SQL queries that match on exact values, vector queries match on geometric proximity in a high-dimensional embedding space, enabling semantic retrieval.
How does a vector database query work?

The query vector is compared against indexed vectors using an ANN algorithm — most commonly HNSW or IVF. An over-sampled candidate set is returned and optionally re-ranked using exact cosine similarity or a cross-encoder model, before the associated text payloads are fetched and returned as the final top-k results.
When should you add metadata filtering to a vector query?

Add metadata filtering when your index holds documents from multiple sources, time periods, or access tiers and you need to restrict results to a relevant subset before the ANN pass. Pre-filtering reduces the search space and avoids irrelevant results polluting the candidate pool.
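As a toy in-memory illustration (the record fields here are hypothetical), a pre-filter such as source = "docs" AND date > 2024-01-01 reduces to selecting the allowed ids before any similarity computation:

```python
from datetime import date

# Hypothetical metadata records; field names mirror the example filter above.
metadata = [
    {"id": 0, "source": "docs", "date": date(2024, 3, 1)},
    {"id": 1, "source": "web",  "date": date(2024, 5, 2)},
    {"id": 2, "source": "docs", "date": date(2023, 12, 30)},
    {"id": 3, "source": "docs", "date": date(2024, 6, 15)},
]

# Equivalent of: source = "docs" AND date > 2024-01-01.
allowed = {m["id"] for m in metadata
           if m["source"] == "docs" and m["date"] > date(2024, 1, 1)}

# The ANN pass then scores only vectors whose id is in `allowed`,
# shrinking the search space before any vector comparison.
print(sorted(allowed))  # [0, 3]
```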
What are common mistakes when tuning vector database queries?

Frequent mistakes include setting `ef_search` (HNSW) too low (sacrificing recall for speed), not re-ranking the ANN candidate set (missing relevant results ranked below top-k), and failing to match embedding dimensions between the document index and the query encoder.
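The third mistake is cheap to guard against. A sketch of such a guard, assuming a hypothetical 768-dimensional index:

```python
import numpy as np

# Dimension the documents were embedded with (an assumption for this example).
INDEX_DIM = 768

def validate_query_vector(q: np.ndarray) -> np.ndarray:
    """Fail fast if the query encoder's output can't be compared to the index."""
    if q.ndim != 1 or q.shape[0] != INDEX_DIM:
        raise ValueError(
            f"query vector has shape {q.shape}, but the index stores "
            f"{INDEX_DIM}-dimensional vectors; check that the same "
            "embedding model is used for documents and queries")
    return q
```

A mismatch here usually means the query was encoded with a different model (or model version) than the one that built the index, which silently breaks retrieval if left unchecked.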
```mermaid
flowchart TD
    A([Query vector]) --> B{Metadata filters present?}
    B -- Yes --> C[Apply pre-filter to narrow candidate set]
    B -- No --> D[Use full index]
    C --> E[ANN index lookup via HNSW or IVF]
    D --> E
    E --> F[Retrieve over-sampled candidate list]
    F --> G{Re-ranking enabled?}
    G -- Yes --> H[Score candidates with exact cosine similarity]
    H --> I[Sort by score, keep top-k]
    G -- No --> I
    I --> J[Fetch text chunks and metadata for top-k IDs]
    J --> K([Return ranked results with scores and payloads])
```