diagram.mmd — flowchart
Distributed Cache flowchart diagram

A distributed cache is a caching layer spread across multiple nodes that stores frequently accessed data in memory. It reduces latency and database load by serving reads from RAM rather than disk, using consistent hashing to distribute keys evenly across the cluster.

What the diagram shows

The diagram shows multiple Application Servers all connecting to a Cache Cluster composed of three nodes (Cache Node 1, Cache Node 2, Cache Node 3). Consistent hashing (handled by a client-side library or a proxy like Twemproxy) routes each key to the correct node deterministically, so all application servers independently resolve the same node for the same key.
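The key-to-node routing described above can be sketched as a minimal consistent-hash ring in Python. This is an illustrative sketch, not the implementation Twemproxy or any particular client library uses; the node names, virtual-node count, and MD5 hash are assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` virtual points on the ring
        # so keys spread evenly even with only a few nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual point at or past hash(key),
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]

ring = HashRing(["cache-node-1", "cache-node-2", "cache-node-3"])
```

Because the mapping depends only on the node list and the hash function, every application server that builds the same ring resolves the same node for the same key, with no coordination between servers.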

The read path uses Cache-Aside (lazy loading): the application checks the cache first. On a cache hit, the value is returned directly. On a cache miss, the application queries the Primary Database, writes the result into the appropriate cache node with a TTL, and returns the value to the caller. Writes go directly to the database and invalidate the corresponding cache key to prevent stale reads.
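The read and write paths above can be sketched in a few lines of Python. The `db` and `cache` dictionaries are stand-ins for the primary database and the cache node the hash ring selected, and the TTL value is an arbitrary example, not something the diagram specifies.

```python
import time

db = {"user:42": {"name": "Ada"}}   # primary database stand-in
cache = {}                          # cache node chosen by the hash ring
TTL_SECONDS = 300                   # illustrative TTL

def read(key):
    entry = cache.get(key)
    if entry and entry["expires_at"] > time.time():
        return entry["value"]       # cache hit: serve directly from memory
    value = db.get(key)             # cache miss: query the primary database
    if value is not None:
        # Lazy-load the result into the cache with a TTL.
        cache[key] = {"value": value, "expires_at": time.time() + TTL_SECONDS}
    return value

def write(key, value):
    db[key] = value                 # write goes to the database...
    cache.pop(key, None)            # ...and invalidates the cached key
```

Invalidating on write (rather than updating the cache in place) keeps the cache from serving a value that a concurrent writer has already replaced; the next read simply repopulates it.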

Why this matters

Distributed caches like Redis Cluster or Memcached dramatically reduce database load for read-heavy workloads. A cache hit at 0.5ms versus a database query at 5–50ms represents a 10–100x latency improvement for hot data. Sizing the cluster correctly requires estimating the working set size (not total data, just the hot data) and choosing an appropriate eviction policy. Cache invalidation (knowing when to expire or delete entries) is one of the hardest problems in distributed systems: stale data and thundering herds (many concurrent misses) are the two failure modes to design against. For the full architecture this cache sits within, see Scalable Web Architecture.

Free online editor
Edit this diagram in Graphlet
Fork, modify, and export to SVG or PNG. No sign-up required.
Open in Graphlet →

Frequently asked questions

What is a distributed cache?
A distributed cache is an in-memory data store spread across multiple nodes that serves frequently accessed data at sub-millisecond latency, reducing the number of expensive database queries by storing results closer to the application tier.

How does a distributed cache work?
Client libraries use consistent hashing to map each cache key to a specific node deterministically. On a cache miss, the application fetches data from the database, writes it to the cache with a TTL, and returns the result. On subsequent requests, the cache node returns the value directly without a database round-trip.

When should you use a distributed cache?
Use a distributed cache for read-heavy workloads with a clear hot-data pattern — user sessions, product listings, leaderboards, and computed aggregates. If your read-to-write ratio is low or data changes too frequently to be cached effectively, the invalidation overhead may outweigh the gains.

What are common mistakes to avoid?
Common mistakes include caching too broadly (large objects or rarely accessed data that wastes memory), setting TTLs too long (causing stale reads), failing to handle thundering herds on cache expiry (use probabilistic early expiration or lock-based refresh), and not accounting for node failures in the cluster configuration.
mermaid
flowchart LR
    AppServer1[App Server 1] --> HashRing[Consistent\nHash Ring]
    AppServer2[App Server 2] --> HashRing
    AppServer3[App Server 3] --> HashRing
    subgraph CacheCluster[Cache Cluster]
        HashRing --> CacheNode1[(Cache Node 1\nKeys: A-H)]
        HashRing --> CacheNode2[(Cache Node 2\nKeys: I-P)]
        HashRing --> CacheNode3[(Cache Node 3\nKeys: Q-Z)]
    end
    CacheNode1 --> CacheHit{Cache\nHit?}
    CacheHit -->|Yes| ReturnValue[Return cached\nvalue to app]
    CacheHit -->|No - Miss| QueryDB[Query Primary\nDatabase]
    QueryDB --> DB[(Primary Database)]
    DB --> WriteCache[Write result\nto cache node + TTL]
    WriteCache --> ReturnValue
    AppServer1 -->|Write / Update| InvalidateCache[Invalidate\ncache key]
    InvalidateCache --> CacheCluster
    AppServer1 -->|Write / Update| DB