MapReduce Execution: Mermaid Diagram

About Source

MapReduce is a parallel data processing model in which large datasets are processed in two functional phases — Map and Reduce — automatically distributed across a cluster, with the framework handling task assignment, failure recovery, and data shuffling between phases.

Introduced by Google in 2004 and popularized by Apache Hadoop, MapReduce hides the complexity of distributed execution behind a simple programming model: the developer writes two pure functions, and the framework does the rest.

Input Splitting: The framework divides input data (typically files in HDFS or GCS) into fixed-size input splits (default 128 MB in Hadoop). Each split is processed by one Map task, and splits are placed on nodes that hold a local replica of that data — minimizing network I/O through data locality.

Map Phase: Each Map worker reads its assigned input split, parses records, and calls the user-provided map(key, value) → list(intermediate_key, intermediate_value) function. Output key-value pairs are buffered in memory, sorted by key, and spilled to local disk partitioned by reducer ID (determined by hash(intermediate_key) % R, where R is the number of reducers).

Shuffle and Sort: Reduce workers pull their assigned partitions from all Map workers over the network. This network-intensive phase is called the shuffle. The framework merges and sorts all pulled partitions by key, grouping all values for the same key together.

Reduce Phase: Each Reduce worker calls reduce(key, list(values)) → list(output_value) for each unique key group. Output is written to the distributed filesystem. With R reducers, the job produces R output files.

Fault Tolerance: The master periodically pings worker nodes. If a Map worker fails, all its completed tasks are re-executed (because intermediate output is on local disk). If a Reduce worker fails, only incomplete tasks are restarted. See Distributed Task Scheduling for how the master assigns tasks, and Data Replication Strategy for how HDFS stores the input data reliably.

Frequently asked questions

MapReduce is a parallel data-processing model in which a large dataset is processed in two functional phases. The Map phase transforms input records into intermediate key-value pairs; the Reduce phase aggregates all values for each key into a final result. The framework handles task distribution, shuffling, and failure recovery automatically.

The framework splits input data into fixed-size chunks and assigns each to a Map worker. Map workers produce sorted, partitioned intermediate output on local disk. Reduce workers then pull their assigned partitions over the network (the shuffle), merge-sort them, and invoke the user's reduce function on each key group. Output is written to the distributed filesystem.

MapReduce is well-suited to batch jobs over large, immutable datasets where high throughput matters more than latency — log analysis, index building, ETL pipelines, and large-scale aggregations. For iterative algorithms (machine learning, graph processing) or low-latency queries, frameworks like Apache Spark or Flink are more efficient.

The most common issue is data skew: a small number of reducer keys receiving disproportionately large value lists, causing those reducers to become stragglers. Other issues include excessive spills to disk when intermediate output exceeds the in-memory buffer, and poor data locality when input splits are scheduled on nodes that do not hold a local replica.

mermaid

flowchart TD
    A([Input Data — HDFS\n3 TB log files]) --> B[Master splits input\ninto 128 MB splits]
    B --> S1[Split 1]
    B --> S2[Split 2]
    B --> S3[Split 3]
    B --> S4[Split 4]
    B --> S5[Split 5]

    S1 --> M1[Map Worker 1\nparse and emit key-value pairs]
    S2 --> M2[Map Worker 2\nparse and emit key-value pairs]
    S3 --> M3[Map Worker 3\nparse and emit key-value pairs]
    S4 --> M4[Map Worker 4\nparse and emit key-value pairs]
    S5 --> M5[Map Worker 5\nparse and emit key-value pairs]

    M1 --> SH[Shuffle and Sort Phase\npartition by hash key mod R\ngroup values per key]
    M2 --> SH
    M3 --> SH
    M4 --> SH
    M5 --> SH

    SH --> R1[Reduce Worker 1\nkeys A–F]
    SH --> R2[Reduce Worker 2\nkeys G–M]
    SH --> R3[Reduce Worker 3\nkeys N–Z]

    R1 --> O1[Output Part-00001]
    R2 --> O2[Output Part-00002]
    R3 --> O3[Output Part-00003]

    O1 --> OUT([Final Output\nwritten to HDFS])
    O2 --> OUT
    O3 --> OUT

    subgraph FaultTol ["Fault Tolerance"]
        FT1[Master pings workers\nevery 10s] --> FT2{Worker\nresponsive?}
        FT2 -->|Yes| FT1
        FT2 -->|No — map worker| FT3[Re-assign all map tasks\nto another worker]
        FT2 -->|No — reduce worker| FT4[Re-assign incomplete\nreduce tasks only]
    end

    style OUT fill:#d1fae5,stroke:#10b981
    style FaultTol fill:#fef3c7,stroke:#f59e0b