diagram.mmd — flowchart
IoT Sensor Data Pipeline flowchart diagram

An IoT sensor data pipeline describes the sequential stages that transform raw physical measurements into clean, structured records ready for storage, analysis, and alerting — handling everything from initial sampling and noise filtering through protocol encoding and cloud ingestion.

Sensors produce a continuous stream of raw electrical signals — voltages from a thermistor, pulse counts from a flow meter, or binary state changes from a contact switch. The first pipeline stage is sampling: the microcontroller's ADC reads the signal at a fixed rate (e.g., 10 Hz) and buffers the values in memory. Buffering is essential because downstream processing steps are slower than raw acquisition.
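The buffer is typically a ring (circular) buffer, so sampling never stalls: when the buffer is full, the oldest reading is overwritten. A minimal Python sketch of the idea — the `RingBuffer` name and `drain` method are illustrative; real firmware would implement this in C over a fixed array filled from an ADC interrupt:

```python
from collections import deque

class RingBuffer:
    """Fixed-capacity buffer: oldest samples are overwritten when full."""
    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)

    def push(self, sample):
        self._buf.append(sample)

    def drain(self):
        """Return and clear all buffered samples for downstream processing."""
        samples = list(self._buf)
        self._buf.clear()
        return samples

buf = RingBuffer(capacity=4)
for raw in [512, 515, 511, 530, 529]:  # 5 pushes into a 4-slot buffer
    buf.push(raw)
print(buf.drain())  # the oldest reading (512) was overwritten
```

Overwriting the oldest sample is a deliberate choice: losing one stale reading is usually preferable to blocking acquisition.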

The preprocessing stage cleans the buffered readings. A moving-average or median filter removes transient noise spikes. Outlier rejection discards readings that deviate by more than N standard deviations from the local mean, a symptom of EMI or a sensor fault. Calibration then maps the cleaned value through a linearisation curve to produce a value in physical units (°C, Pa, m/s).
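A sketch of these three steps in Python, assuming a 3-sigma test against the recent history and a purely hypothetical linear calibration curve (real devices would use a per-sensor curve obtained from a calibration run):

```python
import statistics

def median_filter(window):
    """Smooth transient spikes: take the median of the recent window."""
    return statistics.median(window)

def is_outlier(history, reading, n_sigma=3.0):
    """Flag a reading more than n_sigma std-devs from the local mean."""
    mean = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    return sigma > 0 and abs(reading - mean) > n_sigma * sigma

def calibrate(counts):
    """Hypothetical linear calibration: ADC counts -> degrees Celsius."""
    return 0.1 * counts - 40.0

history = [498, 500, 502, 499, 501]
for raw in (503, 900):                      # 900 is an EMI-style spike
    if is_outlier(history, raw):
        print(f"discard {raw}")             # 900 is rejected here
        continue
    history = history[1:] + [raw]
    print(round(calibrate(median_filter(history)), 1))
```

Note the ordering: outlier rejection runs before the spike can pollute the window statistics that later readings are judged against.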

Cleaned readings enter a feature extraction stage, where derived values are computed: min/max over a sliding window, rate of change, or a fast Fourier transform magnitude for vibration analysis. These features are far more useful to downstream consumers than raw samples, and they compress the data volume significantly.
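For instance, a sliding window of calibrated readings might be reduced to a handful of summary features like this (the field names are illustrative, not a fixed schema):

```python
def extract_features(window):
    """Summarise a window of calibrated readings into a compact feature dict."""
    return {
        "min": min(window),
        "max": max(window),
        "mean": sum(window) / len(window),
        # Rate of change per sample: first-to-last slope across the window.
        "rate": (window[-1] - window[0]) / (len(window) - 1),
    }

feats = extract_features([10.0, 10.2, 10.1, 10.6, 11.0])
print(feats["rate"])  # 0.25
```

Five raw samples collapse into four numbers; over thousands of readings per hour, that compression dominates bandwidth and storage costs.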

The extracted features are serialised — typically as JSON, MessagePack, or a binary Protobuf — and published over MQTT to a local broker or directly to a cloud endpoint. A deduplication check prevents re-sending messages that were acknowledged in a previous attempt. The cloud ingestion service writes confirmed messages to a time-series database and enqueues them for stream processing. For the full device-to-cloud journey, see IoT Device Data Flow. For edge-side computation in more depth, see IoT Edge Processing. For downstream aggregation of ingested data, see IoT Data Aggregation.
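The deduplication check can be as simple as hashing the serialised payload and remembering which hashes were already acknowledged. A minimal sketch, with a pluggable `send` callable standing in for a real MQTT client's publish (the function and variable names are illustrative):

```python
import hashlib
import json

acked = set()  # message IDs the broker has already acknowledged

def publish_once(payload, send):
    """Serialise to JSON; skip if an identical message was already acked."""
    body = json.dumps(payload, sort_keys=True).encode()
    msg_id = hashlib.sha256(body).hexdigest()
    if msg_id in acked:
        return False        # duplicate of an acknowledged message: drop
    send(body)              # e.g. an MQTT client's publish call
    acked.add(msg_id)       # record only after a successful send
    return True

sent = []
publish_once({"device": "t-001", "mean": 10.2}, sent.append)
publish_once({"device": "t-001", "mean": 10.2}, sent.append)  # dropped
print(len(sent))  # 1
```

Recording the ID only after `send` succeeds means a failed publish stays eligible for retry, while an acknowledged one is never re-sent.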


Frequently asked questions

What is an IoT sensor data pipeline?
An IoT sensor data pipeline is the sequence of processing stages that transforms raw electrical signals from a sensor into clean, structured records suitable for storage and analysis. Stages typically include sampling, noise filtering, outlier rejection, calibration, feature extraction, serialisation, and cloud ingestion.

How does an IoT sensor data pipeline work?
The microcontroller samples the sensor at a fixed rate and buffers readings. A preprocessing stage applies filtering and calibration. Feature extraction computes derived values such as rolling averages or rate of change. Serialised payloads are published over MQTT to a broker and ingested into a time-series database, where they are available for alerting and analytics.

When should I use this diagram?
Use this diagram when designing the firmware processing chain for a new device type, debugging why sensor readings appear noisy or inaccurate in the cloud, or evaluating where to add compression or aggregation to reduce bandwidth and storage costs.

What are common mistakes in sensor data pipelines?
Frequent mistakes include skipping the outlier rejection step, which lets EMI spikes corrupt stored data; performing all computation in the cloud rather than filtering at the source; using a blocking MQTT publish that halts sampling if the network is slow; and neglecting deduplication, so retried messages appear as duplicate data points.
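The blocking-publish pitfall is usually avoided by decoupling sampling from network I/O with a bounded queue: the sampler never waits on the network, and when the queue fills it drops the oldest message rather than stalling. A minimal sketch of that pattern (names are illustrative; on a microcontroller this would typically be an RTOS queue and task rather than Python threads):

```python
import queue
import threading

publish_q = queue.Queue(maxsize=64)

def sample_loop(read_sensor, n):
    """Sampling never blocks: if the queue is full, drop the oldest message."""
    for _ in range(n):
        try:
            publish_q.put_nowait(read_sensor())
        except queue.Full:
            publish_q.get_nowait()          # make room: discard oldest
            publish_q.put_nowait(read_sensor())

def publisher(send):
    """Slow network I/O runs here, off the sampling path."""
    while True:
        msg = publish_q.get()
        if msg is None:                     # sentinel: shut down cleanly
            break
        send(msg)

sent = []
worker = threading.Thread(target=publisher, args=(sent.append,))
worker.start()
sample_loop(lambda: "reading", 5)
publish_q.put(None)
worker.join()
print(len(sent))  # 5
```

Dropping the oldest message under backpressure is a policy choice; pipelines that cannot tolerate loss would spill to flash instead.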
```mermaid
flowchart TD
    Sensor[Physical Sensor\nThermistor / Flow Meter / Contact] --> ADC[ADC Sampling\nFixed sample rate]
    ADC --> Buffer[Ring buffer\nIn device memory]
    Buffer --> Filter[Noise filter\nMoving average / median]
    Filter --> Outlier{Outlier detected?}
    Outlier -->|Yes| Discard[Discard reading\nlog fault counter]
    Outlier -->|No| Calibrate[Apply calibration curve\nRaw → physical unit]
    Calibrate --> Feature[Feature extraction\nMin / max / rate-of-change]
    Feature --> Serialise[Serialise payload\nJSON / Protobuf]
    Serialise --> Dedup{Already acknowledged?}
    Dedup -->|Yes| Drop[Drop duplicate]
    Dedup -->|No| Publish[Publish over MQTT\nto local broker]
    Publish --> Broker[Local MQTT broker]
    Broker --> Bridge[Cloud bridge\nTLS connection]
    Bridge --> Ingest[Cloud ingestion service]
    Ingest --> TSDB[(Time-series database)]
    Ingest --> Stream[Stream processor\nKinesis / Kafka]
```