IoT · 2023 · Production

Nexus

Industrial IoT Data Pipeline

High-throughput data pipeline connecting 50+ industrial data sources to a unified analytics layer. Event-driven architecture with sub-100ms data freshness at 500K events per second.

Data Sources: 50+
Peak Throughput: 500K events/s
Data Freshness: <100 ms

Context

A warehouse operator had accumulated 50+ disparate data sources — WMS, TMS, conveyor PLCs, RFID readers, barcode scanners, and ERP — with no unified data layer for analytics or ML workloads.

The Challenge

Sources used incompatible protocols (OPC-UA, Modbus, REST, JDBC), updated at different frequencies, and often emitted raw binary payloads. Analysts were writing one-off scripts for each source, and the data team was buried in integration work.

The Solution

Designed Nexus as a connector-first data pipeline platform. Each source type has a configurable connector that normalizes to a common Avro schema before writing to Kafka. Spark Structured Streaming handles enrichment, deduplication, and routing. The output layer writes to InfluxDB (time-series) and Snowflake (analytics).
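The connector-first idea can be illustrated with a minimal sketch: two adapters map very different source payloads onto one Avro record shape. Everything in it is a hypothetical illustration. The `IndustrialEvent` schema and its field names, the `Connector`, `ModbusRegisterConnector` and `RestJsonConnector` classes, the register map, and the payload layouts are assumptions, not the actual Nexus contract. Avro serialization, Schema Registry lookups, and the Kafka write are left out so the sketch runs on the standard library alone. `conforms` is a simplified stand-in for real Avro validation.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any

# Hypothetical common envelope, written as an Avro record schema (JSON form).
# Field names are illustrative assumptions, not the real Nexus contract.
ENVELOPE_SCHEMA: dict[str, Any] = {
    "type": "record",
    "name": "IndustrialEvent",
    "namespace": "nexus.example",
    "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "source_id", "type": "string"},
        {"name": "protocol", "type": "string"},
        {"name": "metric", "type": "string"},
        {"name": "value", "type": "double"},
        {"name": "event_time_ms", "type": "long"},
        {"name": "attributes", "type": {"type": "map", "values": "string"}, "default": {}},
    ],
}


class Connector(ABC):
    """Base class: every adapter emits dicts shaped like ENVELOPE_SCHEMA."""

    protocol = "unknown"

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id

    @abstractmethod
    def normalize(self, raw: Any) -> dict[str, Any]:
        """Map one source-specific reading onto the common envelope."""

    def envelope(self, metric: str, value: float, event_time_ms: int, **attrs: str) -> dict[str, Any]:
        # Deterministic ID: the same reading always maps to the same event_id,
        # so replays and duplicate polls can be deduplicated downstream.
        key = f"{self.source_id}|{metric}|{event_time_ms}"
        return {
            "event_id": str(uuid.uuid5(uuid.NAMESPACE_URL, key)),
            "source_id": self.source_id,
            "protocol": self.protocol,
            "metric": metric,
            "value": float(value),
            "event_time_ms": int(event_time_ms),
            "attributes": dict(attrs),
        }


class ModbusRegisterConnector(Connector):
    """Hypothetical adapter for (register, 16-bit word, timestamp_ms) tuples."""

    protocol = "modbus"

    def __init__(self, source_id: str, register_map: dict[int, tuple[str, float]]) -> None:
        super().__init__(source_id)
        self.register_map = register_map  # register -> (metric name, scale factor)

    def normalize(self, raw: tuple[int, int, int]) -> dict[str, Any]:
        register, word, ts_ms = raw
        metric, scale = self.register_map[register]
        signed = word - 0x10000 if word >= 0x8000 else word  # two's-complement int16
        return self.envelope(metric, signed * scale, ts_ms, register=str(register))


class RestJsonConnector(Connector):
    """Hypothetical adapter for JSON payloads with "metric", "value", "ts_ms" keys."""

    protocol = "rest"

    def normalize(self, raw: dict[str, Any]) -> dict[str, Any]:
        return self.envelope(raw["metric"], raw["value"], raw["ts_ms"])


_PRIMITIVES = {"string": str, "double": float, "long": int}


def conforms(record: dict[str, Any], schema: dict[str, Any] = ENVELOPE_SCHEMA) -> bool:
    """Simplified structural check; real code would validate with an Avro library."""
    fields = schema["fields"]
    if set(record) != {f["name"] for f in fields}:
        return False
    for f in fields:
        value, ftype = record[f["name"]], f["type"]
        if isinstance(ftype, dict):  # only the map<string> case is modeled here
            if not isinstance(value, dict) or not all(isinstance(v, str) for v in value.values()):
                return False
        elif isinstance(value, bool) or not isinstance(value, _PRIMITIVES[ftype]):
            return False
    return True
```

Deriving `event_id` from source, metric, and timestamp (rather than generating a random UUID) is a deliberate choice in this sketch: a reading that is polled or replayed twice gets the same ID, which gives downstream deduplication a stable key.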

Methodology

  • Connector framework with 12 pre-built adapters for industrial protocols
  • Schema registry (Confluent) for forward-compatible data contracts
  • Exactly-once semantics via Kafka transactions and idempotent consumers
  • Infrastructure-as-code with Terraform for reproducible deployments
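Kafka transactions make writes back into Kafka atomic, but they do not extend to external sinks such as InfluxDB or Snowflake. An idempotent consumer closes that gap by making redelivery harmless. A minimal sketch of the idea follows. It assumes that each record carries a deterministic `event_id` and that the sink can commit its data and its consumed offsets together. The single in-memory update stands in for one database transaction. `Record` and `IdempotentSink` are hypothetical names, not part of any Kafka client API.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    partition: int
    offset: int
    event_id: str
    value: float


@dataclass
class IdempotentSink:
    """Hypothetical sink: applies each event_id at most once and tracks offsets."""

    rows: dict[str, float] = field(default_factory=dict)     # event_id -> value
    committed: dict[int, int] = field(default_factory=dict)  # partition -> next offset

    def apply(self, batch: Iterable[Record]) -> int:
        """Write a batch and return how many records were newly applied."""
        new_rows: dict[str, float] = {}
        new_offsets = dict(self.committed)
        for rec in batch:
            # Skip records already consumed before a crash or rebalance.
            if rec.offset < self.committed.get(rec.partition, 0):
                continue
            # Skip duplicate events (e.g. a producer retry at a new offset).
            if rec.event_id not in self.rows and rec.event_id not in new_rows:
                new_rows[rec.event_id] = rec.value
            new_offsets[rec.partition] = max(new_offsets.get(rec.partition, 0), rec.offset + 1)
        # Stand-in for one atomic transaction: data and offsets change together.
        # A real sink would also bound the dedup set, e.g. by a time window.
        self.rows.update(new_rows)
        self.committed = new_offsets
        return len(new_rows)
```

On restart, a consumer built this way would seek each partition to the offset stored in the sink instead of relying on Kafka's own committed offset. Storing offsets alongside the output is one common way to get effectively-once results in an external store.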

Impact

Cut integration engineering time from 2 weeks per source to 2 hours. Onboarding time for the analytics team on new data sources dropped from months to days.

Technology Stack

Apache Kafka, Apache Spark, AWS, Terraform, Python, Avro, Snowflake

Project Details

Category: IoT
Year: 2023
Status: Production