SENSUM - Rail Ecodriving System
A Big Data analytics platform for optimizing train driving to reduce energy consumption. The system collects real-time telemetry from trains, processes it through ETL pipelines, and provides ML-driven recommendations that help drivers operate more efficiently.
Project Information
- Category: Data
- Status: Deployed
- Type: Data Project
Technologies Used
About This Project
Problem & Context
SENSUM is an innovative ecodriving system for rail transport, developed by PGE Energetyka Kolejowa in cooperation with PGE Systemy and REDS. It helps train operators optimize driving patterns to reduce energy consumption by over 8.5% while improving punctuality and passenger comfort. It is currently deployed at Koleje Mazowieckie (Mazovian Railways), one of Poland's largest rail operators.
System Overview
SENSUM collects and analyzes Big Data from train telemetry systems in real time:
- Real-time data collection from train sensors and onboard systems during operation.
- Big Data processing pipeline handling thousands of data points per journey.
- ML-driven recommendations provided to drivers via an in-cab interface for optimal driving patterns.
- Energy savings tracking and analytics for continuous improvement.
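As a rough illustration of the savings tracking mentioned above, the sketch below computes an aggregate saving as a percentage of baseline energy. It is not SENSUM's actual method: the `JourneyEnergy` type, its field names, the idea of a per-route baseline, and the sample numbers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneyEnergy:
    """Energy drawn on one journey; all field names are hypothetical."""
    journey_id: str
    baseline_kwh: float  # reference consumption for the same route (assumed known)
    actual_kwh: float    # measured consumption with recommendations enabled


def savings_percent(journeys: list[JourneyEnergy]) -> float:
    """Return the aggregate saving as a percentage of total baseline energy."""
    total_baseline = sum(j.baseline_kwh for j in journeys)
    total_actual = sum(j.actual_kwh for j in journeys)
    if total_baseline <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (total_baseline - total_actual) / total_baseline


# Made-up numbers, for illustration only.
sample = [
    JourneyEnergy("J1", baseline_kwh=1000.0, actual_kwh=900.0),
    JourneyEnergy("J2", baseline_kwh=1000.0, actual_kwh=950.0),
]
print(round(savings_percent(sample), 2))  # 7.5
```

Aggregating totals before dividing weights long journeys more heavily than short ones, which is usually what an energy bill reflects; averaging per-journey percentages would give a different figure.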
Data Engineering Architecture
- Real-time Data Ingestion: Streaming telemetry data from trains via Kafka for high-throughput processing.
- ETL Pipelines: PySpark-based transformation pipelines to clean, normalize, and enrich raw sensor data.
- Data Storage: MongoDB and HDFS for storing processed telemetry and historical journey data.
- Orchestration: Apache Airflow for scheduling and monitoring ETL workflows.
- ML Data Preparation: Feature engineering and data preparation pipelines feeding machine learning models.
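The clean, normalize, and enrich steps listed above can be sketched per record as follows. The production pipeline uses Kafka and PySpark; this example deliberately swaps in plain standard-library Python so it stays self-contained. The message schema (`train_id`, `ts`, `speed_kmh`, `power_kw`), the 250 km/h plausibility limit, and the convention that negative power means regeneration are all assumptions, not details from the project.

```python
import json
from datetime import datetime, timezone
from typing import Optional

REQUIRED_FIELDS = {"train_id", "ts", "speed_kmh", "power_kw"}  # hypothetical schema


def parse_message(raw: bytes) -> Optional[dict]:
    """Decode one JSON telemetry message; return None if it is malformed."""
    try:
        record = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None
    if not isinstance(record, dict) or not REQUIRED_FIELDS <= record.keys():
        return None
    return record


def transform(record: dict) -> Optional[dict]:
    """Clean, normalize, and enrich one record; return None to drop it."""
    try:
        speed_kmh = float(record["speed_kmh"])
        power_kw = float(record["power_kw"])
        ts = datetime.fromtimestamp(float(record["ts"]), tz=timezone.utc)
    except (TypeError, ValueError, OverflowError, OSError):
        return None
    # Clean: discard implausible speeds (the 250 km/h limit is an assumption).
    if not 0.0 <= speed_kmh <= 250.0:
        return None
    return {
        "train_id": str(record["train_id"]),
        "ts": ts.isoformat(),                   # normalize: UTC ISO 8601
        "speed_ms": round(speed_kmh / 3.6, 3),  # normalize: SI units
        "power_kw": power_kw,
        "is_regenerating": power_kw < 0.0,      # enrich: derived flag (assumed sign convention)
    }


messages = [
    b'{"train_id": "T-001", "ts": 1700000000, "speed_kmh": 72, "power_kw": -120.5}',
    b'{"train_id": "T-001", "ts": 1700000001, "speed_kmh": 999, "power_kw": 300}',
    b"not json",
]
parsed = (parse_message(m) for m in messages)
rows = [r for r in (transform(p) for p in parsed if p is not None) if r is not None]
print(rows)  # only the first message survives parsing and cleaning
```

Returning `None` for bad input, rather than raising, keeps one malformed message from stopping a stream; in a real deployment such records would more likely be routed to a dead-letter store than silently dropped.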
Impact & Results
- Over 10,000 hours of testing with 5 rail operators covering ~600,000 km of test journeys.
- Energy savings exceeding 300 MWh during the testing phase (8.5% reduction).
- First production deployment at Koleje Mazowieckie, with estimated annual savings of over 22 GWh.
- Improved driving smoothness and punctuality, enhancing passenger experience.
My Role
As a Data Engineer at REDS S.A., I designed and implemented the core ETL pipeline infrastructure for SENSUM. My responsibilities included:
- Building real-time data ingestion pipelines using Kafka to collect telemetry data from trains.
- Developing PySpark-based ETL jobs to transform raw sensor data into structured, ML-ready datasets.
- Designing data models and schemas for efficient storage and querying of train telemetry in MongoDB and HDFS.
- Creating feature engineering pipelines to extract meaningful patterns from train operation data.
- Orchestrating data workflows with Apache Airflow to ensure reliable, scheduled processing.
- Optimizing data processing performance to handle high-volume, real-time streams from multiple trains simultaneously.
The ETL infrastructure I built processes millions of data points daily, enabling the ML models to provide real-time driving recommendations that help operators save energy while maintaining safety and punctuality.
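To make the feature engineering step more concrete, here is a minimal sketch of turning one journey's time series into a handful of summary features. The features chosen (energy, distance, mean speed, hard-braking count), the `Sample` type, and the -1.0 m/s² braking threshold are illustrative assumptions; the source does not say which features the SENSUM models actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One telemetry sample; field names and units are assumptions."""
    t_s: float       # seconds since journey start
    speed_ms: float  # speed in m/s
    power_kw: float  # traction power in kW


def journey_features(samples: list[Sample], hard_brake_ms2: float = -1.0) -> dict:
    """Summarize one journey, assuming samples are sorted by time."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    energy_kwh = 0.0
    distance_m = 0.0
    hard_brakes = 0
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t_s - prev.t_s
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Trapezoidal rule: kW * s gives kJ, and 3600 kJ equal 1 kWh.
        energy_kwh += 0.5 * (prev.power_kw + cur.power_kw) * dt / 3600.0
        distance_m += 0.5 * (prev.speed_ms + cur.speed_ms) * dt
        # Count intervals whose mean deceleration reaches the threshold.
        if (cur.speed_ms - prev.speed_ms) / dt <= hard_brake_ms2:
            hard_brakes += 1
    duration_s = samples[-1].t_s - samples[0].t_s
    return {
        "energy_kwh": energy_kwh,
        "distance_m": distance_m,
        "mean_speed_ms": distance_m / duration_s,
        "hard_brake_count": hard_brakes,
    }


# Made-up journey: accelerate for 10 s, cruise for 10 s, brake for 10 s.
journey = [
    Sample(0.0, 0.0, 0.0),
    Sample(10.0, 10.0, 360.0),
    Sample(20.0, 10.0, 360.0),
    Sample(30.0, 0.0, 0.0),
]
features = journey_features(journey)
print(features["energy_kwh"], features["distance_m"], features["hard_brake_count"])
# 2.0 200.0 1
```

At production volumes the same aggregation would more naturally be expressed as a grouped PySpark job keyed by journey; the plain-Python loop is used here only to keep the arithmetic visible.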