Big Data project about real-time traffic congestion prediction in Warsaw using TomTom Traffic API and Open-Meteo Forecast API

mytkom/TrafficCongestionPrediction

Real-time Traffic Congestion Prediction in Warsaw

A Big Data analytics system that provides real-time traffic congestion predictions using machine learning, streaming data processing, and interactive visualization. The system ingests live traffic and weather data, processes it through a Lambda architecture, and delivers one-minute-ahead congestion forecasts via a web dashboard.

Overview

This project demonstrates a complete end-to-end big data pipeline for real-time traffic analysis, implementing:

  • Real-time data ingestion from multiple streaming sources (TomTom Traffic API, Open-Meteo Weather API)
  • Lambda architecture with batch and streaming processing layers
  • Machine learning model (LSTM) for time-series traffic prediction
  • Interactive web dashboard for visualization and real-time insights

The system predicts traffic congestion levels one minute ahead by analyzing a moving window of the last 30 minutes of traffic and weather data, enabling users to make informed routing decisions.
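Predicting one minute ahead from the last 30 minutes can be framed as supervised learning over sliding windows. The sketch below assumes one row per minute for a single street, with column 0 holding the congestion level and the remaining columns holding weather features. The function name `make_windows` and that column layout are hypothetical, not taken from the repository. The output shape `(samples, timesteps, features)` is the layout Keras LSTM layers expect by default.

```python
import numpy as np


def make_windows(series: np.ndarray, window: int = 30, horizon: int = 1):
    """Split a (time, features) array into LSTM inputs and targets.

    Assumes one row per minute and that column 0 holds the congestion
    level to predict (a hypothetical layout, for illustration only).
    Returns X with shape (samples, window, features) and y with shape
    (samples,), where y[i] is the congestion `horizon` minutes after the
    last row of window X[i].
    """
    if series.ndim != 2:
        raise ValueError("expected a 2-D (time, features) array")
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window")
    # Each window covers rows i .. i + window - 1.
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    # The target for window i sits `horizon` rows after its last row.
    start = window + horizon - 1
    y = series[start:start + n_samples, 0]
    return X, y


# Toy data: 60 minutes, congestion = minute index, two constant weather features.
minutes = np.arange(60, dtype=float)
data = np.column_stack([minutes, np.zeros(60), np.ones(60)])
X, y = make_windows(data)
print(X.shape, y.shape)  # (30, 30, 3) (30,)
```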

Architecture

The system follows a Lambda architecture pattern:

  • Ingestion Layer: Apache NiFi for data collection and preprocessing
  • Streaming Layer: Apache Kafka + Apache Spark Streaming for real-time processing
  • Batch Layer: Apache Spark + Apache Hive for historical data storage and batch analytics
  • Serving Layer: Apache HBase for low-latency query serving
  • Presentation Layer: Streamlit web application with interactive map visualization
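In a Lambda architecture, a query is answered by combining the precomputed batch view with the incremental view produced by the speed layer. The minimal sketch below uses plain dictionaries keyed by a hypothetical street identifier as stand-ins for the Hive-derived batch view and the HBase real-time rows. The `Reading` type, the congestion scale, and the "newest timestamp wins" rule are assumptions for illustration, not the project's actual serving logic.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Reading:
    timestamp: int     # Unix epoch seconds
    congestion: float  # hypothetical scale: 0.0 = free flow, 1.0 = standstill


def merge_views(batch_view: Dict[str, Reading],
                realtime_view: Dict[str, Reading]) -> Dict[str, Reading]:
    """Return one reading per street, preferring the more recent value.

    The batch view covers data up to the last batch run; the real-time
    view only covers data that arrived since then, so for a street that
    appears in both, the newer reading wins.
    """
    merged = dict(batch_view)
    for street, reading in realtime_view.items():
        current = merged.get(street)
        if current is None or reading.timestamp >= current.timestamp:
            merged[street] = reading
    return merged


# Hypothetical views for three streets.
batch = {"street_a": Reading(1_000, 0.40), "street_b": Reading(1_000, 0.70)}
speed = {"street_a": Reading(1_060, 0.55), "street_c": Reading(1_060, 0.20)}
merged = merge_views(batch, speed)
```

Streets only the batch layer has seen keep their batch value, while streets refreshed by the speed layer show the latest reading.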

Technologies

  • Data Processing: Apache Spark (PySpark), Apache NiFi, Apache Kafka
  • Storage: Apache Hive (data warehouse), Apache HBase (real-time serving)
  • Machine Learning: TensorFlow/Keras (LSTM model), scikit-learn
  • Data Sources: TomTom Traffic API, OpenStreetMap API, Open-Meteo Weather API
  • Visualization: Streamlit, GeoPandas, Folium
  • Languages: Python 3.9, SQL

Project Structure

TrafficCongestionPrediction/
├── Data/                         # Historical data for model training
├── DataPreprocessing/            # Data ingestion and transformation modules
│   ├── TomTom/                   # Traffic data preprocessing (OSM matching algorithm)
│   └── OpenMeteo/                # Weather data preprocessing (Gaussian noise simulation)
├── Model/                        # Machine learning module
│   ├── train.py                  # Model training script
│   ├── predict.py                # Prediction (inference) script
│   ├── eda/                      # Exploratory data analysis
│   └── plots/                    # Training and evaluation visualizations
├── Spark/                        # Batch and streaming processing
│   ├── Batch/                    # Batch processing jobs
│   ├── stream_prediction_app.py  # Real-time prediction application
│   └── Common/                   # Shared schemas and utilities
├── Hive/                         # Data warehouse table definitions
├── kafka/                        # Kafka topic management scripts
├── Nifi/                         # Apache NiFi flow configurations
└── PresentationLayer/            # Streamlit web application
    └── src/                      # Application source code
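The `OpenMeteo/` module is described as using Gaussian noise simulation. One plausible reading, which is an assumption rather than something the project structure confirms, is that coarse (for example hourly) forecast values are expanded to per-minute samples with small random perturbations so they line up with the minute-level traffic stream. A minimal standard-library sketch; the function name, the linear interpolation, and the default `sigma` are all hypothetical:

```python
import random
from typing import List, Optional


def upsample_with_noise(hourly: List[float], sigma: float = 0.1,
                        seed: Optional[int] = None) -> List[float]:
    """Expand hourly values to per-minute values with Gaussian jitter.

    Each minute is linearly interpolated between the two surrounding
    hourly values, then perturbed by noise drawn from N(0, sigma^2).
    The output has 60 * (len(hourly) - 1) + 1 samples, ending on the
    last hourly value (plus noise).
    """
    if len(hourly) < 2:
        raise ValueError("need at least two hourly values to interpolate")
    rng = random.Random(seed)  # seeded generator for reproducible noise
    minutes: List[float] = []
    for start, end in zip(hourly, hourly[1:]):
        for m in range(60):
            base = start + (end - start) * m / 60
            minutes.append(base + rng.gauss(0.0, sigma))
    minutes.append(hourly[-1] + rng.gauss(0.0, sigma))
    return minutes


# Hypothetical temperatures (°C) for three consecutive hours.
temps = upsample_with_noise([10.0, 12.0, 11.0], sigma=0.2, seed=42)
```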

Key Features

  • Real-time Traffic Prediction: LSTM-based model predicting congestion 1 minute ahead
  • Interactive Map Visualization: Color-coded traffic levels with zoom and pan capabilities
  • Top 5 Rankings: Most and least congested streets in real time
  • Geographic Matching Algorithm: Custom algorithm to match TomTom traffic segments with OpenStreetMap road geometries
  • Multi-source Data Integration: Combines traffic, weather, and road infrastructure data
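Matching TomTom segments to OpenStreetMap geometries is, at its core, a nearest-polyline problem. The sketch below is a generic heuristic, not the repository's custom algorithm: it assigns a traffic segment to the OSM way with the smallest mean vertex-to-way distance and rejects matches beyond a threshold. Coordinates are assumed to be in a projected, metre-based CRS (raw WGS84 lon/lat would need projecting first), every polyline needs at least two vertices, and the function names and the 15 m default are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in metres, projected CRS assumed


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_to_polyline(p: Point, line: Sequence[Point]) -> float:
    """Smallest distance from p to any segment of a polyline."""
    return min(point_to_segment(p, a, b) for a, b in zip(line, line[1:]))


def match_segment(segment: Sequence[Point],
                  osm_ways: Dict[str, Sequence[Point]],
                  max_mean_dist: float = 15.0) -> Optional[str]:
    """Return the id of the OSM way closest to a traffic segment, or None.

    Closeness is the mean distance from the segment's vertices to the
    way; the best way is rejected if it is farther than max_mean_dist.
    """
    best_id, best_score = None, math.inf
    for way_id, way in osm_ways.items():
        score = sum(point_to_polyline(p, way) for p in segment) / len(segment)
        if score < best_score:
            best_id, best_score = way_id, score
    return best_id if best_score <= max_mean_dist else None


# Hypothetical geometry: two parallel roads 50 m apart.
ways = {
    "way_1": [(0.0, 0.0), (100.0, 0.0)],
    "way_2": [(0.0, 50.0), (100.0, 50.0)],
}
tomtom_segment = [(10.0, 3.0), (50.0, -2.0), (90.0, 4.0)]
print(match_segment(tomtom_segment, ways))  # way_1
```

A production matcher would typically also compare headings and use a spatial index instead of scanning every way.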

Key Learnings & Achievements

  • Designed and implemented a complete Lambda architecture for real-time big data analytics
  • Developed a custom geometric matching algorithm to align dynamic traffic segments with static road networks
  • Built an LSTM time-series model for short-term (one-minute-ahead) traffic prediction
  • Integrated multiple streaming data sources with different update frequencies
  • Created an end-to-end pipeline from raw API data to interactive visualizations
  • Handled real-world challenges: API rate limits, data schema mismatches, temporal alignment
  • Applied production-oriented practices: modular architecture, error handling, data validation
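Because traffic and weather arrive at different rates, each traffic record has to be paired with the most recent weather observation available at that moment. A minimal as-of join sketch with the standard library's `bisect` module; the record layout (Unix timestamps in seconds, weather list sorted by time), the function name, and the `max_age` staleness limit are assumptions for illustration:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def align_weather(traffic_ts: List[int],
                  weather: List[Tuple[int, float]],
                  max_age: int = 3600) -> List[Optional[float]]:
    """Pair each traffic timestamp with the latest weather value at or before it.

    `weather` must be sorted by timestamp. Returns None for a traffic
    record when no weather reading exists yet or the latest one is older
    than `max_age` seconds.
    """
    weather_ts = [ts for ts, _ in weather]
    aligned: List[Optional[float]] = []
    for ts in traffic_ts:
        idx = bisect_right(weather_ts, ts) - 1  # last reading with time <= ts
        if idx < 0 or ts - weather_ts[idx] > max_age:
            aligned.append(None)
        else:
            aligned.append(weather[idx][1])
    return aligned


# Hypothetical data: weather every 15 minutes, traffic at irregular times.
weather = [(0, 10.0), (900, 10.5), (1800, 11.0)]
traffic = [-60, 0, 60, 899, 900, 2000, 9000]
result = align_weather(traffic, weather)
print(result)  # [None, 10.0, 10.0, 10.0, 10.5, 11.0, None]
```

Looking backwards only (never at future weather readings) keeps the join valid for streaming, where later observations do not exist yet.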

Getting Started

Prerequisites

  • Python 3.9.4
  • Apache Spark, Kafka, NiFi, Hive, HBase (for full deployment)
  • Docker (for local development with PresentationLayer)

Quick Start

Each module directory contains its own detailed setup instructions.

Project Context

This project was developed as part of a Big Data Analytics course at Warsaw University of Technology, demonstrating practical application of distributed systems, real-time analytics, and machine learning in a production-like environment.

License

See LICENSE file for details.

Contributors 3
