Predicting NYC public transit delays using machine learning.
Developed in 36 hours at the UCSB Datathon 2025: DataOrbit.
NYC Transit Pulse is a machine learning-powered system that forecasts delays in New York City’s MTA buses and trains using historical transit data. With both a web interface and REST API, it's designed for commuters, planners, and developers alike.
- Delay prediction for buses
- Time-based predictions using temporal features
- Web interface and REST API endpoints
- Automated data preprocessing pipeline
- Machine learning backend trained on real MTA data
- Python 3.8+
- Flask (Web Framework)
- scikit-learn (Machine Learning)
- pandas (Data Processing)
- Kaggle Hub (Dataset Management)
- Clone Repository
git clone https://github.com/alexyan1/nyc-transit-delay-predictor.git
cd nyc-transit-delay-predictor
- Create a virtual environment (optional)
python -m venv venv
source venv/bin/activate
- Install Dependencies
pip install -r requirements.txt
- Run the web app
python app.py

Historical MTA delay and schedule data was sourced from:
- Kaggle: New York City Bus Data
- Uses Huber Regression
- Time-series features (day, hour, weekday/weekend)
- Evaluation metrics: Accuracy, MAE
- The pre-trained model is stored in delay_pipeline.pkl
- It can be retrained with rlm.py
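
To make the model notes above more concrete, here is a minimal sketch of how the temporal features (day, hour, weekday/weekend) could be derived from a timestamp, along with the Huber loss and MAE. This is not the code in rlm.py: the function names, the feature keys, and the `delta` value are assumptions chosen for illustration, and only the standard library is used.

```python
from datetime import datetime


def temporal_features(timestamp: str) -> dict:
    """Derive day, hour, and weekday/weekend features from an ISO 8601 timestamp.

    The key names are hypothetical; the real pipeline may name them differently.
    """
    dt = datetime.fromisoformat(timestamp)
    return {
        "day": dt.day,
        "hour": dt.hour,
        "weekday": dt.weekday(),          # Monday = 0 ... Sunday = 6
        "is_weekend": dt.weekday() >= 5,  # Saturday or Sunday
    }


def huber_loss(residual: float, delta: float = 1.35) -> float:
    """Huber loss: quadratic for |residual| <= delta, linear beyond it.

    delta is an illustrative threshold, not a value taken from the project.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def mean_absolute_error(actual, predicted) -> float:
    """Average absolute difference between actual and predicted delays."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Because the loss grows only linearly for large residuals, a few extreme delays pull the fit less than they would under squared error, which is the usual reason to choose Huber regression for noisy delay data.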
POST /predict
Example:
curl -X POST http://localhost:5000/predict \
-H "Content-Type: application/json" \
-d '{
"stop": "5th Ave",
"line": "M4",
"datetime": "2025-04-23T08:30"
}'
Response:
{
"stop": "5th Ave",
"line": "M4",
"predicted_delay": 5.2,
"units": "minutes"
}
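
The same request can be made from Python. The sketch below mirrors the curl example using only the standard library; the helper names `build_predict_request` and `predict_delay` are hypothetical, and the base URL assumes the app is running locally on port 5000 as in the example above.

```python
import json
import urllib.request

# Assumption: local development server, as in the curl example.
BASE_URL = "http://localhost:5000"


def build_predict_request(stop, line, when, base_url=BASE_URL):
    """Build a POST /predict request with the JSON body shown in the curl example."""
    payload = {"stop": stop, "line": line, "datetime": when}
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_delay(stop, line, when, base_url=BASE_URL, timeout=10):
    """Send the request and return the decoded JSON response as a dict."""
    req = build_predict_request(stop, line, when, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `predict_delay("5th Ave", "M4", "2025-04-23T08:30")` should return a dict shaped like the response above, with the delay under the `predicted_delay` key.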

