A Production-Grade Machine Learning System designed to predict aircraft engine failures before they happen. Built with XGBoost, FastAPI, and robust Time-Series Engineering principles.
- Failure Prediction: Accurately predicts if an engine will fail within the next 20 cycles (Binary Classification).
- Leakage-Proof Validation: Implements Unit-Based Splitting to ensure the model generalizes to new, unseen engines.
- Time-Series Engineering: Automatically calculates Rolling Statistics (Mean/Std) and Lag Features to capture degradation trends.
- Real-Time Inference: Exposes a high-performance FastAPI endpoint with <15ms latency.
- Metadata Tracking: Automatically versions models with training timestamps, hyperparameters, and evaluation metrics.
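As a rough illustration of the unit-based splitting idea, the sketch below assigns whole engines to either the training or the validation set, so no engine contributes rows to both. The function name `split_by_unit`, the 20% validation fraction, and the seed are assumptions made for this example, not the project's actual code; scikit-learn's `GroupShuffleSplit` is a common alternative.

```python
import random

def split_by_unit(unit_ids, val_fraction=0.2, seed=42):
    """Split engine IDs so each engine lands entirely in train or validation.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    units = sorted(set(unit_ids))
    rng = random.Random(seed)
    rng.shuffle(units)
    n_val = max(1, int(len(units) * val_fraction))
    val_units = set(units[:n_val])
    train_units = set(units[n_val:])
    return train_units, val_units

# Toy data: 10 engines with 3 cycles each, stored as (unit, cycle) rows.
rows = [(unit, cycle) for unit in range(1, 11) for cycle in range(1, 4)]
train_units, val_units = split_by_unit([unit for unit, _ in rows])

# Rows follow their engine, so no engine is shared between the two sets.
train_rows = [row for row in rows if row[0] in train_units]
val_rows = [row for row in rows if row[0] in val_units]
```

Splitting by row instead would let neighbouring cycles of the same engine appear on both sides, which tends to inflate validation scores.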
We optimize for Recall (capturing as many failures as possible) while keeping Precision high enough that most alerts are real failures.
| Metric | Score | Meaning |
|---|---|---|
| ROC-AUC | 0.9967 | Excellent discrimination between healthy and failing states. |
| Recall | 0.7480 | Accurately identifies ~75% of imminent failures (Safety Critical). |
| Precision | 0.6619 | ~66% of alerts are real failures (Cost Effectiveness). |
| F1 Score | 0.7023 | Strong balance between safety and maintenance costs. |
| Threshold | 0.28 | Tuned threshold to prioritize safety over default 0.5. |
| Latency | ~10ms | Inference time per request (99th percentile). |
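To make the threshold row concrete, here is a minimal sketch of how a decision threshold turns probabilities into alerts and how Precision, Recall, and F1 follow from the resulting counts. The helper `precision_recall_f1` and the toy labels and probabilities are invented for illustration; only the 0.28 threshold and the reported Precision/Recall values come from the table.

```python
def precision_recall_f1(y_true, y_prob, threshold=0.28):
    """Compute precision, recall and F1 for a given decision threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy example (not project data): 3 failing and 3 healthy samples.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.30, 0.35, 0.90, 0.10, 0.20]

tuned = precision_recall_f1(y_true, y_prob, threshold=0.28)
default = precision_recall_f1(y_true, y_prob, threshold=0.5)

# Consistency check on the reported table: F1 is the harmonic mean of
# Precision (0.6619) and Recall (0.7480).
reported_f1 = 2 * 0.6619 * 0.7480 / (0.6619 + 0.7480)
```

On this toy data, lowering the threshold from 0.5 to 0.28 catches more of the failing samples at the cost of one extra false alarm, which is the trade-off the table describes.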
The project is modular, with a clear separation of concerns:
```mermaid
graph LR
    A[Raw Sensor Data] -->|Cleaning| B(Data Processing)
    B -->|Feature Engineering| C{Feature Store}
    C -->|Rolling/Lag/Trend| D[XGBoost Classifier]
    D -->|Artifacts| E[Model Registry]
    E --> F[FastAPI Service]
    G[Client Request] --> F
    F -->|JSON Response| H[Failure Probability]
```
- Algorithm: XGBoost Classifier (Gradient Boosting Trees).
  - Why? Proven champion for tabular/sensor data, handles non-linearities, robust to outliers.
- Target: `label = 1 if RUL <= 20 else 0`, where RUL is the remaining useful life in cycles.
- Engineering:
  - Rolling Window (5): Smooths sensor noise (Mean) and captures volatility (Std).
  - Lags (t-1, t-2): Captures the "velocity" of degradation.
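The target and features described above could be computed along these lines with pandas. This is a sketch under stated assumptions, not the project's implementation: the column names `unit` and `cycle`, the helper name `add_features`, the lag column naming, and deriving RUL as "last observed cycle minus current cycle" (which presumes run-to-failure training data) are all assumptions.

```python
import pandas as pd

def add_features(df, sensor_cols, window=5, lags=(1, 2), horizon=20):
    """Add RUL, binary label, rolling mean/std and lag features per engine.

    Hypothetical helper: column names and defaults are illustrative.
    """
    df = df.sort_values(["unit", "cycle"]).copy()

    # Assumption: each engine runs to failure, so RUL is the number of
    # cycles left before that engine's last recorded cycle.
    df["RUL"] = df.groupby("unit")["cycle"].transform("max") - df["cycle"]
    df["label"] = (df["RUL"] <= horizon).astype(int)

    grouped = df.groupby("unit")
    for col in sensor_cols:
        # Rolling statistics are computed within each engine only.
        df[f"{col}_mean_{window}"] = grouped[col].transform(
            lambda s: s.rolling(window, min_periods=1).mean()
        )
        df[f"{col}_std_{window}"] = grouped[col].transform(
            lambda s: s.rolling(window, min_periods=1).std()
        )
        # Lags t-1, t-2: earlier readings from the same engine.
        for lag in lags:
            df[f"{col}_lag_{lag}"] = grouped[col].shift(lag)
    return df

# Toy engine with 30 cycles and a steadily rising sensor value.
toy = pd.DataFrame({
    "unit": [1] * 30,
    "cycle": list(range(1, 31)),
    "sensor_1": [float(c) for c in range(1, 31)],
})
features = add_features(toy, ["sensor_1"])
```

Grouping by engine before rolling or shifting matters: without it, the first cycles of one engine would borrow values from the last cycles of the previous one.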
- Python 3.10+
- `pip` and `virtualenv`
```bash
# Clone repository
git clone https://github.com/your-username/aircraft-predictive-maintenance.git
cd aircraft-predictive-maintenance

# Create Virtual Environment & Install Dependencies
# (Or use the helper command)
make setup
```

Alternatively: `python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt`
Fetches the NASA CMAPSS dataset and generates engineered features.

```bash
make process
```

Output: `data/processed/train.csv` (enriched with 100+ features)
Trains the XGBoost classifier, performs time-aware splitting, and saves artifacts.
```bash
make train
```

Output: `models/xgb_model.joblib`, `models/metadata.json`
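For readers curious what a metadata file of this kind might contain, the sketch below writes a training timestamp, hyperparameters, metrics, and threshold to JSON. The helper `write_metadata`, the field names (apart from `model_version`, which appears in the API response), and the hyperparameter values are placeholders chosen for illustration; the real `models/metadata.json` may be structured differently.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def write_metadata(path, hyperparameters, metrics, threshold):
    """Write a small JSON record describing one training run.

    Hypothetical helper: field names are illustrative only.
    """
    metadata = {
        # An ISO-8601 UTC timestamp doubles as the model version.
        "model_version": datetime.now(timezone.utc).isoformat(),
        "hyperparameters": hyperparameters,
        "metrics": metrics,
        "threshold": threshold,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata, indent=2))
    return metadata

# Written to a temporary directory here so the example has no side effects.
out_path = Path(tempfile.mkdtemp()) / "models" / "metadata.json"
saved = write_metadata(
    out_path,
    hyperparameters={"max_depth": 6, "n_estimators": 200},  # placeholder values
    metrics={"roc_auc": 0.9967, "recall": 0.7480, "precision": 0.6619},
    threshold=0.28,
)
```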
Generates a detailed markdown report with metrics, confusion matrix, and latency.
```bash
make evaluate
```

Output: `eval/report.md`
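One way a 99th-percentile latency figure like the one in the report could be measured is sketched below with the standard library. The helper `p99_latency_ms` and the stand-in `fake_predict` function are assumptions for this example; the project's evaluation script may time requests differently.

```python
import statistics
import time

def p99_latency_ms(predict_fn, n_calls=200):
    """Call predict_fn repeatedly and return the 99th-percentile latency in ms."""
    samples = []
    for _ in range(n_calls):
        start = time.perf_counter()
        predict_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
    return statistics.quantiles(samples, n=100)[-1]

def fake_predict():
    """Stand-in workload; replace with a real model or API call."""
    return sum(i * i for i in range(1000))

p99 = p99_latency_ms(fake_predict)
print(f"p99 latency: {p99:.3f} ms")
```

Reporting a high percentile rather than the mean keeps occasional slow requests visible.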
Launches the High-Performance FastAPI server.
```bash
make serve
```

Access Swagger UI: http://localhost:8000/docs
Endpoint: POST /predict
Send raw sensor data for the current cycle. The API handles feature scaling/mapping.
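As a sketch of calling this endpoint from Python using only the standard library, the snippet below builds the request without sending it. The helper name `build_predict_request` is hypothetical; the base URL assumes the local server started with `make serve`, and the response field names in the commented-out part follow the sample response in this README.

```python
import json
from urllib import request

def build_predict_request(reading, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /predict request for one sensor reading."""
    body = json.dumps(reading).encode("utf-8")
    return request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

reading = {"sensor_1": 642.00, "sensor_2": 1590.00, "sensor_3": 1405.00, "cycle": 150}
req = build_predict_request(reading)

# With the server running, the request could be sent like this:
# with request.urlopen(req, timeout=5) as resp:
#     result = json.loads(resp.read())
#     print(result["failure_probability"], result["risk_level"])
```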
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "sensor_1": 642.00,
    "sensor_2": 1590.00,
    "sensor_3": 1405.00,
    "cycle": 150
  }'
```

```json
{
  "failure_probability": 0.824,
  "risk_level": "HIGH",
  "top_features": {
    "sensor_11_mean_5": 0.34,
    "sensor_4_mean_5": 0.12,
    "sensor_9_std_5": 0.08
  },
  "threshold_used": 0.28,
  "inference_ms": 13.103,
  "model_version": "2026-02-12T02:59:02..."
}
```

```
aircraft-predictive-maintenance/
├── data/               # Raw and processed datasets
├── models/             # Serialized models and metadata
├── notebooks/          # Exploratory analysis
├── src/                # Source code
│   ├── api/            # FastAPI application
│   ├── features/       # Data cleaning & feature engineering
│   ├── training/       # Model training loop
│   └── evaluation/     # Metrics & reporting
├── eval/               # Generated reports
├── Makefile            # Automation shortcuts
└── requirements.txt    # Dependencies
```
This is a sophisticated demo. A full production version would add:
- Feature Store (Feast/Tecton): To serve historical rolling features in real-time instead of the API assuming steady-state.
- Model Registry (MLflow): For deeper experiment tracking.
- CI/CD: Automatic retraining pipelines on new data ingestion.
Author: Ritesh

License: MIT