🛡️ DDoS Attack Prediction System

Predicting Distributed Denial-of-Service attacks before they peak using Machine Learning

🎯 The Problem

Traditional network security tools react after an attack starts — by the time an alert fires, servers are already overwhelmed and services are down. The average DDoS attack costs $2.5 million in downtime and damages.

This project flips the approach: using machine learning on network flow features to predict DDoS attacks before they fully materialize, enabling pre-emptive defense.

🧠 How It Works

Raw Network Traffic
        ↓
  Feature Extraction (22 flow metrics)
        ↓
  Correlation Analysis → Drop redundant features
        ↓
  ML Classification (AdaBoost)
        ↓
  ⚡ Prediction + Auto-Response
    ├── Normal → Allow traffic
    └── DDoS   → Trigger firewall rules, alert team
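The last branch of the diagram can be sketched as a small dispatch function. This is a minimal illustration, not the project's code: the `Action` names, the `decide_action` helper, and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"                      # Normal branch: let traffic through
    BLOCK_AND_ALERT = "block_and_alert"  # DDoS branch: firewall rules + alert


@dataclass
class Decision:
    action: Action
    attack_probability: float


def decide_action(attack_probability: float, threshold: float = 0.5) -> Decision:
    """Map a classifier's attack probability to one of the two diagram branches.

    The threshold is a hypothetical default; a real deployment would tune it
    against the cost of false positives versus missed attacks.
    """
    if not 0.0 <= attack_probability <= 1.0:
        raise ValueError("attack_probability must be in [0, 1]")
    if attack_probability >= threshold:
        return Decision(Action.BLOCK_AND_ALERT, attack_probability)
    return Decision(Action.ALLOW, attack_probability)


print(decide_action(0.93).action.value)
print(decide_action(0.04).action.value)
```

Keeping the decision separate from the model makes it possible to change the threshold or the response without retraining.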

📊 Results

| Model | Accuracy | ROC-AUC | CV Score |
|---|---|---|---|
| Gaussian Naive Bayes | ~82% | ~0.91 | 82% ± 0.5% |
| AdaBoost ⭐ | ~95% | ~0.99 | 95% ± 0.3% |
| Random Forest | ~96% | ~0.99 | 96% ± 0.2% |

AdaBoost was chosen as the primary model: its accuracy is within about one percentage point of Random Forest, and its inference is roughly 3× faster, which is critical for real-time traffic analysis.

🔍 Key Findings

Feature selection mattered more than algorithm choice — dropping correlated features (>0.90 threshold) via heatmap analysis improved all models
Top predictive features: flow_packets_per_sec, syn_flag_count, fwd_iat_mean — DDoS traffic shows extreme packet rates and near-zero inter-arrival times
AdaBoost's iterative learning handles the edge cases that Naive Bayes misses (Naive Bayes assumes feature independence, which doesn't hold for network traffic)
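The correlation-based elimination described in the first finding could look roughly like the sketch below. It is one common way to do it, assuming pandas and NumPy; the `drop_correlated` helper and the `flow_bytes_per_sec` column are hypothetical and only serve the demonstration, and the project's own script may differ.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.90) -> list[str]:
    """Return the columns to keep after dropping one of each highly correlated pair."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it correlates above the threshold with an earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return [col for col in df.columns if col not in to_drop]


# Toy data: the second column is a near-duplicate of the first.
rng = np.random.default_rng(0)
packets = rng.normal(size=500)
demo = pd.DataFrame({
    "flow_packets_per_sec": packets,
    "flow_bytes_per_sec": packets * 2.0 + rng.normal(scale=0.01, size=500),  # hypothetical feature
    "fwd_iat_mean": rng.normal(size=500),
})

kept = drop_correlated(demo)
print(kept)
```

Because only the upper triangle is inspected, the first feature of each correlated pair survives and the later one is removed, so the result depends on column order.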

🚀 Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Generate dataset

python generate_dataset.py

3. Train all models + generate visualizations

python train_models.py

4. Run the live demo

python demo/predict.py

📁 Project Structure

ddos_project/
├── generate_dataset.py      # Synthetic network traffic dataset
├── train_models.py          # Model training + all visualizations
├── demo/
│   └── predict.py           # Live real-time prediction demo
├── data/
│   └── network_traffic.csv  # Generated dataset (15,000 samples)
├── models/
│   ├── adaboost.pkl         # Trained AdaBoost model
│   ├── naive_bayes.pkl      # Trained Naive Bayes model
│   ├── random_forest.pkl    # Trained Random Forest model
│   ├── scaler.pkl           # StandardScaler
│   └── feature_cols.json    # Selected feature names
├── visualizations/
│   ├── 01_correlation_heatmap.png
│   ├── 02_accuracy_comparison.png
│   ├── 03_roc_curves.png
│   ├── 04_confusion_matrices.png
│   ├── 05_feature_importance.png
│   └── 06_feature_distributions.png
└── requirements.txt

📈 Visualizations

Feature Correlation Heatmap

Used to identify and remove redundant features before training.

ROC Curves

AdaBoost and Random Forest both achieve AUC ≈ 0.99 — near-perfect discrimination between normal and attack traffic.

Feature Importance

The model relies most on packet rate metrics and SYN flag counts — exactly the signatures that DDoS attacks exhibit.

🔧 Technical Details

Dataset: Synthetic network flow data (15,000 samples) structured after the CIC-DDoS2019 benchmark dataset. Features include inter-arrival times, packet lengths, byte counts, TCP flag counts, and flow duration metrics.

Models:

Gaussian Naive Bayes — probabilistic baseline using Bayes' theorem with Gaussian feature distributions
AdaBoost — ensemble of 100 weak decision stumps, each iteration focusing on misclassified samples
Random Forest — 100 decision trees with max_depth=10 for comparison
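For orientation, the three models with the hyperparameters listed above can be set up in scikit-learn as in the sketch below. The data comes from `make_classification` as a stand-in, not from the project's dataset, so the printed scores will not match the Results table; the 5-fold setting and `random_state` values are also assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in data with 22 features; NOT the project's network_traffic.csv.
X, y = make_classification(
    n_samples=1000, n_features=22, n_informative=8, random_state=42
)

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    # AdaBoostClassifier's default base learner is a depth-1 tree (a decision stump).
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, max_depth=10, random_state=42
    ),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy (the fold count is an assumption).
    cv = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = (cv.mean(), cv.std())
    print(f"{name}: {cv.mean():.3f} ± {cv.std():.3f}")
```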

Preprocessing: StandardScaler normalization + correlation-based feature elimination (threshold: 0.90)
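The scaler and the selected feature list have to be reused unchanged at inference time, which is presumably why `scaler.pkl` and `feature_cols.json` sit next to the models. A minimal sketch of that round trip follows, assuming `pickle` for the scaler (the project may use `joblib` instead) and a temporary directory in place of `models/`; the sample values are made up.

```python
import json
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.preprocessing import StandardScaler

feature_cols = ["flow_packets_per_sec", "syn_flag_count", "fwd_iat_mean"]

# Made-up training matrix; fit the scaler on training data only.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, len(feature_cols)))
scaler = StandardScaler().fit(X_train)

with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp)  # stands in for the project's models/ directory
    (models_dir / "scaler.pkl").write_bytes(pickle.dumps(scaler))
    (models_dir / "feature_cols.json").write_text(json.dumps(feature_cols))

    # Inference side: reload both artifacts.
    loaded_scaler = pickle.loads((models_dir / "scaler.pkl").read_bytes())
    loaded_cols = json.loads((models_dir / "feature_cols.json").read_text())

# Build the row in the saved column order, then apply the saved scaling.
new_flow = {"flow_packets_per_sec": 60.0, "syn_flag_count": 50.0, "fwd_iat_mean": 40.0}
row = np.array([[new_flow[col] for col in loaded_cols]])
scaled = loaded_scaler.transform(row)
print(scaled.shape)
```

Storing the column order alongside the scaler guards against silently feeding features to the model in a different order than it was trained on.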

☁️ Cloud Deployment Path

This model can be deployed to protect cloud infrastructure:

AWS / GCP / Azure
       ↓
  VPC Flow Logs → Lambda / Cloud Function
       ↓
  Model Inference (<1ms per flow)
       ↓
  Auto-trigger: WAF rules, IP blocking, scale-up
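As a rough illustration of the first step in this path, the sketch below parses one AWS VPC Flow Log record in the default (version 2) field order and derives per-second rates. The `FlowSummary` type and `parse_flow_record` helper are hypothetical, and the default format is aggregated per flow, so it does not carry inter-arrival times or TCP flag counts; covering the full feature set would need a richer traffic source.

```python
from dataclasses import dataclass

# Default VPC Flow Log fields (version 2), space-separated, in this order.
FIELDS = (
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
)


@dataclass
class FlowSummary:
    srcaddr: str
    dstaddr: str
    packets_per_sec: float
    bytes_per_sec: float


def parse_flow_record(line: str) -> FlowSummary:
    """Parse one default-format record and compute simple rate features."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    rec = dict(zip(FIELDS, values))
    if rec["log-status"] != "OK":
        # NODATA / SKIPDATA records carry '-' placeholders instead of numbers.
        raise ValueError(f"unusable record, log-status={rec['log-status']}")
    # Guard against zero-length capture windows.
    duration = max(int(rec["end"]) - int(rec["start"]), 1)
    return FlowSummary(
        srcaddr=rec["srcaddr"],
        dstaddr=rec["dstaddr"],
        packets_per_sec=int(rec["packets"]) / duration,
        bytes_per_sec=int(rec["bytes"]) / duration,
    )


sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
summary = parse_flow_record(sample)
print(summary)
```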

👤 Author

Amogh Nellutla — M.S. Cybersecurity, Montclair State University

📄 License

MIT License — see LICENSE for details.
