287 changes: 253 additions & 34 deletions README.md
@@ -1,4 +1,4 @@
# 🛡️ CyberAttackDetection-Python
# Cybersecurity Attack Detection Framework

![🔧 Build Status](https://github.com/canstralian/CyberAttackDetection-Python/actions/workflows/ci.yml/badge.svg)
![📊 Coverage](https://codecov.io/gh/canstralian/CyberAttackDetection-Python/branch/main/graph/badge.svg)
@@ -8,57 +8,276 @@
![🚀 Release](https://img.shields.io/github/v/release/canstralian/CyberAttackDetection-Python)
![🐞 Issues](https://img.shields.io/github/issues/canstralian/CyberAttackDetection-Python)

---
## Overview

## 🛡️ About CyberAttackDetection-Python
A modular, secure, and extensible framework for detecting cyber attacks using machine learning. This framework provides a robust foundation for cybersecurity threat detection with built-in best practices for data preprocessing, model management, REST API development, and security.

**CyberAttackDetection-Python** is a Python application designed to detect and mitigate cyber attacks using **advanced machine learning techniques**.
### Key Features

### 🌟 Features
- **πŸ›‘οΈ Security-First Design**: JWT authentication, rate limiting, input validation, and security headers
- **πŸ”§ Modular Architecture**: Clean separation of concerns with dedicated modules for preprocessing, modeling, and API
- **πŸ“Š Multiple ML Models**: Support for Random Forest, Logistic Regression, SVM, and Neural Networks
- **πŸš€ REST API**: Secure Flask-based API with comprehensive validation using Marshmallow
- **πŸ“ˆ Comprehensive Evaluation**: Built-in metrics, cross-validation, and hyperparameter tuning
- **πŸ§ͺ Testing Framework**: Extensive test suite with pytest and continuous integration
- **πŸ“– Documentation**: Jupyter notebooks, scripts, and comprehensive API documentation
- **πŸ”„ Model Management**: Easy model saving, loading, and comparison capabilities

- πŸš€ **Real-Time Attack Detection**
- 🧠 **Machine Learning Model Training and Evaluation**
- πŸ“ **Comprehensive Logging and Alerting System**
## Quick Start

---
### Installation

## 📋 How to Use
```bash
# Clone the repository
git clone https://github.com/canstralian/CyberAttackDetection-Python.git
cd CyberAttackDetection-Python

1. **Clone the Repository**
```bash
git clone https://github.com/canstralian/CyberAttackDetection-Python.git
```
# Install dependencies
pip install -r requirements.txt
```

2. **Install Dependencies**
```bash
pip install -r requirements.txt
```
### Basic Usage

3. **Run the Application**
```bash
python main.py
```
#### 1. Generate Sample Data
```bash
python scripts/generate_data.py --samples 1000 --features 20 --output-dir data/
```

---
#### 2. Train a Model
```bash
python scripts/train_model.py --data data/cyber_dataset_1000s_20f_2c.csv --model random_forest --hyperparameter-tuning
```

## 🀝 Contributing
#### 3. Start the API Server
```bash
cd src/api && python app.py
```

Contributions are welcome! 🛠️ Please follow the guidelines outlined in the [CONTRIBUTING.md](CONTRIBUTING.md).
#### 4. Use the Framework Programmatically

---
```python
from src.core.preprocessing import DataPreprocessor
from src.models.detector import CyberAttackDetector
from src.utils.helpers import create_sample_dataset

## 📜 License
# Generate sample data
data = create_sample_dataset(n_samples=1000, n_features=20)

This project is licensed under the **MIT License**. For more details, check the [LICENSE.md](LICENSE.md) file.
# Preprocess data
preprocessor = DataPreprocessor()
X, y, _ = preprocessor.full_preprocessing_pipeline(data)
X_scaled, _ = preprocessor.scale_features(X.select_dtypes(include=['float64', 'int64']))

---
# Train model
detector = CyberAttackDetector('random_forest')
detector.train(X_scaled, y)

# Make predictions
predictions = detector.predict(X_scaled)
evaluation = detector.evaluate(X_scaled, y)
print(f"Accuracy: {evaluation['accuracy']:.4f}")
```
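
Note that the snippet above evaluates the model on the same rows it was trained on, which tends to overstate accuracy. A minimal sketch of a reproducible hold-out split using only the standard library follows. `holdout_indices` is a hypothetical helper, not part of the framework, and the `.iloc` usage shown in the comments assumes `X_scaled` and `y` are pandas objects, which this README does not confirm.

```python
import random


def holdout_indices(n_rows, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx): a shuffled, non-overlapping split of range(n_rows)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = max(1, int(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = holdout_indices(1000)

# Assumed usage with pandas objects (hypothetical, adapt to the actual types):
#   detector.train(X_scaled.iloc[train_idx], y.iloc[train_idx])
#   evaluation = detector.evaluate(X_scaled.iloc[test_idx], y.iloc[test_idx])
```

Scaling parameters would ideally also be fitted on the training rows only, so that no information from the held-out rows leaks into preprocessing.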

## Directory Structure

```
CyberAttackDetection-Python/
├── .github/                          # GitHub workflows and templates
│   └── workflows/
│       ├── ci.yml                    # Continuous integration
│       ├── security-scan.yml         # Security scanning
│       └── python-app.yml            # Python application workflow
├── src/                              # Main source code
│   ├── __init__.py
│   ├── api/                          # REST API module
│   │   ├── __init__.py
│   │   └── app.py                    # Flask application with security features
│   ├── core/                         # Core detection and preprocessing
│   │   ├── __init__.py
│   │   └── preprocessing.py          # Data preprocessing pipeline
│   ├── models/                       # Machine learning models
│   │   ├── __init__.py
│   │   └── detector.py               # Model classes and management
│   └── utils/                        # Utility functions
│       ├── __init__.py
│       └── helpers.py                # Helper functions and utilities
├── tests/                            # Test suite
│   ├── __init__.py
│   └── test_framework.py             # Comprehensive tests
├── notebooks/                        # Jupyter notebooks
│   └── model_training_demo.ipynb     # Training demonstration
├── scripts/                          # Command-line scripts
│   ├── train_model.py                # Model training script
│   ├── evaluate_model.py             # Model evaluation script
│   └── generate_data.py              # Data generation script
├── config/                           # Configuration files
│   ├── .env.example                  # Environment variables template
│   └── config.py                     # Application configuration
├── data/                             # Data directory
├── models/                           # Saved models directory
├── requirements.txt                  # Python dependencies
├── pyproject.toml                    # Project configuration
└── README.md                         # This file
```

## API Usage

### Authentication

First, obtain a JWT token:

```bash
curl -X POST http://localhost:5000/api/auth/token \
-H "Content-Type: application/json" \
-d '{"username": "user"}'
```
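
The same token request can be prepared from Python. The sketch below uses only the standard library and builds the request without sending it. `build_token_request` is a hypothetical helper, and the shape of the server's JSON response (including the name of the field that carries the token) is not documented in this README, so it is left for the reader to inspect.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # same host and port as the curl example


def build_token_request(username, base_url=BASE_URL):
    """Build (but do not send) the POST request for /api/auth/token."""
    body = json.dumps({"username": username}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_token_request("user")

# To send it against a running server:
#   with urllib.request.urlopen(request, timeout=10) as response:
#       payload = json.load(response)
# The field holding the token is an unknown here; inspect `payload` to find it.
```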

### Available Endpoints

- `GET /api/health` - Health check
- `POST /api/auth/token` - Get authentication token
- `GET /api/models/available` - List available models
- `POST /api/detect` - Detect attacks (requires auth)
- `POST /api/train` - Train new model (requires auth)
- `POST /api/models/{model_type}/evaluate` - Evaluate model (requires auth)

### Detection Example

```bash
curl -X POST http://localhost:5000/api/detect \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-d '{
"data": [[1.2, 0.5, -0.3, ...], [0.8, -1.1, 0.4, ...]],
"model_type": "random_forest"
}'
```
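
A Python equivalent of the detection call might look like the sketch below. It builds the authenticated request with the standard library and checks on the client side that all rows have the same number of features. `build_detect_request` is a hypothetical helper, the three-feature rows are purely illustrative, and the server may apply further validation that is not described here.

```python
import json
import urllib.request


def build_detect_request(rows, token, model_type="random_forest",
                         base_url="http://localhost:5000"):
    """Build (but do not send) an authenticated POST request for /api/detect."""
    if not rows:
        raise ValueError("rows must contain at least one feature vector")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("every row must have the same number of features")
    body = json.dumps({"data": rows, "model_type": model_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/detect",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Illustrative rows only: a real request needs as many features per row
# as the model was trained on.
detect_request = build_detect_request(
    [[1.2, 0.5, -0.3], [0.8, -1.1, 0.4]], token="YOUR_JWT_TOKEN"
)
```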

## Model Types

The framework supports multiple machine learning algorithms:

- **Random Forest** (`random_forest`): Ensemble of decision trees; a robust default for tabular features
- **Logistic Regression** (`logistic_regression`): Fast linear classifier
- **Support Vector Machine** (`svm`): Margin-based classifier that can model non-linear boundaries through kernels
- **Neural Network** (`neural_network`): Multi-layer perceptron

## Security Features

- **JWT Authentication**: Secure token-based authentication
- **Rate Limiting**: Protection against API abuse
- **Input Validation**: Comprehensive request validation using Marshmallow
- **Security Headers**: Standard security headers (HSTS, XSS Protection, etc.)
- **CORS Protection**: Configurable cross-origin resource sharing
- **Error Handling**: Secure error responses without information leakage
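
To illustrate the rate-limiting idea from the list above, here is a minimal fixed-window counter in plain Python. It is a sketch of the general technique, not the framework's implementation (how the API actually enforces its limits is not shown in this README). `FixedWindowLimiter` and its parameters are hypothetical names.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per client within each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock    # injectable so the limiter can be tested without sleeping
        self._windows = {}    # client id -> (window start, calls counted so far)

    def allow(self, client_id):
        """Return True and count the call if the client is under its limit."""
        now = self.clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired: start a new one
        if count >= self.limit:
            self._windows[client_id] = (start, count)
            return False
        self._windows[client_id] = (start, count + 1)
        return True


# Example: 100 requests per hour per client.
limiter = FixedWindowLimiter(limit=100, window_seconds=3600)
first_call_allowed = limiter.allow("203.0.113.7")
```

A fixed window is simple but permits short bursts at window boundaries; sliding-window or token-bucket schemes smooth that out at the cost of more bookkeeping.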

## Development

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run specific test file
pytest tests/test_framework.py
```

### Code Quality

```bash
# Linting
flake8 src/ tests/ scripts/

# Code formatting
black src/ tests/ scripts/

# Type checking (if using mypy)
mypy src/
```

### Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes following PEP 8 guidelines
4. Add tests for new functionality
5. Run the test suite (`pytest`)
6. Commit your changes (`git commit -m 'Add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request

## Configuration

### Environment Variables

Copy `config/.env.example` to `.env` and customize:

```bash
# Security
SECRET_KEY=your-secret-key-here
JWT_EXPIRATION_HOURS=1

# API Configuration
RATE_LIMIT_PER_HOUR=100
RATE_LIMIT_AUTH_PER_MINUTE=20

# Model Configuration
DEFAULT_MODEL_TYPE=random_forest
MODEL_DIRECTORY=models/

# Data Configuration
DATA_DIRECTORY=data/
MAX_FILE_SIZE_MB=16
```
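
One way these variables could be read at startup is sketched below, using only `os.environ` with the defaults listed above. `load_settings` is a hypothetical helper (the actual `config/config.py` is not shown here), and rejecting the placeholder `SECRET_KEY` is a design choice of this sketch rather than documented behavior.

```python
import os

# Defaults mirror the values listed in the README's example.
DEFAULTS = {
    "JWT_EXPIRATION_HOURS": "1",
    "RATE_LIMIT_PER_HOUR": "100",
    "RATE_LIMIT_AUTH_PER_MINUTE": "20",
    "DEFAULT_MODEL_TYPE": "random_forest",
    "MODEL_DIRECTORY": "models/",
    "DATA_DIRECTORY": "data/",
    "MAX_FILE_SIZE_MB": "16",
}
INTEGER_KEYS = {
    "JWT_EXPIRATION_HOURS",
    "RATE_LIMIT_PER_HOUR",
    "RATE_LIMIT_AUTH_PER_MINUTE",
    "MAX_FILE_SIZE_MB",
}


def load_settings(environ=None):
    """Read settings from an environment mapping, converting numeric values to int."""
    environ = os.environ if environ is None else environ
    secret = environ.get("SECRET_KEY", "")
    if not secret or secret.startswith("your-secret-key-here"):
        # Sketch-level safeguard: refuse to run with a missing or placeholder key.
        raise RuntimeError("SECRET_KEY must be set to a real secret")
    settings = {"SECRET_KEY": secret}
    for key, default in DEFAULTS.items():
        raw = environ.get(key, default)
        settings[key] = int(raw) if key in INTEGER_KEYS else raw
    return settings


# Explicit mapping for illustration; call load_settings() to read os.environ.
settings = load_settings({"SECRET_KEY": "example-only", "RATE_LIMIT_PER_HOUR": "250"})
```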

## Jupyter Notebooks

Explore the framework with interactive notebooks:

- `notebooks/model_training_demo.ipynb`: Complete training and evaluation workflow
- Examples of data preprocessing, model comparison, and hyperparameter tuning

## Performance and Scalability

- **Efficient preprocessing**: Optimized data pipeline with memory management
- **Model caching**: Intelligent model loading and caching in the API
- **Batch processing**: Support for batch predictions
- **Performance instrumentation**: Timers and logging to help locate bottlenecks
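
Batch prediction, as mentioned in the list above, can be sketched as slicing the input into fixed-size chunks and predicting one chunk at a time, which bounds peak memory use. `iter_batches`, `predict_in_batches`, and `toy_predict` are hypothetical names for illustration; in real use the `predict` argument would presumably be something like `detector.predict`.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of `rows`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def predict_in_batches(predict, rows, batch_size=256):
    """Call `predict` on one batch at a time and concatenate the results in order."""
    predictions = []
    for batch in iter_batches(rows, batch_size):
        predictions.extend(predict(batch))
    return predictions


def toy_predict(batch):
    """Stand-in predictor: flags a row when its first feature is positive."""
    return [int(row[0] > 0) for row in batch]


results = predict_in_batches(
    toy_predict, [[1.2], [-0.4], [0.8], [-2.0], [0.1]], batch_size=2
)
```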

## Troubleshooting

### Common Issues

1. **Import Errors**: Ensure you're in the project root directory
2. **Model Not Found**: Check if models are saved in the correct directory
3. **Authentication Errors**: Verify JWT token is valid and not expired
4. **Memory Issues**: Reduce batch size or dataset size for large datasets
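
When debugging authentication errors, it can help to look at a token's expiry locally. The sketch below decodes the payload segment of a JWT with the standard library and compares its `exp` claim to the current time. It deliberately does not verify the signature, so it is a debugging aid only and the server remains the authority. That the framework's tokens carry an `exp` claim is an assumption suggested by `JWT_EXPIRATION_HOURS`; the helper names are hypothetical.

```python
import base64
import json
import time


def jwt_claims_unverified(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(token, now=None):
    """Return True if the token's `exp` claim (seconds since the epoch) has passed."""
    claims = jwt_claims_unverified(token)
    if "exp" not in claims:
        return False  # no expiry claim present, so nothing to compare
    now = time.time() if now is None else now
    return now >= claims["exp"]
```

If `is_expired(token)` returns `True`, requesting a fresh token from `/api/auth/token` is the likely fix.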

### Logging

The framework includes comprehensive logging. Set log levels:

```python
from src.utils.helpers import setup_logging
logger = setup_logging('DEBUG', 'logs/debug.log')
```
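
For readers curious what a helper with this call shape might do internally, here is a rough sketch built on the standard `logging` module. It is not the framework's `setup_logging` (whose source is not shown here); `setup_logging_sketch`, its `name` parameter, and the log format are assumptions made for illustration.

```python
import logging
import os


def setup_logging_sketch(level="INFO", log_file=None, name="cyberattack_detection"):
    """Configure and return a logger that writes to the console and, optionally, a file."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level.upper()))
    logger.handlers.clear()  # avoid duplicate lines if the function is called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = [logging.StreamHandler()]
    if log_file:
        directory = os.path.dirname(log_file)
        if directory:
            os.makedirs(directory, exist_ok=True)  # e.g. create logs/ on first run
        handlers.append(logging.FileHandler(log_file))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Console-only logger at DEBUG level; pass a path such as 'logs/debug.log' to also log to a file.
sketch_logger = setup_logging_sketch("DEBUG")
```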

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

### 🔍 Additional Information
## Acknowledgments

- **Last Updated:** ![🕒 Last Commit](https://img.shields.io/github/last-commit/canstralian/CyberAttackDetection-Python)
- **Latest Release:** ![🚀 Release](https://img.shields.io/github/v/release/canstralian/CyberAttackDetection-Python)
- **Open Issues:** ![🐞 Issues](https://img.shields.io/github/issues/canstralian/CyberAttackDetection-Python)
- [scikit-learn](https://scikit-learn.org/) for machine learning algorithms
- [Flask](https://flask.palletsprojects.com/) for the REST API framework
- [Marshmallow](https://marshmallow.readthedocs.io/) for input validation
- [PyJWT](https://pyjwt.readthedocs.io/) for JWT authentication
- [pandas](https://pandas.pydata.org/) and [numpy](https://numpy.org/) for data processing

---

Thank you for using **CyberAttackDetection-Python**! If you encounter any issues, feel free to report them under the [Issues tab](https://github.com/canstralian/CyberAttackDetection-Python/issues). 🛡️💻
Built with ❤️ for cybersecurity professionals and researchers.
28 changes: 28 additions & 0 deletions config/.env.example
@@ -0,0 +1,28 @@
# Cybersecurity Detection Framework Configuration

# API Configuration
SECRET_KEY=your-secret-key-here-change-in-production
FLASK_ENV=development
FLASK_DEBUG=False

# Database Configuration (if needed in future)
DATABASE_URL=sqlite:///cyberattack_detection.db

# Model Configuration
DEFAULT_MODEL_TYPE=random_forest
MODEL_CACHE_SIZE=5
MODEL_DIRECTORY=models/

# Security Configuration
JWT_EXPIRATION_HOURS=1
RATE_LIMIT_PER_HOUR=100
RATE_LIMIT_AUTH_PER_MINUTE=20

# Data Configuration
MAX_FILE_SIZE_MB=16
ALLOWED_EXTENSIONS=csv,json
DATA_DIRECTORY=data/

# Logging Configuration
LOG_LEVEL=INFO
LOG_FILE=logs/cyberattack_detection.log