biggeezerdevelopment/waf

Go WAF (Web Application Firewall)

A high-performance Web Application Firewall written in Go, featuring Vectorscan/Hyperscan integration for blazing-fast pattern matching and advanced machine learning for adaptive threat detection.

Features

Core Security Engine

  • Vectorscan/Hyperscan Integration: Hardware-accelerated pattern matching using SIMD instructions
  • Parallel Pattern Matching: All security rules evaluated simultaneously in a single pass
  • High Performance: 10-100x faster than sequential regex matching for typical payloads
  • Rule-Based Protection: Configurable rules to detect and block various attack patterns
  • Rate Limiting: Protect against brute force and DoS attacks
  • IP Whitelist/Blacklist: Control access based on IP addresses
  • Multiple Backend Support: Route requests to different backend servers

Machine Learning & Adaptive Security

  • Anomaly Detection: Custom Isolation Forest implementation for zero-day attack detection
  • Enhanced Behavioral Analysis: Comprehensive 65+ metric session profiling with advanced risk assessment
  • Multi-Layer Detection: Smart combination of rules + ML + behavioral + pattern analysis
  • Learning Pipeline: Automatic model training from legitimate traffic patterns
  • ML State Persistence: Automatic saving/loading of all ML data, models, and learning across restarts
  • Adaptive Thresholds: Self-tuning based on false positive feedback
  • Advanced Feature Extraction: 24-dimensional security feature analysis
  • Pattern Evolution System: Automatic discovery and generation of new attack patterns
  • Dynamic Rule Generation: N-gram analysis and clustering for emerging threat detection
  • False Positive Learning: Self-improving rule accuracy through user feedback
  • Progressive Risk Scoring: Multi-factor risk assessment with confidence scoring
  • Behavioral Violation Detection: Real-time detection of rate, pattern, temporal, and content anomalies
  • Session Characterization: Automated classification of human/bot/scanner behavior
  • Feedback Loop System: User-driven model improvement with attack validation and false positive correction
  • Auto-Tuning Engine: Automatic threshold optimization based on performance metrics and user feedback
  • Rule Performance Management: Dynamic promotion/deprecation of rules based on accuracy and effectiveness
  • Comprehensive Performance Tracking: Multi-component metrics with trend analysis and calibration alerts
  • Persistent Intelligence: All ML knowledge preserved across WAF restarts with configurable backup management

Security & Administration

  • API Key Authentication: X-API-Key header required for admin endpoint access
  • HTTPS/TLS Support: Full TLS encryption for admin interface and traffic
  • Certificate Management: Self-signed or custom certificate support
  • Secure Admin Endpoints: All /_waf/* endpoints require authentication when enabled
  • Configuration-based Security: Enable/disable authentication and HTTPS via config
  • Rules Database: SQLite database for dynamic rule management with full CRUD API
  • Rule History & Versioning: Complete audit trail of all rule changes with rollback support
  • Database Migration: Seamless migration from JSON rules to SQLite database

Monitoring & Management

  • Real-time Logging: Detailed request logging with attack detection and ML insights
  • Configuration Hot-Reload: Update rules and settings without restart
  • Shadow Mode: Test WAF behavior without blocking traffic (essential for safe deployment)
  • ML State Persistence: Automatic saving/loading of ML data across restarts
  • Comprehensive Configuration Management: All ML parameters externalized and configurable
  • Comprehensive Statistics: WAF performance, ML model metrics, and behavioral analytics
  • ML Management Endpoints: Model retraining, false positive reporting, anomaly statistics
  • Pattern Mining Dashboard: Dynamic rule discovery, promotion, and management (not fully implemented)
  • Enhanced Behavioral Dashboard: Session profiling, violation tracking, and risk analysis (not fully implemented)
  • Feedback & Auto-Tuning Dashboard: User feedback collection, performance monitoring, and automatic optimization (not fully implemented)
  • Advanced Filtering: IP-based, risk-level, and severity-based filtering capabilities
  • Intelligent Fallback: Automatic fallback to standard regex for incompatible patterns
  • Object Pooling: Optimized memory usage with request context and buffer pools
  • Environment-Specific Tuning: Production/development configurations with hot-reload capability
  • Automatic Backup Management: Configurable backup retention and cleanup for ML state

Performance

Vectorscan Integration

The WAF uses Vectorscan (a portable fork of Intel's Hyperscan) for ultra-fast pattern matching:

  • Single-pass matching: All security rules are evaluated simultaneously
  • SIMD optimization: Hardware acceleration using CPU vector instructions
  • 100% rule compatibility: All default rules are Vectorscan-compatible
  • Benchmark results: 10-100x faster than sequential regex matching for typical payloads

Machine Learning Performance

  • <1ms ML overhead: Feature extraction and anomaly detection per request
  • Real-time analysis: Non-blocking ML processing pipeline
  • Memory efficient: 10,000 concurrent session tracking with <100MB overhead
  • Automatic optimization: Hourly model retraining with sliding window data
  • Scalable: Linear performance scaling with request volume

Multi-Layer Architecture Performance

Layer 1:  Vectorscan Rules            →  ~0.05ms (immediate block)
Layer 2a: ML Anomaly Detection        →  ~0.3ms  (scoring)
Layer 2b: Basic Behavioral Analysis   →  ~0.2ms  (risk assessment)
Layer 2c: Enhanced Behavioral         →  ~0.1ms  (advanced profiling)
Layer 3:  Progressive Decision        →  ~0.05ms (smart blocking)
Layer 4:  Pattern Evolution           →  ~0.1ms  (background analysis)
Total Processing Time                 →  <1ms    (combined overhead)

Optimization Features

  • Request context pooling to minimize allocations
  • Pre-compiled pattern database loaded at startup
  • Efficient string building with pre-allocated buffers
  • Lock-free statistics collection where possible
  • Circular buffers for behavioral data (memory efficient)
  • Asynchronous ML model updates (non-blocking)
  • Background pattern analysis with configurable intervals
  • Dynamic rule deduplication and confidence scoring
  • Enhanced session profiling with statistical models
  • Progressive decision logic for optimized blocking
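
The request context pooling mentioned above can be sketched with the standard library's sync.Pool. This is an illustration only: the requestContext type, its fields, and the helper names are hypothetical and are not taken from the WAF's source.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// requestContext is a hypothetical per-request scratch object; the real
// WAF's context type is not shown in this README.
type requestContext struct {
	ClientIP string
	Buf      bytes.Buffer // reused buffer for assembling the text to scan
}

// ctxPool hands out reusable contexts so the hot path can avoid a fresh
// allocation for every request.
var ctxPool = sync.Pool{
	New: func() any { return &requestContext{} },
}

// acquire fetches a context from the pool and initializes it.
func acquire(ip string) *requestContext {
	rc := ctxPool.Get().(*requestContext)
	rc.ClientIP = ip
	return rc
}

// release clears per-request state before returning the context to the
// pool, so data never leaks from one request into the next.
func release(rc *requestContext) {
	rc.ClientIP = ""
	rc.Buf.Reset()
	ctxPool.Put(rc)
}

// scanInput joins the request parts in the pooled buffer and returns a
// copy of the result, which stays valid after the context is released.
func scanInput(ip, path, query string) string {
	rc := acquire(ip)
	defer release(rc)
	rc.Buf.WriteString(path)
	rc.Buf.WriteByte('?')
	rc.Buf.WriteString(query)
	return rc.Buf.String()
}

func main() {
	fmt.Println(scanInput("127.0.0.1", "/test", "cmd=ls"))
}
```

Resetting the buffer keeps its backing array, so steady-state traffic reuses the same memory instead of growing a new buffer per request.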

Prerequisites

Installing Vectorscan

macOS

brew install vectorscan

Ubuntu/Debian

sudo apt-get update
sudo apt-get install libvectorscan-dev

From Source

git clone https://github.com/VectorCamp/vectorscan
cd vectorscan
cmake . -DCMAKE_BUILD_TYPE=Release
make
sudo make install

Building and Running

  1. Install dependencies:
go get github.com/flier/gohs/hyperscan
go get github.com/json-iterator/go
  2. Build the WAF:
go build
  3. Run the WAF:
./waf

The WAF will start on the configured port (default: 8080) and begin protecting your backend services. If ML persistence is enabled, it will automatically load any previously saved ML state, providing instant intelligence without retraining.

Configuration

The WAF uses two configuration files with comprehensive ML and security settings:

config.json

Main configuration file containing:

  • Server settings (port, logging, shadow mode)
  • Security settings (API authentication, HTTPS/TLS)
  • Backend configurations
  • Rate limiting settings
  • IP whitelist/blacklist
  • Rules file location
  • Comprehensive ML Configuration (all hardcoded values externalized)

Example:

{
    "listen_port": 8080,
    "enable_https": false,
    "tls_cert_file": "server.crt",
    "tls_key_file": "server.key",
    "api_key": "your-secret-api-key-change-this",
    "admin_endpoints_require_auth": true,
    "backends": [
        {
            "name": "default",
            "url": "http://localhost:8083",
            "path_prefix": "/",
            "enabled": true
        }
    ],
    "rate_limit": 100,
    "rate_limit_window": 60,
    "blocked_ips": [],
    "allowed_ips": [],
    "log_file": "waf.log",
    "enable_logging": true,
    "shadow_mode": false,
    "rules_file": "rules.json",
    "ml_config": {
        "enable_ml_collection": true,
        "ml_data_path": "ml_data",
        "max_logs_in_memory": 10000,
        "feature_sampling_rate": 0.1,
        "anomaly_detection": {
            "enable_anomaly_detection": true,
            "threshold": 0.6,
            "num_trees": 100,
            "subsample_size": 256,
            "min_training_size": 50,
            "retrain_interval_hours": 1,
            "max_training_data": 10000
        },
        "session_profiling": {
            "enable_session_profiling": true,
            "max_sessions": 10000,
            "session_timeout_hours": 24,
            "cleanup_interval_hours": 1,
            "request_history_size": 100,
            "risk_thresholds": {
                "low": 0.3,
                "medium": 0.6,
                "high": 0.8,
                "critical": 0.95
            }
        },
        "behavioral_analysis": {
            "enable_behavioral": true,
            "max_sessions_advanced": 10000,
            "session_timeout_hours": 24,
            "cleanup_interval_hours": 1,
            "violation_history_size": 50,
            "pattern_history_size": 20,
            "confidence_window": 100,
            "entropy_threshold": 7.0,
            "rate_anomaly_threshold": 3.0,
            "temporal_anomaly_multiple": 3.0,
            "request_history_capacity": 1000,
            "statistical_sample_size": 1000,
            "risk_thresholds": {
                "low": 0.3,
                "medium": 0.6,
                "high": 0.8,
                "critical": 0.95
            },
            "critical_z_score": 7.0,
            "high_z_score": 5.0,
            "medium_z_score": 3.0,
            "bot_requests_per_minute": 100,
            "scanner_requests_per_minute": 30,
            "human_requests_per_minute": 5,
            "suspicious_violation_count": 5,
            "malicious_violation_count": 10,
            "base_risk_score": 0.1,
            "anomaly_score_weight": 0.4,
            "violation_risk_weight": 0.1,
            "threat_intel_weight": 0.3,
            "max_violation_risk": 0.3,
            "sql_injection_threat_score": 0.3,
            "xss_threat_score": 0.3,
            "command_injection_score": 0.4
        },
        "pattern_mining": {
            "enable_pattern_mining": true,
            "min_pattern_occurrence": 5,
            "max_patterns": 1000,
            "pattern_confidence_min": 0.7,
            "cluster_threshold": 0.8,
            "ngram_size": 3,
            "promotion_threshold": 0.9,
            "max_false_positives": 5,
            "analysis_interval_minutes": 30,
            "quick_analysis_every": 100,
            "max_requests_stored": 10000,
            "remove_oldest_count": 1000,
            "ngram_min_length": 3,
            "ngram_max_length": 20,
            "ngram_min_frequency": 3,
            "top_ngram_candidates": 20,
            "max_clusters": 50,
            "min_similarity": 0.6,
            "min_cluster_size": 3,
            "emergency_frequency": 5,
            "emergency_confidence": 0.8,
            "min_pattern_length": 5
        },
        "auto_tuning": {
            "enable_auto_tuning": true,
            "tuning_interval_hours": 1,
            "min_samples_for_tuning": 100,
            "min_anomaly_threshold": 0.5,
            "max_anomaly_threshold": 0.95,
            "min_behavioral_threshold": 0.3,
            "max_behavioral_threshold": 0.9,
            "target_false_positive_rate": 0.05,
            "target_recall": 0.95,
            "max_latency_ms": 5.0,
            "threshold_step_size": 0.05,
            "max_adjustment_per_cycle": 0.2,
            "performance_tolerance": 0.02,
            "promotion_accuracy_threshold": 0.95,
            "promotion_min_matches": 50,
            "promotion_min_confidence": 0.9,
            "deprecation_accuracy_threshold": 0.7,
            "deprecation_min_matches": 20,
            "max_feedback_history": 10000,
            "feedback_cleanup_amount": 1000,
            "max_tuning_history": 1000,
            "max_latency_history": 1000,
            "latency_cleanup_amount": 100,
            "default_anomaly_threshold": 0.7,
            "default_behavioral_threshold": 0.6,
            "volume_weight_divisor": 100.0,
            "min_recency_weight": 0.1,
            "recency_decay_days": 30.0,
            "declining_performance_multiple": 1.5,
            "low_recall_threshold": 0.8,
            "low_fpr_threshold": 0.5,
            "recall_adjustment_factor": 0.5,
            "behavioral_fpr_low_threshold": 0.3,
            "min_behavioral_feedback_samples": 20,
            "feedback_rate_hours": 24.0,
            "throughput_recent_count": 100
        },
        "detection_thresholds": {
            "high_confidence_anomaly_threshold": 0.85,
            "critical_risk_threshold": 0.95,
            "combined_anomaly_threshold": 0.75,
            "combined_risk_threshold": 0.8,
            "alert_anomaly_threshold": 0.7,
            "alert_risk_threshold": 0.7,
            "max_behavioral_violations": 3,
            "max_suspicious_patterns": 2
        }
    },
    "persistence_config": {
        "enable_persistence": true,
        "persistence_directory": "./ml_data",
        "auto_save_interval_minutes": 30,
        "backup_retention_days": 7,
        "compress_backups": false
    }
}
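
As a rough sketch of how a Go program might read a subset of this file, the snippet below decodes a few top-level keys with encoding/json. The struct and function names are hypothetical, and the port range check is an added assumption; only the JSON keys come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Backend mirrors one entry of the "backends" array.
type Backend struct {
	Name       string `json:"name"`
	URL        string `json:"url"`
	PathPrefix string `json:"path_prefix"`
	Enabled    bool   `json:"enabled"`
}

// Config covers only some top-level keys of config.json. Keys that are not
// listed here (ml_config, persistence_config, ...) are ignored on decode.
type Config struct {
	ListenPort      int       `json:"listen_port"`
	EnableHTTPS     bool      `json:"enable_https"`
	APIKey          string    `json:"api_key"`
	RateLimit       int       `json:"rate_limit"`
	RateLimitWindow int       `json:"rate_limit_window"`
	ShadowMode      bool      `json:"shadow_mode"`
	RulesFile       string    `json:"rules_file"`
	Backends        []Backend `json:"backends"`
}

// parseConfig decodes the JSON and applies one basic sanity check.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("parse config: %w", err)
	}
	if cfg.ListenPort < 1 || cfg.ListenPort > 65535 {
		return cfg, fmt.Errorf("listen_port %d is out of range", cfg.ListenPort)
	}
	return cfg, nil
}

// sample is a trimmed copy of the example configuration.
const sample = `{
    "listen_port": 8080,
    "enable_https": false,
    "api_key": "your-secret-api-key-change-this",
    "backends": [
        {"name": "default", "url": "http://localhost:8083", "path_prefix": "/", "enabled": true}
    ],
    "rate_limit": 100,
    "rate_limit_window": 60,
    "shadow_mode": false,
    "rules_file": "rules.json"
}`

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("port=%d shadow=%v backends=%d\n", cfg.ListenPort, cfg.ShadowMode, len(cfg.Backends))
}
```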

Configuration Features

Comprehensive ML Configuration

All machine learning components are now fully configurable:

  • Anomaly Detection: Thresholds, tree counts, training intervals
  • Session Profiling: Timeout settings, risk thresholds, history sizes
  • Behavioral Analysis: 20+ configurable parameters for advanced profiling
  • Pattern Mining: N-gram analysis, clustering, and rule generation settings
  • Auto-Tuning: Feedback processing, performance optimization, and threshold management
  • Detection Thresholds: All ML decision points externalized

Shadow Mode

Test WAF behavior without blocking traffic:

{
    "shadow_mode": true
}

Hot-Reload Support

Update configuration without restart:

curl -X POST http://localhost:8080/_waf/reload

ML State Persistence

Automatic saving and loading of ML data:

  • Training data and models preserved across restarts
  • Dynamic rules and patterns retained
  • Session profiles and behavioral data maintained
  • Feedback history and auto-tuning state saved
  • Configurable auto-save intervals and backup retention

rules.json

Contains all WAF rules for attack detection. All default rules are Vectorscan-optimized:

  • SQL Injection (multiple variants)
  • XSS (Cross-Site Scripting)
  • Command Injection
  • Path Traversal
  • Local/Remote File Inclusion
  • PHP Code Injection
  • Log4Shell (CVE-2021-44228)
  • Null Byte Injection
  • Security Scanner Detection
  • And more...

Example rule:

{
    "id": "SQL_INJECTION_1",
    "name": "SQL Injection Detection",
    "pattern": "(?i)(union\\s+select|select\\s+.*\\s+from|insert\\s+into|update\\s+.*\\s+set|delete\\s+from|drop\\s+table|create\\s+table)",
    "action": "block",
    "enabled": true,
    "description": "Detects common SQL injection patterns"
}
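
To try a rule like this outside the WAF, the following sketch decodes it and matches it with Go's standard regexp package, which corresponds to the fallback engine rather than Vectorscan. The type and function names are hypothetical; the JSON fields and the pattern are copied from the example rule.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Rule mirrors the fields of one rules.json entry.
type Rule struct {
	ID          string `json:"id"`
	Name        string `json:"name"`
	Pattern     string `json:"pattern"`
	Action      string `json:"action"`
	Enabled     bool   `json:"enabled"`
	Description string `json:"description"`
}

// compiledRule pairs a rule with its compiled pattern.
type compiledRule struct {
	Rule
	re *regexp.Regexp
}

// exampleRule is the example rule, kept verbatim in a raw string so the
// JSON escapes (\\s) reach the decoder unchanged.
const exampleRule = `{
    "id": "SQL_INJECTION_1",
    "name": "SQL Injection Detection",
    "pattern": "(?i)(union\\s+select|select\\s+.*\\s+from|insert\\s+into|update\\s+.*\\s+set|delete\\s+from|drop\\s+table|create\\s+table)",
    "action": "block",
    "enabled": true,
    "description": "Detects common SQL injection patterns"
}`

// loadRule decodes one rule and compiles its pattern.
func loadRule(data []byte) (*compiledRule, error) {
	var r Rule
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, fmt.Errorf("decode rule: %w", err)
	}
	re, err := regexp.Compile(r.Pattern)
	if err != nil {
		return nil, fmt.Errorf("compile %s: %w", r.ID, err)
	}
	return &compiledRule{Rule: r, re: re}, nil
}

// matches reports whether an enabled rule fires on the input.
func (c *compiledRule) matches(input string) bool {
	return c.Enabled && c.re.MatchString(input)
}

func main() {
	rule, err := loadRule([]byte(exampleRule))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, in := range []string{"id=1 UNION SELECT password", "GET /dashboard"} {
		fmt.Printf("%-30q match=%v\n", in, rule.matches(in))
	}
}
```

Broad alternatives such as select ... from can also match ordinary English text, which is one reason to trial new rules in shadow mode first.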

Monitoring

Core WAF Statistics

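Access traditional WAF statistics at /_waf/stats. When admin_endpoints_require_auth is enabled, every /_waf/* request shown in this section also needs the X-API-Key header (for example, -H "X-API-Key: your-secret-api-key-change-this"):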

curl http://localhost:8080/_waf/stats

Returns JSON with:

  • Total requests processed
  • Blocked/allowed request counts
  • Rule match statistics
  • Current rate limit states

Machine Learning Endpoints

Anomaly Detection Statistics

Get detailed ML model performance metrics:

curl http://localhost:8080/_waf/ml/anomaly_stats

Returns:

{
  "is_trained": true,
  "training_samples": 250,
  "total_predictions": 1500,
  "anomalies_detected": 45,
  "false_positives": 3,
  "true_positives": 42,
  "threshold": 0.6,
  "last_retrain": "2024-03-15T14:30:00Z",
  "num_trees": 100,
  "subsample_size": 256
}

Behavioral Analysis Statistics

Monitor session profiling and risk distribution:

curl http://localhost:8080/_waf/ml/session_stats

Returns:

{
  "total_sessions": 85,
  "risk_distribution": {
    "low": 70,
    "medium": 12,
    "high": 2,
    "critical": 1
  },
  "max_sessions": 10000,
  "session_timeout": "24h0m0s",
  "last_cleanup": "2024-03-15T14:00:00Z"
}

ML Data Collection Status

Check overall ML data collection status:

curl http://localhost:8080/_waf/ml/stats

Manual Model Retraining

Trigger immediate ML model retraining:

curl -X POST http://localhost:8080/_waf/ml/model_retrain

False Positive Reporting

Report false positives to improve model accuracy:

curl -X POST http://localhost:8080/_waf/ml/report_false_positive \
  -H "Content-Type: application/json" \
  -d '{"request_id": "123", "was_attack": false, "feedback": "Legitimate admin request"}'

Export Training Data

Download collected ML training data:

curl "http://localhost:8080/_waf/ml/export?filename=training_data.json" > training_data.json

Pattern Mining & Dynamic Rules

Get discovered dynamic rules and pattern statistics:

curl http://localhost:8080/_waf/ml/dynamic_rules

Returns discovered attack patterns:

{
  "dynamic_rules": {
    "dyn_a1b2c3d4": {
      "id": "dyn_a1b2c3d4",
      "pattern": "(?i)union\\s+select",
      "confidence": 0.85,
      "detection_count": 12,
      "false_positives": 1,
      "source": "ngram",
      "enabled": true,
      "promoted": true
    }
  },
  "total_rules": 15
}

Check pattern mining statistics:

curl http://localhost:8080/_waf/ml/pattern_stats

Returns comprehensive pattern mining metrics:

{
  "total_blocked_requests": 450,
  "total_patterns_found": 28,
  "patterns_promoted": 8,
  "patterns_deprecated": 3,
  "active_dynamic_rules": 15,
  "promoted_rules": 8,
  "ngram_count": 156,
  "cluster_count": 12,
  "false_positive_patterns": 5
}

Manual Rule Promotion

Promote a dynamic rule to permanent status:

curl -X POST http://localhost:8080/_waf/ml/promote_rule \
  -H "Content-Type: application/json" \
  -d '{"pattern": "(?i)union\\s+select"}'

Enhanced Behavioral Analysis

Get comprehensive behavioral analysis statistics:

curl http://localhost:8080/_waf/ml/behavioral_stats

Returns detailed behavioral metrics:

{
  "total_sessions": 1250,
  "active_sessions": 87,
  "risk_distribution": {
    "low": 65,
    "medium": 15,
    "high": 5,
    "critical": 2
  },
  "behavior_violations": 45,
  "patterns_detected": 12,
  "max_sessions": 10000,
  "session_timeout": "24h0m0s"
}

Session Profiling

View detailed session profiles with filtering:

curl "http://localhost:8080/_waf/ml/session_profiles?risk_level=high&limit=10"

Returns comprehensive session data:

{
  "profiles": {
    "192.168.1.100": {
      "risk_level": "high",
      "risk_score": 0.75,
      "anomaly_score": 0.68,
      "violation_count": 4,
      "suspicious_patterns": 2,
      "session_type": "scanner",
      "behavior_category": "suspicious",
      "threat_score": 0.82
    }
  },
  "total_shown": 5,
  "total_active": 87
}

Behavioral Violations

Monitor behavioral violations with filtering:

curl "http://localhost:8080/_waf/ml/violations?severity=high&hours=24"

Returns violation details:

{
  "violations": [
    {
      "ip_address": "10.0.0.50",
      "type": "rate_anomaly",
      "severity": "high",
      "description": "Request rate 45.2 deviates significantly from normal 12.3",
      "timestamp": "2024-03-15T16:30:00Z",
      "risk_score": 0.85,
      "evidence": {
        "current_rate": 45.2,
        "normal_rate": 12.3,
        "z_score": 5.8
      }
    }
  ],
  "count": 12
}

Feedback Loop & Auto-Tuning

User Feedback Collection

Submit feedback on ML decisions at /_waf/ml/feedback:

curl -X POST http://localhost:8080/_waf/ml/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req_12345",
    "was_attack": false,
    "attack_type": "sql_injection",
    "user_id": "security_admin"
  }'

Response:

{
  "status": "success",
  "message": "Feedback processed successfully",
  "request_id": "req_12345",
  "timestamp": "2024-03-15T16:30:00Z"
}
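
The same feedback call can be made from Go. The sketch below is a hypothetical client: the endpoint path, JSON fields, and X-API-Key header come from this README, while the type names, the assumption that success is signalled with HTTP 200, and the stub server (which stands in for a running WAF so the example runs on its own) are illustrative.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Feedback is the request body for /_waf/ml/feedback.
type Feedback struct {
	RequestID  string `json:"request_id"`
	WasAttack  bool   `json:"was_attack"`
	AttackType string `json:"attack_type,omitempty"`
	UserID     string `json:"user_id,omitempty"`
}

// FeedbackResponse mirrors the documented response fields.
type FeedbackResponse struct {
	Status    string `json:"status"`
	Message   string `json:"message"`
	RequestID string `json:"request_id"`
	Timestamp string `json:"timestamp"`
}

// sendFeedback posts one feedback record and decodes the reply.
func sendFeedback(c *http.Client, baseURL, apiKey string, fb Feedback) (FeedbackResponse, error) {
	var out FeedbackResponse
	body, err := json.Marshal(fb)
	if err != nil {
		return out, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/_waf/ml/feedback", bytes.NewReader(body))
	if err != nil {
		return out, err
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" {
		req.Header.Set("X-API-Key", apiKey)
	}
	resp, err := c.Do(req)
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK { // assumed success code
		return out, fmt.Errorf("feedback rejected: %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// stubWAF imitates the feedback endpoint for demonstration purposes only.
func stubWAF() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-API-Key") != "demo-key" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		var fb Feedback
		if err := json.NewDecoder(r.Body).Decode(&fb); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		json.NewEncoder(w).Encode(FeedbackResponse{
			Status:    "success",
			Message:   "Feedback processed successfully",
			RequestID: fb.RequestID,
		})
	}))
}

func main() {
	srv := stubWAF()
	defer srv.Close()
	fb := Feedback{RequestID: "req_12345", WasAttack: false, UserID: "security_admin"}
	resp, err := sendFeedback(srv.Client(), srv.URL, "demo-key", fb)
	fmt.Println(resp.Status, resp.RequestID, err)
}
```

Against a real deployment, pass the WAF's base URL (for example http://localhost:8080) and the api_key value from config.json instead of the stub.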

Model Performance Metrics

Get comprehensive performance metrics at /_waf/ml/model_performance:

curl http://localhost:8080/_waf/ml/model_performance

Returns detailed performance analysis:

{
  "model_performance": {
    "total_predictions": 15420,
    "accuracy": 0.962,
    "precision": 0.945,
    "recall": 0.978,
    "f1_score": 0.961,
    "false_positive_rate": 0.032,
    "false_negative_rate": 0.022,
    "performance_trend": "improving",
    "anomaly_threshold": 0.72,
    "behavioral_threshold": 0.65,
    "threshold_optimal": true,
    "anomaly_detector_performance": {
      "accuracy": 0.958,
      "avg_latency_ms": 0.85,
      "throughput_rps": 1250.5,
      "calibration_needed": false
    }
  },
  "feedback_summary": {
    "total_feedback": 847,
    "recent_24h": 156,
    "attack_rate": 0.288,
    "tuning_enabled": true,
    "tuning_in_progress": false
  }
}

Auto-Tuning History

View auto-tuning events at /_waf/ml/tuning_history:

curl http://localhost:8080/_waf/ml/tuning_history

Shows tuning decisions:

{
  "tuning_events": [
    {
      "timestamp": "2024-03-15T15:00:00Z",
      "action": "threshold_adjustment",
      "component": "anomaly_detector",
      "old_value": 0.75,
      "new_value": 0.72,
      "reason": "Reducing false positive rate from 0.048 to target 0.050",
      "performance_gain": 0.03
    }
  ],
  "total_events": 23
}

Rule Performance Tracking

Monitor rule effectiveness at /_waf/ml/rule_performance:

# View all rules
curl http://localhost:8080/_waf/ml/rule_performance

# Filter promotion candidates
curl "http://localhost:8080/_waf/ml/rule_performance?promotion_eligible=true"

# Check deprecation risks
curl "http://localhost:8080/_waf/ml/rule_performance?deprecation_risk=true"

Returns rule performance data:

{
  "rule_performance": {
    "ML_ANOMALY_HIGH_CONFIDENCE": {
      "accuracy": 0.953,
      "total_matches": 127,
      "true_positives": 121,
      "false_positives": 6,
      "confidence_score": 0.89,
      "promotion_eligible": true,
      "deprecation_risk": false
    }
  },
  "total_rules": 8
}

Auto-Tuning Configuration

Manage auto-tuning settings at /_waf/ml/auto_tuning_config:

# Get current configuration
curl http://localhost:8080/_waf/ml/auto_tuning_config

# Update configuration
curl -X POST http://localhost:8080/_waf/ml/auto_tuning_config \
  -H "Content-Type: application/json" \
  -d '{
    "enable_auto_tuning": true,
    "target_false_positive_rate": 0.03,
    "target_recall": 0.97,
    "tuning_interval": "30m",
    "max_latency_ms": 3.0
  }'

Configuration response:

{
  "config": {
    "enable_auto_tuning": true,
    "tuning_interval": "1h",
    "target_false_positive_rate": 0.05,
    "target_recall": 0.95,
    "max_latency_ms": 5.0,
    "threshold_step_size": 0.05,
    "max_adjustment_per_cycle": 0.2
  }
}

Shadow Mode Management

Control shadow mode via API endpoints:

Shadow Mode Status and Control

# Get shadow mode status
curl http://localhost:8080/_waf/shadow_mode

# Enable shadow mode
curl -X POST http://localhost:8080/_waf/shadow_mode \
  -H "Content-Type: application/json" \
  -d '{"shadow_mode": true}'

# Disable shadow mode  
curl -X POST http://localhost:8080/_waf/shadow_mode \
  -H "Content-Type: application/json" \
  -d '{"shadow_mode": false}'

Shadow mode response:

{
  "shadow_mode": true,
  "status": "active", 
  "timestamp": "2025-09-15T21:16:29Z"
}

Manual Configuration Reload

Trigger a configuration reload at /_waf/reload:

curl -X POST http://localhost:8080/_waf/reload

Shadow Mode

Shadow Mode allows you to test WAF behavior without blocking traffic. This is essential for safe deployment and testing.

How Shadow Mode Works

  • Detection Continues: All rules and ML models remain active
  • Logging Enhanced: Threats logged with SHADOW_MODE_THREAT prefix
  • No Blocking: Malicious requests pass through to backend
  • Full Metrics: All statistics and ML data collection continues
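
The behavior listed above boils down to one branch at enforcement time. A minimal sketch, assuming detection has already produced a rule ID (empty when nothing matched); the verdict type and enforce function are hypothetical, and only the ALLOWED, BLOCKED, and SHADOW_MODE_THREAT labels come from this README.

```go
package main

import "fmt"

// verdict is a hypothetical result of the enforcement step.
type verdict struct {
	Blocked   bool   // true when the request must not reach the backend
	LogAction string // action label written to the log
}

// enforce decides what to do with a request after detection has run.
// ruleID is the ID of the matched rule, or "" when nothing matched.
func enforce(shadowMode bool, ruleID string) verdict {
	if ruleID == "" {
		return verdict{Blocked: false, LogAction: "ALLOWED"}
	}
	if shadowMode {
		// Detection still ran and is logged, but the request passes through.
		return verdict{Blocked: false, LogAction: "SHADOW_MODE_THREAT: " + ruleID}
	}
	return verdict{Blocked: true, LogAction: "BLOCKED: " + ruleID}
}

func main() {
	fmt.Printf("%+v\n", enforce(true, "XSS_1"))
	fmt.Printf("%+v\n", enforce(false, "XSS_1"))
	fmt.Printf("%+v\n", enforce(false, ""))
}
```

Because only the final branch differs, statistics and ML data collection see identical inputs in both modes.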

Enable/Disable Shadow Mode

Via Configuration File (Persistent)

{
    "shadow_mode": true,
    "enable_logging": true
}

Via API (Runtime)

# Enable shadow mode (with API key authentication)
curl -X POST http://localhost:8080/_waf/shadow_mode \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-secret-api-key-change-this" \
    -d '{"shadow_mode": true}'

# Check status (with API key authentication)
curl http://localhost:8080/_waf/shadow_mode \
    -H "X-API-Key: your-secret-api-key-change-this"

# Disable shadow mode (with API key authentication)
curl -X POST http://localhost:8080/_waf/shadow_mode \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your-secret-api-key-change-this" \
    -d '{"shadow_mode": false}'

Shadow Mode Monitoring

# Monitor shadow mode logs
tail -f waf.log | grep SHADOW_MODE

# Count shadow mode detections
grep -c "SHADOW_MODE" waf.log

# Analyze threat types in shadow mode
grep -o "SHADOW_MODE_THREAT: [A-Za-z0-9_]*" waf.log | sort | uniq -c

Shadow Mode Use Cases

  1. New Rule Testing: Test rules in production without blocking users
  2. ML Model Validation: Validate anomaly detection accuracy
  3. Configuration Tuning: Fine-tune thresholds and parameters
  4. Deployment Verification: Ensure WAF works correctly before enabling blocking

See SHADOW_MODE_GUIDE.md for comprehensive shadow mode documentation.

Enhanced Logging

The WAF logs all requests and blocks to the configured log file (default: waf.log). Log entries include:

  • Timestamp
  • Client IP
  • Request method and URL
  • User agent
  • Action taken (ALLOWED/BLOCKED/ML_ANOMALY/ML_BEHAVIORAL/SHADOW_MODE_THREAT)
  • Rule ID and reason for blocked requests
  • Pattern matching engine used (Vectorscan/Regexp)
  • ML insights: Anomaly scores, risk levels, behavioral alerts
  • Session tracking: Request patterns, risk progression
  • Pattern evolution: Dynamic rule creation, promotion, and deprecation events
  • Shadow mode detection: Threats detected but not blocked

Example log entries:

# Traditional rule block
[2024-03-21 10:30:45] 127.0.0.1 GET http://localhost:8080/test?cmd=shell_exec(ls) Mozilla/5.0 - BLOCKED: COMMAND_INJECTION_2 - Detects command injection function calls

# ML anomaly detection
[2024-03-21 10:31:12] 192.168.1.100 POST http://localhost:8080/api/data Chrome/91.0 - BLOCKED: ML_ANOMALY_HIGH_CONFIDENCE - ML detected anomaly (score: 0.89, risk: high)

# Shadow mode threat detection (not blocked)
[2025-09-15 21:16:29] ::1 GET http://localhost:8080/test?input=<script>alert('xss')</script> curl/8.7.1 - SHADOW_MODE_THREAT: XSS_1 - Detects XSS attack patterns

# Behavioral analysis alert
[2024-03-21 10:31:45] 10.0.0.50 GET http://localhost:8080/admin/config curl/7.68.0 - ML_ALERT: anomaly=0.72 risk=high

# Pattern evolution events
[2024-03-21 10:32:15] ML: Created dynamic rule dyn_a1b2c3d4 (confidence: 0.85, source: ngram)
[2024-03-21 10:35:30] ML: Promoted dynamic rule dyn_a1b2c3d4 to production (confidence: 0.92)

# Enhanced behavioral analysis alerts
[2024-03-21 10:33:00] 10.0.0.75 GET http://localhost:8080/scan curl/7.68.0 - ML_ALERT: anomaly=0.78 risk=high score=0.85 violations=3 patterns=1
[2024-03-21 10:33:15] 192.168.1.50 POST http://localhost:8080/upload Scanner/2.0 - BLOCKED: ML_BEHAVIORAL_VIOLATIONS - Multiple behavioral violations detected
[2024-03-21 10:33:30] ML: High-risk behavior detected for 10.0.0.75 (risk: 0.89, level: critical)

# Normal request with enhanced ML scoring
[2024-03-21 10:32:00] 192.168.1.101 GET http://localhost:8080/dashboard Firefox/89.0 - ALLOWED

Testing Rule Compatibility

Test which rules are compatible with Vectorscan:

go run test_rules_compatibility.go

This will show:

  • Individual rule compatibility status
  • Compilation success/failure for each pattern
  • Overall compatibility percentage
  • Test results against sample attack payloads

Security Features

1. Multi-Layer Attack Detection

Traditional Rule-Based (Layer 1)

  • SQL Injection: Multiple detection patterns including tautologies, comments, and time-based
  • XSS: Script tags, event handlers, and JavaScript protocols
  • Command Injection: Shell operators and function calls
  • Path Traversal: Directory traversal and file access attempts
  • File Inclusion: Both local (LFI) and remote (RFI)
  • PHP Injection: eval(), assert(), and other dangerous functions
  • Log4Shell: CVE-2021-44228 JNDI injection
  • Null Byte: Various null byte encoding attempts

Machine Learning-Based (Layer 2)

  • Zero-Day Detection: Anomaly detection for unknown attack patterns
  • Behavioral Analysis: Unusual request patterns and session profiling
  • Adaptive Learning: Continuous improvement from legitimate traffic
  • Advanced Evasion: Detection of encoded, obfuscated, or polymorphic attacks
  • Statistical Analysis: Entropy-based detection of suspicious payloads

Pattern Evolution System (Layer 3)

  • Dynamic Rule Generation: Automatic creation of new rules from blocked attacks
  • N-gram Analysis: Frequency-based pattern discovery (3-20 character sequences)
  • Attack Clustering: Similarity-based grouping of attack patterns
  • Regex Synthesis: Intelligent conversion of patterns to regex rules
  • False Positive Learning: Continuous improvement through user feedback
  • Rule Promotion: Automatic promotion of high-confidence patterns to production

Enhanced Behavioral Analysis (Layer 4)

  • Comprehensive Session Profiling: 65+ behavioral metrics per session
  • Progressive Risk Scoring: Multi-factor risk assessment with confidence levels
  • Behavioral Violation Detection: Real-time rate, pattern, temporal, and content anomaly detection
  • Session Characterization: Automated human/bot/scanner classification
  • Advanced Statistical Analysis: Z-score analysis and deviation detection
  • Threat Intelligence Integration: IOC matching and reputation scoring
  • Multi-dimensional Pattern Recognition: Behavioral fingerprinting and sequence analysis

2. Access Control & Rate Limiting

  • IP-based whitelisting/blacklisting
  • Rate limiting per IP (configurable window)
  • Dangerous HTTP method blocking (TRACE, TRACK, DEBUG)
  • Security scanner detection via User-Agent
  • ML-Enhanced: Behavioral rate limiting based on session profiling

3. Advanced Request Analysis

  • URL parameters with entropy analysis
  • Request headers with pattern matching
  • Request body inspection (with configurable size limits)
  • User agent filtering and behavioral analysis
  • 24-Dimensional Feature Extraction: Statistical, entropy, behavioral, and temporal features
  • Session Correlation: Cross-request pattern analysis
  • 65+ Behavioral Metrics: Comprehensive session profiling and analysis
  • Real-time Violation Tracking: Behavioral anomaly detection and alerting

4. Intelligent Decision Making

  • Risk-Based Blocking: Graduated response based on confidence levels
  • False Positive Learning: User feedback integration for model improvement
  • Adaptive Thresholds: Self-tuning based on traffic patterns
  • Multi-Factor Scoring: Combined rule + ML + behavioral assessment
  • Dynamic Pattern Discovery: Real-time identification of emerging attack trends
  • Automated Rule Evolution: Self-improving security through pattern learning
  • Progressive Decision Logic: Enhanced blocking with behavioral violation thresholds
  • Session-Aware Blocking: Context-aware decisions based on comprehensive session analysis
  • Advanced Risk Assessment: Multi-dimensional scoring with confidence levels

Performance Benchmarks

Multi-Layer Processing Performance

With 16 active rules + ML analysis on typical web requests:

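| Processing Layer | Time per Request | Throughput   | CPU Usage |
|------------------|------------------|--------------|-----------|
| Vectorscan Only  | ~0.05ms          | 20,000 req/s | 15%       |
| Vectorscan + ML  | ~0.5ms           | 15,000 req/s | 25%       |
| Sequential Regex | ~2.5ms           | 400 req/s    | 85%       |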

ML Performance Metrics

  • Feature Extraction: ~0.2ms per request
  • Anomaly Detection: ~0.3ms per request (100 trees)
  • Basic Behavioral Analysis: ~0.1ms per request
  • Enhanced Behavioral Analysis: ~0.1ms per request (65+ metrics)
  • Pattern Evolution: ~0.1ms per request (background analysis)
  • Progressive Decision Logic: ~0.05ms per request
  • Model Training: ~50ms for 1000 samples (background)
  • Memory Overhead: ~100MB for 10K sessions + ML models

Memory Usage

  • Startup memory: ~25MB (Vectorscan + ML models)
  • Per-request overhead: <2KB (with ML features + object pooling)
  • Pattern database: ~200KB (Vectorscan rules)
  • ML models: ~10MB (isolation forest + behavioral data)
  • Session data: ~100MB for 10K active sessions
  • Pattern mining: ~50KB for 10K blocked requests + dynamic rules
  • Enhanced behavioral profiles: ~200KB for 1K sessions with 65+ metrics

Architecture

Core Components

  1. WAF Engine: Main request handler and router
  2. Vectorscan Matcher: High-performance pattern matching engine
  3. ML Anomaly Detector: Custom Isolation Forest for anomaly detection
  4. Basic Session Profiler: Behavioral analysis and risk assessment
  5. Advanced Session Profiler: Enhanced 65+ metric behavioral analysis
  6. Feature Extractor: 24-dimensional security feature analysis
  7. Pattern Miner: Dynamic rule generation and pattern evolution
  8. Feedback & Auto-Tuning System: User feedback collection and automatic performance optimization
  9. Behavioral Risk Calculator: Multi-factor risk assessment engine
  10. Rule Manager: Rule compilation and management
  11. Rate Limiter: Token bucket implementation per IP
  12. Backend Proxy: Reverse proxy to backend services
  13. Statistics Collector: Real-time metrics collection (WAF + ML)
  14. Configuration Reloader: Hot-reload support
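
For readers unfamiliar with Isolation Forest, the anomaly score in the original algorithm normalizes the mean path length across trees by the expected path length c(n) for a subsample of size n, giving s = 2^(-E[h]/c(n)). The sketch below shows only that scoring step, following the published formula; the project's custom implementation may differ, and the function names are hypothetical.

```go
package main

import "math"

// eulerGamma is the Euler-Mascheroni constant, used to approximate
// harmonic numbers: H(i) ~ ln(i) + gamma.
const eulerGamma = 0.5772156649015329

// avgPathLength is c(n): the average path length of an unsuccessful search
// in a binary search tree built from n points.
func avgPathLength(n int) float64 {
	switch {
	case n <= 1:
		return 0
	case n == 2:
		return 1
	}
	h := math.Log(float64(n-1)) + eulerGamma
	return 2*h - 2*float64(n-1)/float64(n)
}

// anomalyScore maps the mean path length over all trees to a score in (0, 1].
// Scores near 1 indicate anomalies (short paths); scores around 0.5 or below
// indicate normal points. When c(n) is undefined (n <= 1), this sketch
// returns the neutral value 0.5.
func anomalyScore(meanPathLen float64, sampleSize int) float64 {
	c := avgPathLength(sampleSize)
	if c == 0 {
		return 0.5
	}
	return math.Pow(2, -meanPathLen/c)
}
```

A detector would compare `anomalyScore(...)` against the configured anomaly threshold to decide whether a request is unusual.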

Multi-Layer Request Flow

  1. Request Processing:

    • Request received → Rate limiting check
    • IP whitelist/blacklist verification
    • Request context extraction and pooling
  2. Layer 1 - Traditional Security:

    • Vectorscan pattern matching (all rules simultaneously)
    • Fallback to regex if needed (rare)
    • Immediate block for known attack patterns
  3. Layer 2 - ML Analysis (if Layer 1 passes):

    • Feature extraction (24 dimensions)
    • Anomaly detection using Isolation Forest
    • Behavioral analysis and session profiling
    • Risk assessment (low → critical)
  4. Layer 2c - Enhanced Behavioral Analysis:

    • Comprehensive 65+ metric behavioral profiling
    • Advanced risk scoring with multi-factor assessment
    • Behavioral violation detection (rate, pattern, temporal, content)
    • Session characterization and threat intelligence
  5. Layer 3 - Progressive Decision Logic:

    • Enhanced blocking criteria with behavioral violations
    • Session-aware decision making
    • Combined risk assessment (anomaly + behavioral + pattern)
    • Advanced threshold management
  6. Layer 4 - Pattern Evolution (background):

    • Blocked request analysis for pattern mining
    • N-gram frequency analysis and clustering
    • Dynamic rule generation and validation
    • Rule promotion and deprecation management
  7. Response Generation:

    • Backend proxy or error response
    • Enhanced logging with comprehensive ML insights
    • Statistics update (traditional + ML + behavioral metrics)
    • Pattern mining and behavioral data collection
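
The N-gram frequency analysis step in Layer 4 can be sketched as counting byte n-grams across blocked payloads and keeping those that recur. The function names and the minimum-count cut-off are assumptions; clustering and rule validation are out of scope for this example.

```go
package main

import "sort"

// ngramCounts counts every byte n-gram of length n across all payloads.
func ngramCounts(payloads []string, n int) map[string]int {
	counts := make(map[string]int)
	if n <= 0 {
		return counts
	}
	for _, p := range payloads {
		for i := 0; i+n <= len(p); i++ {
			counts[p[i:i+n]]++
		}
	}
	return counts
}

// frequentNgrams returns the n-grams seen at least minCount times, most
// frequent first, with ties broken alphabetically for stable output.
func frequentNgrams(counts map[string]int, minCount int) []string {
	var out []string
	for g, c := range counts {
		if c >= minCount {
			out = append(out, g)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if counts[out[i]] != counts[out[j]] {
			return counts[out[i]] > counts[out[j]]
		}
		return out[i] < out[j]
	})
	return out
}
```

Frequent n-grams found this way are only candidates; a generated rule would still need validation against legitimate traffic before promotion.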

Troubleshooting

Vectorscan Compilation Errors

If you see "failed to compile Vectorscan patterns" in logs:

  • Ensure Vectorscan library is properly installed
  • Check ldconfig includes Vectorscan library path
  • Verify CPU supports SSSE3 instructions (required for Vectorscan)

Building Issues

# If you get "hs/hs.h not found"
export CGO_CFLAGS="-I/usr/local/include"
export CGO_LDFLAGS="-L/usr/local/lib -lhs"
go build

Performance Tuning

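  • Multi-core systems: Go sets GOMAXPROCS to the number of logical CPUs by default; check that it has not been lowered (for example via the GOMAXPROCS environment variable) so ML analysis can use every core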
  • Rate limiting: Adjust windows based on traffic patterns
  • ML thresholds: Lower anomaly threshold (0.4-0.5) for more sensitive detection
  • Session limits: Reduce maxSessions if memory constrained
  • Training frequency: Increase retrainInterval for high-traffic sites
  • Pattern mining: Adjust analysisInterval based on attack frequency
  • Load balancing: Consider multiple WAF instances with shared ML training data
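
When tuning on multi-core systems, it can help to confirm what the Go scheduler is actually using before changing anything. The sketch below queries the current setting with standard `runtime` calls; the helper names are hypothetical.

```go
package main

import "runtime"

// schedulerInfo returns the current GOMAXPROCS value and the number of
// logical CPUs visible to the process. Passing 0 to runtime.GOMAXPROCS
// queries the setting without changing it.
func schedulerInfo() (maxProcs, cpus int) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

// isCPULimited reports whether GOMAXPROCS is set below the logical CPU
// count, which would leave cores unused by the WAF's goroutines.
func isCPULimited() bool {
	p, c := schedulerInfo()
	return p < c
}
```

Logging both values at startup makes it easier to spot an environment override on a host that has more cores available.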

Contributing

Feel free to submit issues and enhancement requests!

When contributing:

  1. Traditional Rules: Ensure new rules are Vectorscan-compatible when possible
  2. Compatibility: Run go run test_rules_compatibility.go for pattern validation
  3. ML Features: Test ML components with ./test_phase2_ml.sh
  4. Pattern Mining: Test pattern evolution with ./test_phase3_patterns.sh
  5. Enhanced Behavioral Analysis: Test advanced profiling with ./test_phase4_behavioral.sh
  6. Feedback & Auto-Tuning: Test feedback loops with ./test_phase5_feedback.sh
  7. Performance: Include benchmarks for performance-critical changes
  8. ML Models: Consider impact on training data and model accuracy
  9. Documentation: Update README and implementation docs for new features

ML Development Guidelines

  • Maintain backward compatibility with existing rule-based detection
  • Ensure ML enhancements don't impact sub-millisecond response times
  • Test anomaly detection with various attack types and evasion techniques
  • Validate behavioral analysis with legitimate user patterns
  • Include false positive analysis for ML-based blocking decisions
  • Test pattern mining with diverse attack patterns and clustering scenarios
  • Validate dynamic rule generation accuracy and promotion criteria

Testing Your Changes

# Build and test basic functionality
go build && ./waf &

# Test traditional rules and Vectorscan compatibility
go run test_rules_compatibility.go

# Test ML features and endpoints
./test_phase2_ml.sh

# Test pattern mining and dynamic rule generation
./test_phase3_patterns.sh

# Test enhanced behavioral analysis
./test_phase4_behavioral.sh

# Test feedback and auto-tuning
./test_phase5_feedback.sh

# Test comprehensive configuration management
./test_phases3_4_config.sh
./test_phase5_config.sh

# Test shadow mode functionality
./test_shadow_mode.sh

# Test ML persistence functionality
./test_persistence.sh

# Performance testing
curl http://localhost:8080/_waf/stats
curl http://localhost:8080/_waf/ml/anomaly_stats
curl http://localhost:8080/_waf/ml/dynamic_rules
curl http://localhost:8080/_waf/ml/pattern_stats
curl http://localhost:8080/_waf/ml/behavioral_stats
curl http://localhost:8080/_waf/ml/session_profiles
curl http://localhost:8080/_waf/ml/violations

# Phase 5: Feedback and auto-tuning testing
curl http://localhost:8080/_waf/ml/model_performance
curl http://localhost:8080/_waf/ml/tuning_history
curl http://localhost:8080/_waf/ml/rule_performance
curl http://localhost:8080/_waf/ml/auto_tuning_config

# Shadow mode testing
curl http://localhost:8080/_waf/shadow_mode
curl -X POST http://localhost:8080/_waf/shadow_mode \
  -H "Content-Type: application/json" \
  -d '{"shadow_mode": true}'
curl -X POST http://localhost:8080/_waf/reload

Latest Updates

ML State Persistence

The WAF now features automatic ML state persistence:

  • Zero Data Loss: All ML training and learning preserved across restarts
  • Automatic Save/Load: ML state automatically saved on shutdown, loaded on startup
  • Configurable Auto-Save: Periodic saving with customizable intervals
  • Backup Management: Automatic backup creation with retention policies
  • Production Ready: Instant ML capability on startup without retraining
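
A common way to avoid a corrupted state file when saving on shutdown or on a timer is to write to a temporary file and rename it into place. The sketch below illustrates that pattern; the `mlState` fields, file naming, and JSON encoding are assumptions, since the README does not describe the on-disk format.

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// mlState is a hypothetical stand-in for the persisted ML data.
type mlState struct {
	Version          int       `json:"version"`
	AnomalyThreshold float64   `json:"anomaly_threshold"`
	Samples          []float64 `json:"samples"`
}

// saveState writes s to path atomically: the data goes to a temporary file
// in the same directory, is synced, and is then renamed over the target so
// readers never observe a partially written file.
func saveState(path string, s mlState) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), "mlstate-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeds

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadState reads a previously saved state from path.
func loadState(path string) (mlState, error) {
	var s mlState
	data, err := os.ReadFile(path)
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}
```

On startup, a missing file from `loadState` can simply mean a first run, in which case the WAF would begin with untrained defaults.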

Comprehensive Configuration Management

The WAF features complete configuration externalization:

  • Zero Hardcoded Values: All ML parameters moved to config.json
  • Phase 3 Configuration: 13 pattern mining parameters fully configurable
  • Phase 4 Configuration: 20+ behavioral analysis parameters configurable
  • Phase 5 Configuration: 18 feedback & auto-tuning parameters configurable
  • Persistence Configuration: 5 persistence management parameters
  • Hot-Reload Support: All configurations can be updated without restart
  • Environment-Specific: Different configs for production, development, testing

Shadow Mode Implementation

Essential for safe WAF deployment:

  • API Control: Enable/disable shadow mode via REST endpoints
  • Comprehensive Logging: Enhanced logs with SHADOW_MODE_THREAT detection
  • Zero Impact: Full detection without blocking traffic
  • Production Ready: Safe testing of rules and ML models in live environments
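
Conceptually, shadow mode keeps detection on while turning enforcement off. The sketch below shows one way to express that with an atomically toggled flag; the `enforcer` type and the log line layout are assumptions, with only the SHADOW_MODE_THREAT tag taken from this README.

```go
package main

import (
	"log"
	"sync/atomic"
)

// enforcer decides whether a detected threat results in a block.
// The shadow flag can be flipped at runtime, for example from an admin
// endpoint, without locking the request path.
type enforcer struct {
	shadow atomic.Bool
}

// shouldBlock returns true only when a threat was detected and shadow mode
// is off. In shadow mode the detection is logged and the request passes.
func (e *enforcer) shouldBlock(threat bool, ruleID string) bool {
	if !threat {
		return false
	}
	if e.shadow.Load() {
		// Hypothetical log layout; only the tag comes from the README.
		log.Printf("SHADOW_MODE_THREAT rule=%s action=allowed", ruleID)
		return false
	}
	return true
}
```

Because every detection is still logged, the shadow-mode logs can be reviewed for false positives before enforcement is switched on.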

Security Configuration

API Key Authentication

The WAF supports API key authentication for admin endpoints to secure management operations.

Configuration

{
    "api_key": "your-secret-api-key-change-this",
    "admin_endpoints_require_auth": true
}

Usage

Include the X-API-Key header with all admin requests:

# Access statistics
curl http://localhost:8080/_waf/stats \
  -H "X-API-Key: your-secret-api-key-change-this"

# ML model retraining
curl -X POST http://localhost:8080/_waf/ml/model_retrain \
  -H "X-API-Key: your-secret-api-key-change-this"

# Configuration reload
curl -X POST http://localhost:8080/_waf/reload \
  -H "X-API-Key: your-secret-api-key-change-this"
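
On the server side, a check like this is typically implemented as HTTP middleware around the admin handlers. The sketch below is a minimal version using a constant-time comparison; the function names and the 401 response are assumptions, not the project's actual handler.

```go
package main

import (
	"crypto/subtle"
	"net/http"
)

// validAPIKey compares the presented key with the configured one in
// constant time. An empty configured key never validates, so a missing
// configuration cannot accidentally open the admin endpoints.
func validAPIKey(got, want string) bool {
	if want == "" {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// requireAPIKey wraps next and rejects requests whose X-API-Key header
// does not match key.
func requireAPIKey(key string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validAPIKey(r.Header.Get("X-API-Key"), key) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

Admin routes could then be registered as `mux.Handle("/_waf/", requireAPIKey(cfgKey, adminHandler))`, where `cfgKey` and `adminHandler` are placeholders for the configured key and the real handler.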

HTTPS/TLS Support

The WAF supports HTTPS for secure communications and encrypted admin access.

Configuration

{
    "enable_https": true,
    "tls_cert_file": "server.crt",
    "tls_key_file": "server.key"
}

Certificate Generation

Create self-signed certificates for testing:

openssl req -x509 -newkey rsa:2048 -keyout server.key -out server.crt -sha256 -days 365 -nodes -subj '/CN=localhost'

HTTPS Usage

Access the WAF via HTTPS:

# Statistics (HTTPS + API key)
curl https://localhost:8080/_waf/stats \
  -H "X-API-Key: your-secret-api-key-change-this" \
  -k  # Use -k for self-signed certificates

# Test protected endpoint
curl https://localhost:8080/test \
  -k  # Use -k for self-signed certificates

Security Best Practices

  1. Change Default API Key: Always replace the default API key with a strong, unique value
  2. Use HTTPS in Production: Enable HTTPS for all production deployments
  3. Certificate Management: Use proper certificates from a trusted CA in production
  4. Key Rotation: Regularly rotate API keys and certificates
  5. Access Control: Restrict access to admin endpoints at the network level when possible

Rules Database Management

The WAF supports both JSON-based rules (legacy) and SQLite database-based rules (recommended) for enhanced management capabilities.

Database Configuration

Enable the rules database in your config.json:

{
  "rules_db_config": {
    "enable_database": true,
    "database_path": "./waf_rules.db",
    "migrate_from_json": true,
    "json_backup_on_migration": true
  }
}

Migration from JSON

Automatic Migration

When migrate_from_json is enabled, the WAF will automatically migrate rules from your existing rules.json to the database on first startup if the database is empty.

Manual Migration

Use the migration utility for more control:

# Build the migration utility
go build -o migrate-rules ./cmd/migrate_rules.go

# Migrate with options
./migrate-rules -json rules.json -db waf_rules.db -backup

# Options:
#   -json: Path to JSON rules file (default: rules.json)
#   -db: Path to SQLite database (default: waf_rules.db)
#   -backup: Create backup of JSON file (default: true)
#   -overwrite: Overwrite existing database (default: false)

Rules API Endpoints

The WAF provides a comprehensive REST API for rule management:

List Rules

# List all rules with pagination
curl -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules?limit=20&offset=0"

# Filter rules
curl -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules?enabled=true&action=block"

Get Rule Details

curl -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/SQL_INJECTION_1"

Create New Rule

curl -X POST -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules" \
  -d '{
    "rule_id": "CUSTOM_RULE_1",
    "name": "Custom Attack Detection",
    "pattern": "(?i)(malicious|evil)",
    "action": "block",
    "enabled": true,
    "description": "Detects malicious patterns"
  }'
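
The same request can be built from Go. The sketch below mirrors the JSON fields and headers from the curl example; the `rule` struct and helper name are hypothetical, and the response format is not covered because the README does not specify it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// rule mirrors the JSON fields used in the curl example.
type rule struct {
	RuleID      string `json:"rule_id"`
	Name        string `json:"name"`
	Pattern     string `json:"pattern"`
	Action      string `json:"action"`
	Enabled     bool   `json:"enabled"`
	Description string `json:"description"`
}

// newCreateRuleRequest builds the POST request for the /_waf/rules endpoint
// with the JSON body and the X-API-Key header set.
func newCreateRuleRequest(baseURL, apiKey string, r rule) (*http.Request, error) {
	body, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/_waf/rules", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}
```

The returned request can be sent with `http.DefaultClient.Do(req)`; callers should close the response body and check the status code.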

Update Existing Rule

curl -X PUT -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/CUSTOM_RULE_1" \
  -d '{
    "name": "Updated Custom Attack Detection",
    "pattern": "(?i)(malicious|evil|harmful)",
    "action": "block",
    "enabled": true,
    "description": "Updated to detect more attack patterns"
  }'

Delete Rule

# Soft delete (disable rule)
curl -X DELETE -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/CUSTOM_RULE_1"

# Hard delete (permanently remove)
curl -X DELETE -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/CUSTOM_RULE_1?hard=true"

Rule History

# View rule change history
curl -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/SQL_INJECTION_1/history?limit=10"

Database Statistics

# Get rules database statistics
curl -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/stats"

Export/Import Rules

# Export rules to JSON
curl -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/export" > exported_rules.json

# Import rules from JSON
curl -X POST -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/import" \
  -d @exported_rules.json

Reload Rules

# Reload rules from database into memory
curl -X POST -H "X-API-Key: your-secret-api-key-change-this" \
  "http://localhost:8080/_waf/rules/reload"

Database Features

Rule Versioning

  • Every rule change creates a new version
  • Complete audit trail of all modifications
  • History includes CREATE, UPDATE, DELETE operations
  • Timestamps and operation tracking

Rule History Schema

CREATE TABLE rule_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    rule_id TEXT NOT NULL,
    name TEXT NOT NULL,
    pattern TEXT NOT NULL,
    action TEXT NOT NULL,
    enabled BOOLEAN NOT NULL,
    description TEXT DEFAULT '',
    version INTEGER NOT NULL,
    operation TEXT NOT NULL, -- 'CREATE', 'UPDATE', 'DELETE'
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Database Triggers

  • Automatic timestamp updates
  • Version incrementing on changes
  • History logging for all operations
  • Data integrity enforcement

Performance Benefits

Database vs JSON

  • Dynamic Updates: No WAF restart required for rule changes
  • Concurrent Access: Multiple administrators can manage rules safely
  • Query Performance: Fast filtering and searching with SQL indexes
  • Data Integrity: ACID transactions ensure consistency
  • Scalability: Handle thousands of rules efficiently
  • Audit Trail: Complete change history for compliance

Hot Reload

  • Rule changes take effect immediately
  • No service interruption
  • Automatic pattern recompilation
  • Memory-efficient rule loading
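
Hot reload without service interruption is often done by compiling the new rule set off to the side and swapping a pointer atomically. The sketch below illustrates the idea using Go's standard `regexp` package as a stand-in for the Vectorscan database; the `engine` and `ruleSet` types are hypothetical.

```go
package main

import (
	"regexp"
	"sync/atomic"
)

// ruleSet is an immutable snapshot of compiled patterns.
type ruleSet struct {
	patterns []*regexp.Regexp
}

// engine serves requests from the current snapshot while reloads build a
// new one in the background.
type engine struct {
	current atomic.Pointer[ruleSet]
}

// reload compiles every pattern first and swaps the snapshot only if all of
// them compile, so a bad rule never replaces a working rule set.
func (e *engine) reload(patterns []string) error {
	rs := &ruleSet{}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return err // keep serving the previous snapshot
		}
		rs.patterns = append(rs.patterns, re)
	}
	e.current.Store(rs)
	return nil
}

// matches reports whether input matches any pattern in the current snapshot.
func (e *engine) matches(input string) bool {
	rs := e.current.Load()
	if rs == nil {
		return false
	}
	for _, re := range rs.patterns {
		if re.MatchString(input) {
			return true
		}
	}
	return false
}
```

Requests that loaded the old snapshot finish against it, while new requests see the new one, so no lock is held on the request path.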

Best Practices

Rule Management

  1. Use Descriptive Names: Clear rule names improve maintainability
  2. Version Control Integration: Export rules regularly for backup
  3. Test Before Production: Use shadow mode to validate new rules
  4. Monitor Performance: Check rule match statistics regularly
  5. Regular Cleanup: Remove obsolete or ineffective rules

Database Maintenance

  1. Regular Backups: Export rules database periodically
  2. History Cleanup: Archive old history records if needed
  3. Index Optimization: Monitor database performance
  4. Storage Planning: Plan for rule history growth

This WAF is production-ready with enterprise-grade configuration management, ML persistence, secure administration, dynamic rules database, and safe deployment capabilities.
