A high-performance Web Application Firewall written in Go, featuring Vectorscan/Hyperscan integration for blazing-fast pattern matching and advanced machine learning for adaptive threat detection.
- Vectorscan/Hyperscan Integration: Hardware-accelerated pattern matching using SIMD instructions
- Parallel Pattern Matching: All security rules evaluated simultaneously in a single pass
- High Performance: 10-100x faster than traditional regex-based WAFs
- Rule-Based Protection: Configurable rules to detect and block various attack patterns
- Rate Limiting: Protect against brute force and DoS attacks
- IP Whitelist/Blacklist: Control access based on IP addresses
- Multiple Backend Support: Route requests to different backend servers
- Anomaly Detection: Custom Isolation Forest implementation for zero-day attack detection
- Enhanced Behavioral Analysis: Comprehensive 65+ metric session profiling with advanced risk assessment
- Multi-Layer Detection: Smart combination of rules + ML + behavioral + pattern analysis
- Learning Pipeline: Automatic model training from legitimate traffic patterns
- ML State Persistence: Automatic saving/loading of all ML data, models, and learning across restarts
- Adaptive Thresholds: Self-tuning based on false positive feedback
- Advanced Feature Extraction: 24-dimensional security feature analysis
- Pattern Evolution System: Automatic discovery and generation of new attack patterns
- Dynamic Rule Generation: N-gram analysis and clustering for emerging threat detection
- False Positive Learning: Self-improving rule accuracy through user feedback
- Progressive Risk Scoring: Multi-factor risk assessment with confidence scoring
- Behavioral Violation Detection: Real-time detection of rate, pattern, temporal, and content anomalies
- Session Characterization: Automated classification of human/bot/scanner behavior
- Feedback Loop System: User-driven model improvement with attack validation and false positive correction
- Auto-Tuning Engine: Automatic threshold optimization based on performance metrics and user feedback
- Rule Performance Management: Dynamic promotion/deprecation of rules based on accuracy and effectiveness
- Comprehensive Performance Tracking: Multi-component metrics with trend analysis and calibration alerts
- Persistent Intelligence: All ML knowledge preserved across WAF restarts with configurable backup management
- API Key Authentication: X-API-Key header required for admin endpoint access
- HTTPS/TLS Support: Full TLS encryption for admin interface and traffic
- Certificate Management: Self-signed or custom certificate support
- Secure Admin Endpoints: All /_waf/* endpoints require authentication when enabled
- Configuration-based Security: Enable/disable authentication and HTTPS via config
- Rules Database: SQLite database for dynamic rule management with full CRUD API
- Rule History & Versioning: Complete audit trail of all rule changes with rollback support
- Database Migration: Seamless migration from JSON rules to SQLite database
- Real-time Logging: Detailed request logging with attack detection and ML insights
- Configuration Hot-Reload: Update rules and settings without restart
- Shadow Mode: Test WAF behavior without blocking traffic (essential for safe deployment)
- ML State Persistence: Automatic saving/loading of ML data across restarts
- Comprehensive Configuration Management: All ML parameters externalized and configurable
- Comprehensive Statistics: WAF performance, ML model metrics, and behavioral analytics
- ML Management Endpoints: Model retraining, false positive reporting, anomaly statistics
- Pattern Mining Dashboard: Dynamic rule discovery, promotion, and management (not fully implemented)
- Enhanced Behavioral Dashboard: Session profiling, violation tracking, and risk analysis (not fully implemented)
- Feedback & Auto-Tuning Dashboard: User feedback collection, performance monitoring, and automatic optimization (not fully implemented)
- Advanced Filtering: IP-based, risk-level, and severity-based filtering capabilities
- Intelligent Fallback: Automatic fallback to standard regex for incompatible patterns
- Object Pooling: Optimized memory usage with request context and buffer pools
- Environment-Specific Tuning: Production/development configurations with hot-reload capability
- Automatic Backup Management: Configurable backup retention and cleanup for ML state
The WAF uses Vectorscan (a portable fork of Intel's Hyperscan) for ultra-fast pattern matching:
- Single-pass matching: All security rules are evaluated simultaneously
- SIMD optimization: Hardware acceleration using CPU vector instructions
- 100% rule compatibility: All default rules are Vectorscan-compatible
- Benchmark results: 10-100x faster than sequential regex matching for typical payloads
- <1ms ML overhead: Feature extraction and anomaly detection per request
- Real-time analysis: Non-blocking ML processing pipeline
- Memory efficient: 10,000 concurrent session tracking with <100MB overhead
- Automatic optimization: Hourly model retraining with sliding window data
- Scalable: Linear performance scaling with request volume
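The per-request anomaly check is cheap largely because an Isolation Forest score is just a normalized average tree depth. The sketch below uses the textbook Isolation Forest scoring formula, not this project's custom implementation, whose internals are not shown here. The function names are hypothetical, and comparing the score with `>=` against the configured threshold is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

const eulerGamma = 0.5772156649

// averagePathLength is c(n): the expected path length of an unsuccessful
// binary-search-tree lookup over n points. It normalizes raw tree depths.
func averagePathLength(n int) float64 {
	switch {
	case n <= 1:
		return 0
	case n == 2:
		return 1
	}
	harmonic := math.Log(float64(n-1)) + eulerGamma // approximation of H(n-1)
	return 2*harmonic - 2*float64(n-1)/float64(n)
}

// anomalyScore maps the mean path length across all trees to a score in (0, 1].
// Short paths (easy to isolate) push the score toward 1.
func anomalyScore(meanPathLength float64, subsampleSize int) float64 {
	c := averagePathLength(subsampleSize)
	if c == 0 {
		return 0.5 // assumption: treat a degenerate sample as "no signal"
	}
	return math.Pow(2, -meanPathLength/c)
}

func main() {
	const subsample = 256 // "subsample_size" in the example config
	const threshold = 0.6 // "threshold" in the example config
	for _, depth := range []float64{4, 10, 14} {
		s := anomalyScore(depth, subsample)
		fmt.Printf("mean depth %.0f -> score %.3f anomalous=%v\n", depth, s, s >= threshold)
	}
}
```

A request whose mean depth equals c(n) scores exactly 0.5, so a threshold of 0.6 flags only points that are isolated noticeably faster than average.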
Layer 1: Vectorscan Rules → ~0.05ms (immediate block)
Layer 2a: ML Anomaly Detection → ~0.3ms (scoring)
Layer 2b: Basic Behavioral Analysis → ~0.2ms (risk assessment)
Layer 2c: Enhanced Behavioral → ~0.1ms (advanced profiling)
Layer 3: Progressive Decision → ~0.05ms (smart blocking)
Layer 4: Pattern Evolution → ~0.1ms (background analysis)
Total Processing Time → <1ms (combined overhead)
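The layer ordering above implies a short-circuit pipeline: cheap signature checks run first and later layers only see requests that earlier layers let through. A minimal sketch of that control flow follows. The `Layer`, `Verdict`, and `demoLayers` names are hypothetical, and the two toy checks are stand-ins, not the WAF's real rule or ML logic.

```go
package main

import (
	"fmt"
	"strings"
)

// Verdict is the outcome of running the pipeline on one payload.
type Verdict struct {
	Block  bool
	Layer  string
	Reason string
}

// Layer is one inspection stage. Names and checks here are illustrative only.
type Layer struct {
	Name  string
	Check func(payload string) (block bool, reason string)
}

// evaluate runs the layers in order and stops at the first one that blocks.
func evaluate(layers []Layer, payload string) Verdict {
	for _, l := range layers {
		if block, reason := l.Check(payload); block {
			return Verdict{Block: true, Layer: l.Name, Reason: reason}
		}
	}
	return Verdict{}
}

// demoLayers returns two toy stages: a signature check and a crude outlier check.
func demoLayers() []Layer {
	return []Layer{
		{Name: "rules", Check: func(p string) (bool, string) {
			if strings.Contains(strings.ToLower(p), "union select") {
				return true, "SQL injection signature"
			}
			return false, ""
		}},
		{Name: "anomaly", Check: func(p string) (bool, string) {
			if len(p) > 4096 {
				return true, "payload length outlier"
			}
			return false, ""
		}},
	}
}

func main() {
	for _, p := range []string{"q=hello", "id=1 UNION SELECT 1"} {
		fmt.Printf("%q -> %+v\n", p, evaluate(demoLayers(), p))
	}
}
```

Ordering the layers from cheapest to most expensive keeps the common case (a clean request or an obvious signature hit) on the fast path.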
- Request context pooling to minimize allocations
- Pre-compiled pattern database loaded at startup
- Efficient string building with pre-allocated buffers
- Lock-free statistics collection where possible
- Circular buffers for behavioral data (memory efficient)
- Asynchronous ML model updates (non-blocking)
- Background pattern analysis with configurable intervals
- Dynamic rule deduplication and confidence scoring
- Enhanced session profiling with statistical models
- Progressive decision logic for optimized blocking
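Request context pooling and pre-allocated buffers are typically built on `sync.Pool` in Go. The sketch below shows the general pattern. `RequestContext` and its fields are hypothetical and do not reflect the WAF's actual types.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// RequestContext is a hypothetical per-request scratch structure.
type RequestContext struct {
	ClientIP string
	Path     string
	Buf      bytes.Buffer // reused across requests to avoid reallocating
}

var ctxPool = sync.Pool{
	New: func() interface{} { return &RequestContext{} },
}

// acquireContext takes a context from the pool and fills in request data.
func acquireContext(ip, path string) *RequestContext {
	rc := ctxPool.Get().(*RequestContext)
	rc.ClientIP = ip
	rc.Path = path
	return rc
}

// releaseContext clears all request data before returning the context,
// so nothing leaks from one request into the next.
func releaseContext(rc *RequestContext) {
	rc.ClientIP = ""
	rc.Path = ""
	rc.Buf.Reset() // keeps the underlying capacity
	ctxPool.Put(rc)
}

// inspectionString builds the string handed to the pattern matcher
// using the pooled buffer instead of fresh allocations.
func inspectionString(rc *RequestContext, query string) string {
	rc.Buf.WriteString(rc.Path)
	rc.Buf.WriteByte('?')
	rc.Buf.WriteString(query)
	return rc.Buf.String()
}

func main() {
	rc := acquireContext("127.0.0.1", "/test")
	fmt.Println(inspectionString(rc, "cmd=ls"))
	releaseContext(rc)
}
```

The reset in `releaseContext` matters as much as the pooling itself: a pooled object that still carries the previous request's data is a correctness bug, not just a performance detail.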
# macOS (Homebrew)
brew install vectorscan
# Debian/Ubuntu
sudo apt-get update
sudo apt-get install libvectorscan-dev
# Build from source
git clone https://github.com/VectorCamp/vectorscan
cd vectorscan
cmake . -DCMAKE_BUILD_TYPE=Release
make
sudo make install
- Install dependencies:
go get github.com/flier/gohs/hyperscan
go get github.com/json-iterator/go
- Build the WAF:
go build
- Run the WAF:
./waf
The WAF will start on the configured port (default: 8080) and begin protecting your backend services. If ML persistence is enabled, it automatically loads any previously saved ML state, so detection resumes without retraining.
The WAF uses two configuration files with comprehensive ML and security settings:
Main configuration file containing:
- Server settings (port, logging, shadow mode)
- Security settings (API authentication, HTTPS/TLS)
- Backend configurations
- Rate limiting settings
- IP whitelist/blacklist
- Rules file location
- Comprehensive ML Configuration (all hardcoded values externalized)
Example:
{
"listen_port": 8080,
"enable_https": false,
"tls_cert_file": "server.crt",
"tls_key_file": "server.key",
"api_key": "your-secret-api-key-change-this",
"admin_endpoints_require_auth": true,
"backends": [
{
"name": "default",
"url": "http://localhost:8083",
"path_prefix": "/",
"enabled": true
}
],
"rate_limit": 100,
"rate_limit_window": 60,
"blocked_ips": [],
"allowed_ips": [],
"log_file": "waf.log",
"enable_logging": true,
"shadow_mode": false,
"rules_file": "rules.json",
"ml_config": {
"enable_ml_collection": true,
"ml_data_path": "ml_data",
"max_logs_in_memory": 10000,
"feature_sampling_rate": 0.1,
"anomaly_detection": {
"enable_anomaly_detection": true,
"threshold": 0.6,
"num_trees": 100,
"subsample_size": 256,
"min_training_size": 50,
"retrain_interval_hours": 1,
"max_training_data": 10000
},
"session_profiling": {
"enable_session_profiling": true,
"max_sessions": 10000,
"session_timeout_hours": 24,
"cleanup_interval_hours": 1,
"request_history_size": 100,
"risk_thresholds": {
"low": 0.3,
"medium": 0.6,
"high": 0.8,
"critical": 0.95
}
},
"behavioral_analysis": {
"enable_behavioral": true,
"max_sessions_advanced": 10000,
"session_timeout_hours": 24,
"cleanup_interval_hours": 1,
"violation_history_size": 50,
"pattern_history_size": 20,
"confidence_window": 100,
"entropy_threshold": 7.0,
"rate_anomaly_threshold": 3.0,
"temporal_anomaly_multiple": 3.0,
"request_history_capacity": 1000,
"statistical_sample_size": 1000,
"risk_thresholds": {
"low": 0.3,
"medium": 0.6,
"high": 0.8,
"critical": 0.95
},
"critical_z_score": 7.0,
"high_z_score": 5.0,
"medium_z_score": 3.0,
"bot_requests_per_minute": 100,
"scanner_requests_per_minute": 30,
"human_requests_per_minute": 5,
"suspicious_violation_count": 5,
"malicious_violation_count": 10,
"base_risk_score": 0.1,
"anomaly_score_weight": 0.4,
"violation_risk_weight": 0.1,
"threat_intel_weight": 0.3,
"max_violation_risk": 0.3,
"sql_injection_threat_score": 0.3,
"xss_threat_score": 0.3,
"command_injection_score": 0.4
},
"pattern_mining": {
"enable_pattern_mining": true,
"min_pattern_occurrence": 5,
"max_patterns": 1000,
"pattern_confidence_min": 0.7,
"cluster_threshold": 0.8,
"ngram_size": 3,
"promotion_threshold": 0.9,
"max_false_positives": 5,
"analysis_interval_minutes": 30,
"quick_analysis_every": 100,
"max_requests_stored": 10000,
"remove_oldest_count": 1000,
"ngram_min_length": 3,
"ngram_max_length": 20,
"ngram_min_frequency": 3,
"top_ngram_candidates": 20,
"max_clusters": 50,
"min_similarity": 0.6,
"min_cluster_size": 3,
"emergency_frequency": 5,
"emergency_confidence": 0.8,
"min_pattern_length": 5
},
"auto_tuning": {
"enable_auto_tuning": true,
"tuning_interval_hours": 1,
"min_samples_for_tuning": 100,
"min_anomaly_threshold": 0.5,
"max_anomaly_threshold": 0.95,
"min_behavioral_threshold": 0.3,
"max_behavioral_threshold": 0.9,
"target_false_positive_rate": 0.05,
"target_recall": 0.95,
"max_latency_ms": 5.0,
"threshold_step_size": 0.05,
"max_adjustment_per_cycle": 0.2,
"performance_tolerance": 0.02,
"promotion_accuracy_threshold": 0.95,
"promotion_min_matches": 50,
"promotion_min_confidence": 0.9,
"deprecation_accuracy_threshold": 0.7,
"deprecation_min_matches": 20,
"max_feedback_history": 10000,
"feedback_cleanup_amount": 1000,
"max_tuning_history": 1000,
"max_latency_history": 1000,
"latency_cleanup_amount": 100,
"default_anomaly_threshold": 0.7,
"default_behavioral_threshold": 0.6,
"volume_weight_divisor": 100.0,
"min_recency_weight": 0.1,
"recency_decay_days": 30.0,
"declining_performance_multiple": 1.5,
"low_recall_threshold": 0.8,
"low_fpr_threshold": 0.5,
"recall_adjustment_factor": 0.5,
"behavioral_fpr_low_threshold": 0.3,
"min_behavioral_feedback_samples": 20,
"feedback_rate_hours": 24.0,
"throughput_recent_count": 100
},
"detection_thresholds": {
"high_confidence_anomaly_threshold": 0.85,
"critical_risk_threshold": 0.95,
"combined_anomaly_threshold": 0.75,
"combined_risk_threshold": 0.8,
"alert_anomaly_threshold": 0.7,
"alert_risk_threshold": 0.7,
"max_behavioral_violations": 3,
"max_suspicious_patterns": 2
}
},
"persistence_config": {
"enable_persistence": true,
"persistence_directory": "./ml_data",
"auto_save_interval_minutes": 30,
"backup_retention_days": 7,
"compress_backups": false
}
}
All machine learning components are now fully configurable:
- Anomaly Detection: Thresholds, tree counts, training intervals
- Session Profiling: Timeout settings, risk thresholds, history sizes
- Behavioral Analysis: 20+ configurable parameters for advanced profiling
- Pattern Mining: N-gram analysis, clustering, and rule generation settings
- Auto-Tuning: Feedback processing, performance optimization, and threshold management
- Detection Thresholds: All ML decision points externalized
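For tooling that needs to read this configuration, a small subset can be decoded with `encoding/json`. The struct and function names below are hypothetical. Only the JSON keys come from the example above, and the range check on `threshold` is an assumption about what counts as a sensible value, not documented WAF behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AnomalyConfig mirrors a few keys of "ml_config.anomaly_detection".
type AnomalyConfig struct {
	Enabled       bool    `json:"enable_anomaly_detection"`
	Threshold     float64 `json:"threshold"`
	NumTrees      int     `json:"num_trees"`
	SubsampleSize int     `json:"subsample_size"`
}

// MLConfig mirrors a subset of "ml_config".
type MLConfig struct {
	AnomalyDetection AnomalyConfig `json:"anomaly_detection"`
}

// Config mirrors a subset of the top-level configuration file.
type Config struct {
	ListenPort      int      `json:"listen_port"`
	ShadowMode      bool     `json:"shadow_mode"`
	RateLimit       int      `json:"rate_limit"`
	RateLimitWindow int      `json:"rate_limit_window"`
	ML              MLConfig `json:"ml_config"`
}

// parseConfig decodes the JSON and applies one illustrative sanity check.
// Unknown keys are ignored, so the full file decodes without error.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("parse config: %w", err)
	}
	if t := cfg.ML.AnomalyDetection.Threshold; t < 0 || t > 1 {
		return cfg, fmt.Errorf("anomaly threshold %v outside [0, 1]", t)
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{
		"listen_port": 8080,
		"shadow_mode": false,
		"ml_config": {"anomaly_detection": {"threshold": 0.6, "num_trees": 100, "subsample_size": 256}}
	}`)
	cfg, err := parseConfig(raw)
	fmt.Printf("%+v %v\n", cfg, err)
}
```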
Test WAF behavior without blocking traffic:
{
"shadow_mode": true
}
Update configuration without restart:
curl -X POST http://localhost:8080/_waf/reload
Automatic saving and loading of ML data:
- Training data and models preserved across restarts
- Dynamic rules and patterns retained
- Session profiles and behavioral data maintained
- Feedback history and auto-tuning state saved
- Configurable auto-save intervals and backup retention
Contains all WAF rules for attack detection. All default rules are Vectorscan-optimized:
- SQL Injection (multiple variants)
- XSS (Cross-Site Scripting)
- Command Injection
- Path Traversal
- Local/Remote File Inclusion
- PHP Code Injection
- Log4Shell (CVE-2021-44228)
- Null Byte Injection
- Security Scanner Detection
- And more...
Example rule:
{
"id": "SQL_INJECTION_1",
"name": "SQL Injection Detection",
"pattern": "(?i)(union\\s+select|select\\s+.*\\s+from|insert\\s+into|update\\s+.*\\s+set|delete\\s+from|drop\\s+table|create\\s+table)",
"action": "block",
"enabled": true,
"description": "Detects common SQL injection patterns"
}
Access traditional WAF statistics at /_waf/stats:
curl http://localhost:8080/_waf/stats
Returns JSON with:
- Total requests processed
- Blocked/allowed request counts
- Rule match statistics
- Current rate limit states
Get detailed ML model performance metrics:
curl http://localhost:8080/_waf/ml/anomaly_stats
Returns:
{
"is_trained": true,
"training_samples": 250,
"total_predictions": 1500,
"anomalies_detected": 45,
"false_positives": 3,
"true_positives": 42,
"threshold": 0.6,
"last_retrain": "2024-03-15T14:30:00Z",
"num_trees": 100,
"subsample_size": 256
}
Monitor session profiling and risk distribution:
curl http://localhost:8080/_waf/ml/session_stats
Returns:
{
"total_sessions": 85,
"risk_distribution": {
"low": 70,
"medium": 12,
"high": 2,
"critical": 1
},
"max_sessions": 10000,
"session_timeout": "24h0m0s",
"last_cleanup": "2024-03-15T14:00:00Z"
}
Check overall ML data collection status:
curl http://localhost:8080/_waf/ml/stats
Trigger immediate ML model retraining:
curl -X POST http://localhost:8080/_waf/ml/model_retrain
Report false positives to improve model accuracy:
curl -X POST http://localhost:8080/_waf/ml/report_false_positive \
-H "Content-Type: application/json" \
-d '{"request_id": "123", "was_attack": false, "feedback": "Legitimate admin request"}'
Download collected ML training data:
curl "http://localhost:8080/_waf/ml/export?filename=training_data.json" > training_data.json
Get discovered dynamic rules and pattern statistics:
curl http://localhost:8080/_waf/ml/dynamic_rules
Returns discovered attack patterns:
{
"dynamic_rules": {
"dyn_a1b2c3d4": {
"id": "dyn_a1b2c3d4",
"pattern": "(?i)union\\s+select",
"confidence": 0.85,
"detection_count": 12,
"false_positives": 1,
"source": "ngram",
"enabled": true,
"promoted": true
}
},
"total_rules": 15
}
Check pattern mining statistics:
curl http://localhost:8080/_waf/ml/pattern_stats
Returns comprehensive pattern mining metrics:
{
"total_blocked_requests": 450,
"total_patterns_found": 28,
"patterns_promoted": 8,
"patterns_deprecated": 3,
"active_dynamic_rules": 15,
"promoted_rules": 8,
"ngram_count": 156,
"cluster_count": 12,
"false_positive_patterns": 5
}
Promote a dynamic rule to permanent status:
curl -X POST http://localhost:8080/_waf/ml/promote_rule \
-H "Content-Type: application/json" \
-d '{"pattern": "(?i)union\\s+select"}'
Get comprehensive behavioral analysis statistics:
curl http://localhost:8080/_waf/ml/behavioral_stats
Returns detailed behavioral metrics:
{
"total_sessions": 1250,
"active_sessions": 87,
"risk_distribution": {
"low": 65,
"medium": 15,
"high": 5,
"critical": 2
},
"behavior_violations": 45,
"patterns_detected": 12,
"max_sessions": 10000,
"session_timeout": "24h0m0s"
}
View detailed session profiles with filtering (quote the URL so the shell does not treat & as a background operator):
curl "http://localhost:8080/_waf/ml/session_profiles?risk_level=high&limit=10"
Returns comprehensive session data:
{
"profiles": {
"192.168.1.100": {
"risk_level": "high",
"risk_score": 0.75,
"anomaly_score": 0.68,
"violation_count": 4,
"suspicious_patterns": 2,
"session_type": "scanner",
"behavior_category": "suspicious",
"threat_score": 0.82
}
},
"total_shown": 5,
"total_active": 87
}
Monitor behavioral violations with filtering (quote the URL so the shell does not treat & as a background operator):
curl "http://localhost:8080/_waf/ml/violations?severity=high&hours=24"
Returns violation details:
{
"violations": [
{
"ip_address": "10.0.0.50",
"type": "rate_anomaly",
"severity": "high",
"description": "Request rate 45.2 deviates significantly from normal 12.3",
"timestamp": "2024-03-15T16:30:00Z",
"risk_score": 0.85,
"evidence": {
"current_rate": 45.2,
"normal_rate": 12.3,
"z_score": 5.8
}
}
],
"count": 12
}
Submit feedback on ML decisions at /_waf/ml/feedback:
curl -X POST http://localhost:8080/_waf/ml/feedback \
-H "Content-Type: application/json" \
-d '{
"request_id": "req_12345",
"was_attack": false,
"attack_type": "sql_injection",
"user_id": "security_admin"
}'
Response:
{
"status": "success",
"message": "Feedback processed successfully",
"request_id": "req_12345",
"timestamp": "2024-03-15T16:30:00Z"
}
Get comprehensive performance metrics at /_waf/ml/model_performance:
curl http://localhost:8080/_waf/ml/model_performance
Returns detailed performance analysis:
{
"model_performance": {
"total_predictions": 15420,
"accuracy": 0.962,
"precision": 0.945,
"recall": 0.978,
"f1_score": 0.961,
"false_positive_rate": 0.032,
"false_negative_rate": 0.022,
"performance_trend": "improving",
"anomaly_threshold": 0.72,
"behavioral_threshold": 0.65,
"threshold_optimal": true,
"anomaly_detector_performance": {
"accuracy": 0.958,
"avg_latency_ms": 0.85,
"throughput_rps": 1250.5,
"calibration_needed": false
}
},
"feedback_summary": {
"total_feedback": 847,
"recent_24h": 156,
"attack_rate": 0.288,
"tuning_enabled": true,
"tuning_in_progress": false
}
}
View auto-tuning events at /_waf/ml/tuning_history:
curl http://localhost:8080/_waf/ml/tuning_history
Shows tuning decisions:
{
"tuning_events": [
{
"timestamp": "2024-03-15T15:00:00Z",
"action": "threshold_adjustment",
"component": "anomaly_detector",
"old_value": 0.75,
"new_value": 0.72,
"reason": "Reducing false positive rate from 0.048 to target 0.050",
"performance_gain": 0.03
}
],
"total_events": 23
}
Monitor rule effectiveness at /_waf/ml/rule_performance:
# View all rules
curl http://localhost:8080/_waf/ml/rule_performance
# Filter promotion candidates
curl "http://localhost:8080/_waf/ml/rule_performance?promotion_eligible=true"
# Check deprecation risks
curl "http://localhost:8080/_waf/ml/rule_performance?deprecation_risk=true"
Returns rule performance data:
{
"rule_performance": {
"ML_ANOMALY_HIGH_CONFIDENCE": {
"accuracy": 0.953,
"total_matches": 127,
"true_positives": 121,
"false_positives": 6,
"confidence_score": 0.89,
"promotion_eligible": true,
"deprecation_risk": false
}
},
"total_rules": 8
}
Manage auto-tuning settings at /_waf/ml/auto_tuning_config:
# Get current configuration
curl http://localhost:8080/_waf/ml/auto_tuning_config
# Update configuration
curl -X POST http://localhost:8080/_waf/ml/auto_tuning_config \
-H "Content-Type: application/json" \
-d '{
"enable_auto_tuning": true,
"target_false_positive_rate": 0.03,
"target_recall": 0.97,
"tuning_interval": "30m",
"max_latency_ms": 3.0
}'
Configuration response:
{
"config": {
"enable_auto_tuning": true,
"tuning_interval": "1h",
"target_false_positive_rate": 0.05,
"target_recall": 0.95,
"max_latency_ms": 5.0,
"threshold_step_size": 0.05,
"max_adjustment_per_cycle": 0.2
}
}
Control shadow mode via API endpoints:
# Get shadow mode status
curl http://localhost:8080/_waf/shadow_mode
# Enable shadow mode
curl -X POST http://localhost:8080/_waf/shadow_mode \
-H "Content-Type: application/json" \
-d '{"shadow_mode": true}'
# Disable shadow mode
curl -X POST http://localhost:8080/_waf/shadow_mode \
-H "Content-Type: application/json" \
-d '{"shadow_mode": false}'
Shadow mode response:
{
"shadow_mode": true,
"status": "active",
"timestamp": "2025-09-15T21:16:29Z"
}
Trigger a configuration reload at /_waf/reload:
curl -X POST http://localhost:8080/_waf/reload
Shadow Mode allows you to test WAF behavior without blocking traffic. This is essential for safe deployment and testing.
- Detection Continues: All rules and ML models remain active
- Logging Enhanced: Threats logged with `SHADOW_MODE_THREAT` prefix
- No Blocking: Malicious requests pass through to backend
- Full Metrics: All statistics and ML data collection continues
{
"shadow_mode": true,
"enable_logging": true
}
# Enable shadow mode (with API key authentication)
curl -X POST http://localhost:8080/_waf/shadow_mode \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret-api-key-change-this" \
-d '{"shadow_mode": true}'
# Check status (with API key authentication)
curl http://localhost:8080/_waf/shadow_mode \
-H "X-API-Key: your-secret-api-key-change-this"
# Disable shadow mode (with API key authentication)
curl -X POST http://localhost:8080/_waf/shadow_mode \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret-api-key-change-this" \
-d '{"shadow_mode": false}'
# Monitor shadow mode logs
tail -f waf.log | grep SHADOW_MODE
# Count shadow mode detections
grep -c "SHADOW_MODE" waf.log
# Analyze threat types in shadow mode
grep -o "SHADOW_MODE_THREAT: [A-Z0-9_]*" waf.log | sort | uniq -c
- New Rule Testing: Test rules in production without blocking users
- ML Model Validation: Validate anomaly detection accuracy
- Configuration Tuning: Fine-tune thresholds and parameters
- Deployment Verification: Ensure WAF works correctly before enabling blocking
See SHADOW_MODE_GUIDE.md for comprehensive shadow mode documentation.
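The same shadow mode toggle can be driven from Go instead of curl. This is a sketch under stated assumptions: the endpoint path, the `X-API-Key` header, and the `shadow_mode` field come from the examples above, while `setShadowMode` and `fakeWAF` are hypothetical names. `fakeWAF` is a local stand-in server so the example runs without a real WAF, and its 401 response for a bad key is an assumption about how the real server behaves.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type shadowModeState struct {
	ShadowMode bool `json:"shadow_mode"`
}

// setShadowMode POSTs the desired state and returns the state the server reports.
func setShadowMode(client *http.Client, baseURL, apiKey string, enabled bool) (bool, error) {
	body, err := json.Marshal(shadowModeState{ShadowMode: enabled})
	if err != nil {
		return false, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/_waf/shadow_mode", bytes.NewReader(body))
	if err != nil {
		return false, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", apiKey)

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var state shadowModeState
	if err := json.NewDecoder(resp.Body).Decode(&state); err != nil {
		return false, err
	}
	return state.ShadowMode, nil
}

// fakeWAF is a test double: it checks the API key and echoes the requested state.
func fakeWAF(apiKey string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-API-Key") != apiKey {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		var in shadowModeState
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(in)
	}))
}

func main() {
	srv := fakeWAF("demo-key")
	defer srv.Close()
	on, err := setShadowMode(srv.Client(), srv.URL, "demo-key", true)
	fmt.Println("shadow mode:", on, "error:", err)
}
```

Against a real deployment, replace `srv.URL` with the WAF's base URL and pass the key from configuration rather than hard-coding it.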
The WAF logs all requests and blocks to the configured log file (default: waf.log). Log entries include:
- Timestamp
- Client IP
- Request method and URL
- User agent
- Action taken (ALLOWED/BLOCKED/ML_ANOMALY/ML_BEHAVIORAL/SHADOW_MODE_THREAT)
- Rule ID and reason for blocked requests
- Pattern matching engine used (Vectorscan/Regexp)
- ML insights: Anomaly scores, risk levels, behavioral alerts
- Session tracking: Request patterns, risk progression
- Pattern evolution: Dynamic rule creation, promotion, and deprecation events
- Shadow mode detection: Threats detected but not blocked
Example log entries:
# Traditional rule block
[2024-03-21 10:30:45] 127.0.0.1 GET http://localhost:8080/test?cmd=shell_exec(ls) Mozilla/5.0 - BLOCKED: COMMAND_INJECTION_2 - Detects command injection function calls
# ML anomaly detection
[2024-03-21 10:31:12] 192.168.1.100 POST http://localhost:8080/api/data Chrome/91.0 - BLOCKED: ML_ANOMALY_HIGH_CONFIDENCE - ML detected anomaly (score: 0.89, risk: high)
# Shadow mode threat detection (not blocked)
[2025-09-15 21:16:29] ::1 GET http://localhost:8080/test?input=<script>alert('xss')</script> curl/8.7.1 - SHADOW_MODE_THREAT: XSS_1 - Detects XSS attack patterns
# Behavioral analysis alert
[2024-03-21 10:31:45] 10.0.0.50 GET http://localhost:8080/admin/config curl/7.68.0 - ML_ALERT: anomaly=0.72 risk=high
# Pattern evolution events
[2024-03-21 10:32:15] ML: Created dynamic rule dyn_a1b2c3d4 (confidence: 0.85, source: ngram)
[2024-03-21 10:35:30] ML: Promoted dynamic rule dyn_a1b2c3d4 to production (confidence: 0.92)
# Enhanced behavioral analysis alerts
[2024-03-21 10:33:00] 10.0.0.75 GET http://localhost:8080/scan curl/7.68.0 - ML_ALERT: anomaly=0.78 risk=high score=0.85 violations=3 patterns=1
[2024-03-21 10:33:15] 192.168.1.50 POST http://localhost:8080/upload Scanner/2.0 - BLOCKED: ML_BEHAVIORAL_VIOLATIONS - Multiple behavioral violations detected
[2024-03-21 10:33:30] ML: High-risk behavior detected for 10.0.0.75 (risk: 0.89, level: critical)
# Normal request with enhanced ML scoring
[2024-03-21 10:32:00] 192.168.1.101 GET http://localhost:8080/dashboard Firefox/89.0 - ALLOWED
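For ad-hoc analysis, the per-request entries above can be parsed with a single regular expression. The pattern below is inferred from these sample lines only and is not a documented log grammar. `LogEntry` and `parseLogLine` are hypothetical names, and the `ML:` event lines are deliberately not matched.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// LogEntry holds the fields of one per-request log line.
type LogEntry struct {
	Time      time.Time
	ClientIP  string
	Method    string
	URL       string
	UserAgent string
	Action    string // e.g. ALLOWED, BLOCKED, ML_ALERT, SHADOW_MODE_THREAT
	Detail    string // text after "ACTION: ", empty for ALLOWED
}

// requestLine: [timestamp] ip method url user-agent - ACTION[: detail]
var requestLine = regexp.MustCompile(
	`^\[([^\]]+)\] (\S+) (\S+) (\S+) (.*?) - ([A-Z_]+)(?:: (.*))?$`)

// parseLogLine returns false for lines that are not per-request entries.
func parseLogLine(line string) (LogEntry, bool) {
	m := requestLine.FindStringSubmatch(line)
	if m == nil {
		return LogEntry{}, false
	}
	ts, err := time.Parse("2006-01-02 15:04:05", m[1])
	if err != nil {
		return LogEntry{}, false
	}
	return LogEntry{
		Time:      ts,
		ClientIP:  m[2],
		Method:    m[3],
		URL:       m[4],
		UserAgent: m[5],
		Action:    m[6],
		Detail:    m[7],
	}, true
}

func main() {
	line := "[2024-03-21 10:31:45] 10.0.0.50 GET http://localhost:8080/admin/config curl/7.68.0 - ML_ALERT: anomaly=0.72 risk=high"
	entry, ok := parseLogLine(line)
	fmt.Printf("%v %+v\n", ok, entry)
}
```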
Test which rules are compatible with Vectorscan:
go run test_rules_compatibility.go
This will show:
- Individual rule compatibility status
- Compilation success/failure for each pattern
- Overall compatibility percentage
- Test results against sample attack payloads
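A similar check can be run against the fallback engine. The sketch below compiles a rule pattern with Go's standard `regexp` package and counts matches against sample payloads. It does not call Vectorscan and is not the contents of `test_rules_compatibility.go`. `Rule` and `countMatches` are hypothetical names, and the sample payloads are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule carries just the fields needed for a compile-and-match check.
type Rule struct {
	ID      string
	Pattern string
}

// countMatches compiles the rule pattern and reports how many payloads it matches.
// A compile error means the pattern is not usable by the regexp fallback.
func countMatches(r Rule, payloads []string) (int, error) {
	re, err := regexp.Compile(r.Pattern)
	if err != nil {
		return 0, fmt.Errorf("rule %s does not compile: %w", r.ID, err)
	}
	n := 0
	for _, p := range payloads {
		if re.MatchString(p) {
			n++
		}
	}
	return n, nil
}

func main() {
	rule := Rule{
		ID:      "SQL_INJECTION_1",
		Pattern: `(?i)(union\s+select|select\s+.*\s+from|insert\s+into|update\s+.*\s+set|delete\s+from|drop\s+table|create\s+table)`,
	}
	payloads := []string{
		"id=1 UNION SELECT password FROM users",
		"q=hello world",
		"name=x'; DROP TABLE users",
	}
	n, err := countMatches(rule, payloads)
	fmt.Println("matches:", n, "error:", err)
}
```

Go's `regexp` uses RE2 syntax, so a pattern that compiles here may still need a separate check against Vectorscan, and vice versa.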
- SQL Injection: Multiple detection patterns including tautologies, comments, and time-based
- XSS: Script tags, event handlers, and JavaScript protocols
- Command Injection: Shell operators and function calls
- Path Traversal: Directory traversal and file access attempts
- File Inclusion: Both local (LFI) and remote (RFI)
- PHP Injection: eval(), assert(), and other dangerous functions
- Log4Shell: CVE-2021-44228 JNDI injection
- Null Byte: Various null byte encoding attempts
- Zero-Day Detection: Anomaly detection for unknown attack patterns
- Behavioral Analysis: Unusual request patterns and session profiling
- Adaptive Learning: Continuous improvement from legitimate traffic
- Advanced Evasion: Detection of encoded, obfuscated, or polymorphic attacks
- Statistical Analysis: Entropy-based detection of suspicious payloads
- Dynamic Rule Generation: Automatic creation of new rules from blocked attacks
- N-gram Analysis: Frequency-based pattern discovery (3-20 character sequences)
- Attack Clustering: Similarity-based grouping of attack patterns
- Regex Synthesis: Intelligent conversion of patterns to regex rules
- False Positive Learning: Continuous improvement through user feedback
- Rule Promotion: Automatic promotion of high-confidence patterns to production
- Comprehensive Session Profiling: 65+ behavioral metrics per session
- Progressive Risk Scoring: Multi-factor risk assessment with confidence levels
- Behavioral Violation Detection: Real-time rate, pattern, temporal, and content anomaly detection
- Session Characterization: Automated human/bot/scanner classification
- Advanced Statistical Analysis: Z-score analysis and deviation detection
- Threat Intelligence Integration: IOC matching and reputation scoring
- Multi-dimensional Pattern Recognition: Behavioral fingerprinting and sequence analysis
- IP-based whitelisting/blacklisting
- Rate limiting per IP (configurable window)
- Dangerous HTTP method blocking (TRACE, TRACK, DEBUG)
- Security scanner detection via User-Agent
- ML-Enhanced: Behavioral rate limiting based on session profiling
- URL parameters with entropy analysis
- Request headers with pattern matching
- Request body inspection (with configurable size limits)
- User agent filtering and behavioral analysis
- 24-Dimensional Feature Extraction: Statistical, entropy, behavioral, and temporal features
- Session Correlation: Cross-request pattern analysis
- 65+ Behavioral Metrics: Comprehensive session profiling and analysis
- Real-time Violation Tracking: Behavioral anomaly detection and alerting
- Risk-Based Blocking: Graduated response based on confidence levels
- False Positive Learning: User feedback integration for model improvement
- Adaptive Thresholds: Self-tuning based on traffic patterns
- Multi-Factor Scoring: Combined rule + ML + behavioral assessment
- Dynamic Pattern Discovery: Real-time identification of emerging attack trends
- Automated Rule Evolution: Self-improving security through pattern learning
- Progressive Decision Logic: Enhanced blocking with behavioral violation thresholds
- Session-Aware Blocking: Context-aware decisions based on comprehensive session analysis
- Advanced Risk Assessment: Multi-dimensional scoring with confidence levels
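The entropy-based checks mentioned above rest on Shannon entropy, measured in bits per byte. A minimal sketch follows. `shannonEntropy` is a hypothetical helper, and comparing its result with the configured `entropy_threshold` (7.0 in the example config) is an assumption about how the WAF uses that setting.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per byte (0 to 8).
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	samples := []string{
		"aaaaaaaaaaaaaaaa",
		"page=2&sort=name",
		"%3Cscript%3Ealert(1)%3C%2Fscript%3E",
	}
	for _, s := range samples {
		fmt.Printf("%-40q %.2f bits/byte\n", s, shannonEntropy(s))
	}
}
```

Entropy is bounded by log2 of the input length, so a payload must be longer than 128 bytes before it can exceed 7.0 bits per byte. Short parameters will never trip a threshold that high on entropy alone.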
With 16 active rules + ML analysis on typical web requests:
| Processing Layer | Time per Request | Throughput | CPU Usage |
|---|---|---|---|
| Vectorscan Only | ~0.05ms | 20,000 req/s | 15% |
| Vectorscan + ML | ~0.5ms | 15,000 req/s | 25% |
| Sequential Regex | ~2.5ms | 400 req/s | 85% |
- Feature Extraction: ~0.2ms per request
- Anomaly Detection: ~0.3ms per request (100 trees)
- Basic Behavioral Analysis: ~0.1ms per request
- Enhanced Behavioral Analysis: ~0.1ms per request (65+ metrics)
- Pattern Evolution: ~0.1ms per request (background analysis)
- Progressive Decision Logic: ~0.05ms per request
- Model Training: ~50ms for 1000 samples (background)
- Memory Overhead: ~100MB for 10K sessions + ML models
- Startup memory: ~25MB (Vectorscan + ML models)
- Per-request overhead: <2KB (with ML features + object pooling)
- Pattern database: ~200KB (Vectorscan rules)
- ML models: ~10MB (isolation forest + behavioral data)
- Session data: ~100MB for 10K active sessions
- Pattern mining: ~50KB for 10K blocked requests + dynamic rules
- Enhanced behavioral profiles: ~200KB for 1K sessions with 65+ metrics
- WAF Engine: Main request handler and router
- Vectorscan Matcher: High-performance pattern matching engine
- ML Anomaly Detector: Custom Isolation Forest for anomaly detection
- Basic Session Profiler: Behavioral analysis and risk assessment
- Advanced Session Profiler: Enhanced 65+ metric behavioral analysis
- Feature Extractor: 24-dimensional security feature analysis
- Pattern Miner: Dynamic rule generation and pattern evolution
- Feedback & Auto-Tuning System: User feedback collection and automatic performance optimization
- Behavioral Risk Calculator: Multi-factor risk assessment engine
- Rule Manager: Rule compilation and management
- Rate Limiter: Token bucket implementation per IP
- Backend Proxy: Reverse proxy to backend services
- Statistics Collector: Real-time metrics collection (WAF + ML)
- Configuration Reloader: Hot-reload support
- Request Processing:
  - Request received → Rate limiting check
  - IP whitelist/blacklist verification
  - Request context extraction and pooling
- Layer 1 - Traditional Security:
  - Vectorscan pattern matching (all rules simultaneously)
  - Fallback to regex if needed (rare)
  - Immediate block for known attack patterns
- Layer 2 - ML Analysis (if Layer 1 passes):
  - Feature extraction (24 dimensions)
  - Anomaly detection using Isolation Forest
  - Behavioral analysis and session profiling
  - Risk assessment (low → critical)
- Layer 2c - Enhanced Behavioral Analysis:
  - Comprehensive 65+ metric behavioral profiling
  - Advanced risk scoring with multi-factor assessment
  - Behavioral violation detection (rate, pattern, temporal, content)
  - Session characterization and threat intelligence
- Layer 3 - Progressive Decision Logic:
  - Enhanced blocking criteria with behavioral violations
  - Session-aware decision making
  - Combined risk assessment (anomaly + behavioral + pattern)
  - Advanced threshold management
- Layer 4 - Pattern Evolution (background):
  - Blocked request analysis for pattern mining
  - N-gram frequency analysis and clustering
  - Dynamic rule generation and validation
  - Rule promotion and deprecation management
- Response Generation:
  - Backend proxy or error response
  - Enhanced logging with comprehensive ML insights
  - Statistics update (traditional + ML + behavioral metrics)
  - Pattern mining and behavioral data collection
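The N-gram frequency analysis in Layer 4 can be illustrated with a small counter over blocked payloads. This is a simplified sketch, not the WAF's pattern miner. `NGram` and `frequentNGrams` are hypothetical names, n-grams are taken over bytes rather than runes, and counting each n-gram at most once per payload is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// NGram is a candidate substring and the number of payloads containing it.
type NGram struct {
	Text  string
	Count int
}

// frequentNGrams returns the byte n-grams of length n that appear in at
// least minFreq payloads, most frequent first (ties sorted by text).
func frequentNGrams(payloads []string, n, minFreq int) []NGram {
	counts := make(map[string]int)
	for _, p := range payloads {
		p = strings.ToLower(p)
		seen := make(map[string]bool) // count each n-gram once per payload
		for i := 0; i+n <= len(p); i++ {
			g := p[i : i+n]
			if !seen[g] {
				seen[g] = true
				counts[g]++
			}
		}
	}
	var out []NGram
	for g, c := range counts {
		if c >= minFreq {
			out = append(out, NGram{Text: g, Count: c})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Text < out[j].Text
	})
	return out
}

func main() {
	blocked := []string{
		"id=1 UNION SELECT password FROM users",
		"q=' union select null,null--",
		"page=2 Union Select 1,2,3",
	}
	// 12 bytes is long enough to capture "union select" as one candidate.
	for _, g := range frequentNGrams(blocked, 12, 3) {
		fmt.Printf("%q seen in %d payloads\n", g.Text, g.Count)
	}
}
```

Candidates found this way would still need clustering, escaping, and false positive checks before becoming a rule, which is what the validation and promotion steps above cover.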
If you see "failed to compile Vectorscan patterns" in logs:
- Ensure Vectorscan library is properly installed
- Check that `ldconfig` includes the Vectorscan library path
- Verify CPU supports SSSE3 instructions (required for Vectorscan)
# If you get "hs/hs.h not found"
export CGO_CFLAGS="-I/usr/local/include"
export CGO_LDFLAGS="-L/usr/local/lib -lhs"
go build
- Multi-core systems: Increase `GOMAXPROCS` for better ML performance
- Rate limiting: Adjust windows based on traffic patterns
- ML thresholds: Lower anomaly threshold (0.4-0.5) for more sensitive detection
- Session limits: Reduce `maxSessions` if memory constrained
- Training frequency: Increase `retrainInterval` for high-traffic sites
- Pattern mining: Adjust `analysisInterval` based on attack frequency
- Load balancing: Consider multiple WAF instances with shared ML training data
Feel free to submit issues and enhancement requests!
When contributing:
- Traditional Rules: Ensure new rules are Vectorscan-compatible when possible
- Compatibility: Run `go run test_rules_compatibility.go` for pattern validation
- ML Features: Test ML components with `./test_phase2_ml.sh`
- Pattern Mining: Test pattern evolution with `./test_phase3_patterns.sh`
- Enhanced Behavioral Analysis: Test advanced profiling with `./test_phase4_behavioral.sh`
- Feedback & Auto-Tuning: Test feedback loops with `./test_phase5_feedback.sh`
- Performance: Include benchmarks for performance-critical changes
- ML Models: Consider impact on training data and model accuracy
- Documentation: Update README and implementation docs for new features
- Maintain backward compatibility with existing rule-based detection
- Ensure ML enhancements don't impact sub-millisecond response times
- Test anomaly detection with various attack types and evasion techniques
- Validate behavioral analysis with legitimate user patterns
- Include false positive analysis for ML-based blocking decisions
- Test pattern mining with diverse attack patterns and clustering scenarios
- Validate dynamic rule generation accuracy and promotion criteria
# Build and test basic functionality
go build && ./waf &
# Test traditional rules and Vectorscan compatibility
go run test_rules_compatibility.go
# Test ML features and endpoints
./test_phase2_ml.sh
# Test pattern mining and dynamic rule generation
./test_phase3_patterns.sh
# Test enhanced behavioral analysis
./test_phase4_behavioral.sh
# Test feedback and auto-tuning
./test_phase5_feedback.sh
# Test comprehensive configuration management
./test_phases3_4_config.sh
./test_phase5_config.sh
# Test shadow mode functionality
./test_shadow_mode.sh
# Test ML persistence functionality
./test_persistence.sh
# Performance testing
curl http://localhost:8080/_waf/stats
curl http://localhost:8080/_waf/ml/anomaly_stats
curl http://localhost:8080/_waf/ml/dynamic_rules
curl http://localhost:8080/_waf/ml/pattern_stats
curl http://localhost:8080/_waf/ml/behavioral_stats
curl http://localhost:8080/_waf/ml/session_profiles
curl http://localhost:8080/_waf/ml/violations
# Phase 5: Feedback and auto-tuning testing
curl http://localhost:8080/_waf/ml/model_performance
curl http://localhost:8080/_waf/ml/tuning_history
curl http://localhost:8080/_waf/ml/rule_performance
curl http://localhost:8080/_waf/ml/auto_tuning_config
# Shadow mode testing
curl http://localhost:8080/_waf/shadow_mode
curl -X POST http://localhost:8080/_waf/shadow_mode -d '{"shadow_mode": true}'
curl -X POST http://localhost:8080/_waf/reload
The WAF now features automatic ML state persistence:
- Zero Data Loss: All ML training and learning preserved across restarts
- Automatic Save/Load: ML state automatically saved on shutdown, loaded on startup
- Configurable Auto-Save: Periodic saving with customizable intervals
- Backup Management: Automatic backup creation with retention policies
- Production Ready: Instant ML capability on startup without retraining
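One common way to get save/load behaviour like that described above without risking a half-written state file is to write to a temporary file and rename it into place. The sketch below assumes JSON serialization and a made-up `mlState` struct; the WAF's real on-disk format, field names, and file paths are not shown in this README, so everything here is illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// mlState is a hypothetical stand-in for the data the WAF persists.
type mlState struct {
	Version       int       `json:"version"`
	AnomalyCutoff float64   `json:"anomaly_cutoff"`
	Samples       []float64 `json:"samples"`
}

// saveState writes the state to a temp file in the target directory and
// renames it over path, so readers never observe a partially written file.
func saveState(path string, s mlState) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), "mlstate-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadState reads a state file previously written by saveState.
func loadState(path string) (mlState, error) {
	var s mlState
	data, err := os.ReadFile(path)
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

// roundTrip saves and reloads a state in a throwaway directory.
func roundTrip(s mlState) (mlState, error) {
	dir, err := os.MkdirTemp("", "waf-ml-*")
	if err != nil {
		return mlState{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "ml_state.json")
	if err := saveState(path, s); err != nil {
		return mlState{}, err
	}
	return loadState(path)
}

func main() {
	out, err := roundTrip(mlState{Version: 1, AnomalyCutoff: 0.45, Samples: []float64{0.1, 0.9}})
	if err != nil {
		fmt.Println("persistence failed:", err)
		return
	}
	fmt.Printf("restored state: %+v\n", out)
}
```

A periodic auto-save could call `saveState` from a `time.Ticker` loop, and a backup policy could copy the previous file before the rename; both are omitted to keep the sketch short.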
The WAF features complete configuration externalization:
- Zero Hardcoded Values: All ML parameters moved to config.json
- Phase 3 Configuration: 13 pattern mining parameters fully configurable
- Phase 4 Configuration: 20+ behavioral analysis parameters configurable
- Phase 5 Configuration: 18 feedback & auto-tuning parameters configurable
- Persistence Configuration: 5 persistence management parameters
- Hot-Reload Support: All configurations can be updated without restart
- Environment-Specific: Different configs for production, development, testing
Essential for safe WAF deployment:
- API Control: Enable/disable shadow mode via REST endpoints
- Comprehensive Logging: Enhanced logs with SHADOW_MODE_THREAT detection
- Zero Impact: Full detection without blocking traffic
- Production Ready: Safe testing of rules and ML models in live environments
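The core idea of shadow mode, detect everything but block nothing, can be sketched as a small decision function. This is an assumption-laden illustration: the `verdict` type, the `decide` function, and the `BLOCKED` log label are hypothetical, and only the `SHADOW_MODE_THREAT` marker comes from the description above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// shadowMode mirrors the "shadow_mode" flag; an atomic lets an admin
// endpoint flip it while requests are being served.
var shadowMode atomic.Bool

// verdict is a hypothetical result type: whether to block and what to log.
type verdict struct {
	Block   bool
	LogLine string
}

// decide turns a detection result into an enforcement decision.
// In shadow mode a threat is logged but the request is allowed through.
func decide(threatDetected bool, ruleID string) verdict {
	switch {
	case !threatDetected:
		return verdict{}
	case shadowMode.Load():
		return verdict{Block: false, LogLine: "SHADOW_MODE_THREAT rule=" + ruleID}
	default:
		return verdict{Block: true, LogLine: "BLOCKED rule=" + ruleID}
	}
}

func main() {
	shadowMode.Store(true)
	fmt.Printf("%+v\n", decide(true, "SQL_INJECTION_1"))
	shadowMode.Store(false)
	fmt.Printf("%+v\n", decide(true, "SQL_INJECTION_1"))
}
```

Keeping detection and enforcement separate in this way means the same rules and models run in both modes, so shadow-mode logs are a fair preview of what enforcement would block.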
The WAF supports API key authentication for admin endpoints to secure management operations.
{
"api_key": "your-secret-api-key-change-this",
"admin_endpoints_require_auth": true
}
Include the X-API-Key header with all admin requests:
# Access statistics
curl http://localhost:8080/_waf/stats \
-H "X-API-Key: your-secret-api-key-change-this"
# ML model retraining
curl -X POST http://localhost:8080/_waf/ml/model_retrain \
-H "X-API-Key: your-secret-api-key-change-this"
# Configuration reload
curl -X POST http://localhost:8080/_waf/reload \
-H "X-API-Key: your-secret-api-key-change-this"
The WAF supports HTTPS for secure communications and encrypted admin access.
{
"enable_https": true,
"tls_cert_file": "server.crt",
"tls_key_file": "server.key"
}
Create self-signed certificates for testing:
openssl req -x509 -newkey rsa:2048 -keyout server.key -out server.crt -sha256 -days 365 -nodes -subj '/CN=localhost'
Access the WAF via HTTPS:
# Statistics (HTTPS + API key)
curl https://localhost:8080/_waf/stats \
-H "X-API-Key: your-secret-api-key-change-this" \
-k # Use -k for self-signed certificates
# Test protected endpoint
curl https://localhost:8080/test \
-k # Use -k for self-signed certificates
- Change Default API Key: Always replace the default API key with a strong, unique value
- Use HTTPS in Production: Enable HTTPS for all production deployments
- Certificate Management: Use proper certificates from a trusted CA in production
- Key Rotation: Regularly rotate API keys and certificates
- Access Control: Restrict access to admin endpoints at the network level when possible
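For readers curious how an X-API-Key check on admin endpoints might look in Go, here is a minimal middleware sketch. It is not the WAF's actual implementation: `requireAPIKey`, `okHandler`, and `statusFor` are hypothetical names, and the 401 response is an assumption about how unauthenticated requests are rejected. The comparison uses `crypto/subtle` so that timing does not reveal how many leading characters of the key matched.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireAPIKey wraps next and rejects requests whose X-API-Key header
// does not match key. An empty configured key rejects everything.
func requireAPIKey(key string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		if key == "" || subtle.ConstantTimeCompare([]byte(got), []byte(key)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler is a stand-in for an admin endpoint such as /_waf/stats.
func okHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
}

// statusFor sends an in-memory GET request and returns the status code.
func statusFor(h http.Handler, apiKey string) int {
	req := httptest.NewRequest(http.MethodGet, "/_waf/stats", nil)
	if apiKey != "" {
		req.Header.Set("X-API-Key", apiKey)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	admin := requireAPIKey("your-secret-api-key-change-this", okHandler())
	fmt.Println("without key:", statusFor(admin, ""))
	fmt.Println("with key:   ", statusFor(admin, "your-secret-api-key-change-this"))
}
```

Because the key travels in a request header, this check only protects admin endpoints when combined with HTTPS, which is why the two recommendations appear together above.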
The WAF supports both JSON-based rules (legacy) and SQLite database-based rules (recommended) for enhanced management capabilities.
Enable the rules database in your config.json:
{
"rules_db_config": {
"enable_database": true,
"database_path": "./waf_rules.db",
"migrate_from_json": true,
"json_backup_on_migration": true
}
}
When migrate_from_json is enabled, the WAF will automatically migrate rules from your existing rules.json to the database on first startup if the database is empty.
Use the migration utility for more control:
# Build the migration utility
go build -o migrate-rules ./cmd/migrate_rules.go
# Migrate with options
./migrate-rules -json rules.json -db waf_rules.db -backup
# Options:
# -json: Path to JSON rules file (default: rules.json)
# -db: Path to SQLite database (default: waf_rules.db)
# -backup: Create backup of JSON file (default: true)
# -overwrite: Overwrite existing database (default: false)
The WAF provides a comprehensive REST API for rule management:
# List all rules with pagination
curl -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules?limit=20&offset=0"
# Filter rules
curl -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules?enabled=true&action=block"
curl -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/SQL_INJECTION_1"
curl -X POST -H "Content-Type: application/json" \
-H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules" \
-d '{
"rule_id": "CUSTOM_RULE_1",
"name": "Custom Attack Detection",
"pattern": "(?i)(malicious|evil)",
"action": "block",
"enabled": true,
"description": "Detects malicious patterns"
}'
curl -X PUT -H "Content-Type: application/json" \
-H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/CUSTOM_RULE_1" \
-d '{
"name": "Updated Custom Attack Detection",
"pattern": "(?i)(malicious|evil|harmful)",
"action": "block",
"enabled": true,
"description": "Updated to detect more attack patterns"
}'
# Soft delete (disable rule)
curl -X DELETE -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/CUSTOM_RULE_1"
# Hard delete (permanently remove)
curl -X DELETE -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/CUSTOM_RULE_1?hard=true"
# View rule change history
curl -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/SQL_INJECTION_1/history?limit=10"
# Get rules database statistics
curl -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/stats"
# Export rules to JSON
curl -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/export" > exported_rules.json
# Import rules from JSON
curl -X POST -H "Content-Type: application/json" \
-H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/import" \
-d @exported_rules.json
# Reload rules from database into memory
curl -X POST -H "X-API-Key: your-secret-api-key-change-this" \
"http://localhost:8080/_waf/rules/reload"
- Every rule change creates a new version
- Complete audit trail of all modifications
- History includes CREATE, UPDATE, DELETE operations
- Timestamps and operation tracking
CREATE TABLE rule_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
rule_id TEXT NOT NULL,
name TEXT NOT NULL,
pattern TEXT NOT NULL,
action TEXT NOT NULL,
enabled BOOLEAN NOT NULL,
description TEXT DEFAULT '',
version INTEGER NOT NULL,
operation TEXT NOT NULL, -- 'CREATE', 'UPDATE', 'DELETE'
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
- Automatic timestamp updates
- Version incrementing on changes
- History logging for all operations
- Data integrity enforcement
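The versioning and history behaviour listed above can be approximated in memory to show the intended semantics. In the WAF this work is described as happening in SQLite; the Go sketch below is only a model of it, with hypothetical `Store`, `Rule`, and `HistoryEntry` types that mirror a few `rule_history` columns. Whether a DELETE also bumps the version is not stated in this README, so the sketch simply logs the last snapshot.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Rule mirrors a few rule_history columns; the Go type is hypothetical.
type Rule struct {
	ID      string
	Pattern string
	Version int
}

// HistoryEntry is one audit row: a rule snapshot plus the operation.
type HistoryEntry struct {
	Rule      Rule
	Operation string // "CREATE", "UPDATE" or "DELETE"
	CreatedAt time.Time
}

// Store keeps current rules and an append-only history.
type Store struct {
	mu      sync.Mutex
	rules   map[string]Rule
	history []HistoryEntry
}

func NewStore() *Store { return &Store{rules: make(map[string]Rule)} }

// record appends an audit row; callers must hold s.mu.
func (s *Store) record(r Rule, op string) {
	s.history = append(s.history, HistoryEntry{Rule: r, Operation: op, CreatedAt: time.Now()})
}

// Upsert creates the rule at version 1 or bumps an existing rule's version.
func (s *Store) Upsert(r Rule) Rule {
	s.mu.Lock()
	defer s.mu.Unlock()
	op := "CREATE"
	r.Version = 1
	if old, ok := s.rules[r.ID]; ok {
		op = "UPDATE"
		r.Version = old.Version + 1
	}
	s.rules[r.ID] = r
	s.record(r, op)
	return r
}

// Delete removes the rule, logs its last snapshot, and reports whether it existed.
func (s *Store) Delete(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	old, ok := s.rules[id]
	if !ok {
		return false
	}
	delete(s.rules, id)
	s.record(old, "DELETE")
	return true
}

// Operations lists the logged operations for one rule, oldest first.
func (s *Store) Operations(id string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	var ops []string
	for _, h := range s.history {
		if h.Rule.ID == id {
			ops = append(ops, h.Operation)
		}
	}
	return ops
}

func main() {
	s := NewStore()
	s.Upsert(Rule{ID: "CUSTOM_RULE_1", Pattern: "(?i)(malicious|evil)"})
	r := s.Upsert(Rule{ID: "CUSTOM_RULE_1", Pattern: "(?i)(malicious|evil|harmful)"})
	s.Delete("CUSTOM_RULE_1")
	fmt.Println(r.Version, s.Operations("CUSTOM_RULE_1")) // prints "2 [CREATE UPDATE DELETE]"
}
```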
- Dynamic Updates: No WAF restart required for rule changes
- Concurrent Access: Multiple administrators can manage rules safely
- Query Performance: Fast filtering and searching with SQL indexes
- Data Integrity: ACID transactions ensure consistency
- Scalability: Handle thousands of rules efficiently
- Audit Trail: Complete change history for compliance
- Rule changes take effect immediately
- No service interruption
- Automatic pattern recompilation
- Memory-efficient rule loading
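Applying rule changes without interrupting traffic is often done by compiling the new rule set off to the side and then swapping it in atomically. The sketch below shows that pattern under clearly stated assumptions: it uses Go's standard `regexp` package rather than Vectorscan, and the names `reload`, `match`, and `compiledRule` are hypothetical, not the WAF's own.

```go
package main

import (
	"fmt"
	"regexp"
	"sync/atomic"
)

// compiledRule pairs a rule ID with its compiled pattern.
type compiledRule struct {
	ID string
	Re *regexp.Regexp
}

// active points at the current rule set; a set is never mutated after
// it has been published, so readers need no lock.
var active atomic.Pointer[[]compiledRule]

// reload compiles every pattern first and publishes the new set only if
// all of them compile, so a bad rule never replaces a working set.
func reload(patterns map[string]string) error {
	next := make([]compiledRule, 0, len(patterns))
	for id, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return fmt.Errorf("rule %s: %w", id, err)
		}
		next = append(next, compiledRule{ID: id, Re: re})
	}
	active.Store(&next)
	return nil
}

// match returns the ID of a rule that matches input, or "" if none does.
func match(input string) string {
	rules := active.Load()
	if rules == nil {
		return ""
	}
	for _, r := range *rules {
		if r.Re.MatchString(input) {
			return r.ID
		}
	}
	return ""
}

func main() {
	if err := reload(map[string]string{"CUSTOM_RULE_1": "(?i)(malicious|evil)"}); err != nil {
		fmt.Println("reload failed:", err)
		return
	}
	fmt.Println(match("an EVIL payload"))

	// A broken pattern is rejected and the previous set stays active.
	fmt.Println(reload(map[string]string{"BROKEN": "("}) != nil)
	fmt.Println(match("an EVIL payload"))
}
```

Requests that are mid-flight keep using the set they loaded, while new requests pick up the replacement, which is what allows a reload to happen without a restart.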
- Use Descriptive Names: Clear rule names improve maintainability
- Version Control Integration: Export rules regularly for backup
- Test Before Production: Use shadow mode to validate new rules
- Monitor Performance: Check rule match statistics regularly
- Regular Cleanup: Remove obsolete or ineffective rules
- Regular Backups: Export rules database periodically
- History Cleanup: Archive old history records if needed
- Index Optimization: Monitor database performance
- Storage Planning: Plan for rule history growth
This WAF is production-ready with enterprise-grade configuration management, ML persistence, secure administration, dynamic rules database, and safe deployment capabilities.