Koda-WAF v1.0
Koda-WAF is a high-performance machine learning Web Application Firewall (WAF) model. It is designed to classify HTTP requests as Benign (0) or Malicious (1) with a specific focus on reducing false positives in modern, complex web traffic (JSON, long User Agents, and nested query parameters).
Model Description
Koda-WAF uses XGBoost (Gradient Boosted Decision Trees) to analyze the "intent" of a request rather than just its "structure." Unlike traditional regex-based WAFs that struggle with complex strings, Koda-WAF uses 18+ engineered features to identify patterns associated with:
- SQL Injection (SQLi)
- Cross-Site Scripting (XSS)
- Path Traversal (LFI/RFI)
Key Features
- Anti-Overfitting Logic: Uses L2 Regularization and shallow trees to prevent the "Long String = Bad" bias.
- UA-Agnostic: Designed to ignore the length of User-Agent strings, preventing blocks on modern browsers.
- JSON-Aware: Trained on high-entropy benign JSON payloads to ensure API traffic isn't accidentally blocked.
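To make "high-entropy" concrete, the sketch below computes the Shannon entropy of a string in bits per character. It is only an illustration of the term: the model card does not say whether entropy is one of Koda-WAF's engineered features, and the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

A repetitive string such as `"aaaa"` scores 0 bits, while a JSON body with mixed keys, digits, and punctuation scores noticeably higher. That is why a length- or randomness-biased classifier could mistake benign API traffic for an attack.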
Intended Use
- Deployment: Best used as a secondary filter in an Nginx/OpenResty Lua module or a FastAPI middleware.
- Recommended Threshold:
  - Block Mode: 0.90 (High confidence required)
  - Log/Alert Mode: 0.75 (Early warning)
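The recommended thresholds can be applied with a small helper like the one below. This is a minimal sketch: the `THRESHOLDS` mapping, the mode names, and the `is_flagged` function are illustrative and not part of the published model. The strict `>` comparison is an assumption that mirrors the check in the usage example.

```python
# Threshold values come from the recommendations above; mode names are illustrative.
THRESHOLDS = {"block": 0.90, "log": 0.75}


def is_flagged(probability: float, mode: str = "block") -> bool:
    """Return True when `probability` exceeds the threshold for `mode`."""
    if mode not in THRESHOLDS:
        raise ValueError(f"unknown mode: {mode!r}")
    return probability > THRESHOLDS[mode]
```

With these values, a request scored at 0.896 is allowed in block mode but still raises an alert in log mode, so running both modes side by side surfaces borderline traffic without blocking it.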
Training Data
The model was trained on a balanced mixture of:
- SynthWAF: Synthetic attack patterns.
- AI-WAF-Dataset: Real-world malicious logs.
- Manual Noise Injection: 15,000+ custom samples of "safe" technical searches and complex URLs to ensure generalization.
How to Use
Koda-WAF requires specific feature extraction before inference. You must use the matching extract_smart_features logic (including math.log1p scaling and length capping) to get accurate predictions.
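    import joblib
    import pandas as pd
    # Load the trained model and the ordered list of feature columns
    model = joblib.load("smart_waf_model.pkl")
    cols = joblib.load("model_features.pkl")
    # Process your request dictionary through the feature extractor.
    # extract_smart_features and request_data are not defined here: supply
    # an extractor that matches the training logic and your own request dict.
    features = extract_smart_features(request_data)
    # Align columns with the training order; features the extractor omits become 0
    df = pd.DataFrame([features]).reindex(columns=cols, fill_value=0)
    # Probability of the Malicious (1) class for the single request
    probability = model.predict_proba(df)[0][1]
    if probability > 0.90:
        print("🔥 Attack Detected!")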
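The snippet above depends on an extractor that is not shown. The sketch below illustrates only the two techniques named earlier, length capping followed by `math.log1p` scaling. It is not the real `extract_smart_features`: the function name `sketch_features`, the `MAX_LEN` cap, the `url`/`body` request keys, and every feature name are assumptions made for illustration, so its output will not match the columns the published model expects.

```python
import math
from urllib.parse import parse_qsl, urlsplit

MAX_LEN = 2048  # assumed cap; the value used in training is not documented
SUSPICIOUS_CHARS = set("'\"<>;()")  # illustrative character set


def sketch_features(request: dict) -> dict:
    """Illustrative features only; NOT the model's real extractor."""
    url = request.get("url", "")
    body = request.get("body", "")
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    text = (url + " " + body)[:MAX_LEN]
    suspicious = sum(1 for ch in text if ch in SUSPICIOUS_CHARS)
    return {
        # Cap first, then compress with log1p so very long inputs
        # cannot dominate the feature scale.
        "url_len_log": math.log1p(min(len(url), MAX_LEN)),
        "body_len_log": math.log1p(min(len(body), MAX_LEN)),
        "param_count_log": math.log1p(len(params)),
        "suspicious_char_ratio": suspicious / max(len(text), 1),
        "has_traversal": int("../" in text),
    }
```

Capping before the logarithm means a 10,000-character URL and a 2,048-character URL produce the same length feature, which is one way to limit the "Long String = Bad" bias.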
Performance
| Test Case | Threat Probability | Decision (block if > 0.90) |
|---|---|---|
| Standard SQLi | 99.9% | Block |
| Path Traversal | 90.2% | Block |
| Chrome User Agent | 89.6% | Allow |
| Complex Safe URL | 0.08% | Allow |
Limitations
Koda-WAF is a stateless model: it scores each request in isolation, so it cannot detect multi-step attacks such as brute force, or volumetric DDoS. It should be used as one layer of a defense-in-depth strategy.
Evaluation results
- F1 (self-reported): 0.850