Koda-WAF v1.0

Koda-WAF is a high-performance machine learning Web Application Firewall (WAF) model. It is designed to classify HTTP requests as Benign (0) or Malicious (1) with a specific focus on reducing false positives in modern, complex web traffic (JSON, long User Agents, and nested query parameters).

Model Description

Koda-WAF uses XGBoost (Gradient Boosted Decision Trees) to analyze the "intent" of a request rather than just its "structure." Unlike traditional regex-based WAFs that struggle with complex strings, Koda-WAF uses 18+ engineered features to identify patterns associated with:

  • SQL Injection (SQLi)
  • Cross-Site Scripting (XSS)
  • Path Traversal and File Inclusion (LFI/RFI)

Key Features

  • Anti-Overfitting Logic: Uses L2 Regularization and shallow trees to prevent the "Long String = Bad" bias.
  • UA-Agnostic: Designed to ignore the length of User-Agent strings, preventing blocks on modern browsers.
  • JSON-Aware: Trained on high-entropy benign JSON payloads to ensure API traffic isn't accidentally blocked.
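
To illustrate why L2 regularization dampens overconfident leaves: in XGBoost's objective, the optimal weight of a leaf is -G / (H + lambda), where G and H are the summed first- and second-order gradients of the samples in that leaf and lambda is the L2 term (`reg_lambda` in the Python API). The sketch below uses made-up gradient sums and lambda values; they are illustrative assumptions, not Koda-WAF's actual training configuration.

```python
def leaf_weight(grad_sum: float, hess_sum: float, reg_lambda: float) -> float:
    """Optimal leaf weight for a gradient-boosted tree with L2 regularization."""
    return -grad_sum / (hess_sum + reg_lambda)


# Hypothetical leaf that captured only a handful of long request strings.
G, H = -4.0, 1.0

# A larger reg_lambda shrinks the weight a small leaf can contribute.
for lam in (0.0, 1.0, 10.0):
    print(f"reg_lambda={lam:>4}: leaf weight = {leaf_weight(G, H, lam):.3f}")
```

Leaves backed by few samples have a small H, so the lambda term dominates and pulls their weight toward zero. This is one way a rule learned from a few long strings ends up with limited influence.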

Intended Use

  • Deployment: Best used as a secondary filter in an Nginx/OpenResty Lua module or a FastAPI middleware.
  • Recommended Thresholds:
    • Block Mode: 0.90 (High confidence required)
    • Log/Alert Mode: 0.75 (Early warning)
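
As a rough illustration of the middleware option, the sketch below is a plain ASGI middleware, which FastAPI can wrap around an app via `app.add_middleware`. The class name `WAFMiddleware`, the `score_request` callable, the `path`/`query` dictionary keys, and the 403 response body are all hypothetical. The scorer is assumed to wrap your feature extraction and `predict_proba` call. The sketch inspects only the path and query string, not the request body.

```python
import logging
from typing import Callable

logger = logging.getLogger("koda_waf")

BLOCK_THRESHOLD = 0.90  # block mode
LOG_THRESHOLD = 0.75    # log/alert mode


class WAFMiddleware:
    """Pure ASGI middleware; score_request is a hypothetical scorer callable."""

    def __init__(self, app, score_request: Callable[[dict], float]):
        self.app = app
        self.score_request = score_request

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass websocket and lifespan events through untouched.
            await self.app(scope, receive, send)
            return

        request_data = {
            "path": scope.get("path", ""),
            "query": scope.get("query_string", b"").decode("latin-1"),
        }
        probability = self.score_request(request_data)

        if probability > BLOCK_THRESHOLD:
            # Short-circuit: answer with 403 and never call the wrapped app.
            await send({
                "type": "http.response.start",
                "status": 403,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"Forbidden"})
            return

        if probability > LOG_THRESHOLD:
            logger.warning(
                "Suspicious request %s (p=%.3f)", request_data["path"], probability
            )

        await self.app(scope, receive, send)
```

With FastAPI this could be registered as `app.add_middleware(WAFMiddleware, score_request=my_scorer)`, where `my_scorer` is a function you supply.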

Training Data

The model was trained on a balanced mixture of:

  1. SynthWAF: Synthetic attack patterns.
  2. AI-WAF-Dataset: Real-world malicious logs.
  3. Manual Noise Injection: 15,000+ custom samples of "safe" technical searches and complex URLs to ensure generalization.

How to Use

Koda-WAF requires specific feature extraction before inference. You must use the matching extract_smart_features logic (including math.log1p scaling and length capping) to get accurate predictions.

import joblib
import pandas as pd

# Load model and feature metadata
model = joblib.load("smart_waf_model.pkl")
cols = joblib.load("model_features.pkl")

# Run your request dictionary (request_data) through the feature extractor.
# extract_smart_features is not defined in this snippet; supply an
# implementation that matches the training logic exactly.
features = extract_smart_features(request_data)
df = pd.DataFrame([features]).reindex(columns=cols, fill_value=0)

# Prediction
probability = model.predict_proba(df)[0][1]
if probability > 0.90:
    print("🔥 Attack Detected!")
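
The snippet above assumes an `extract_smart_features` function that is not included here. The sketch below shows what log1p scaling and length capping could look like in such an extractor. The function name `extract_features_sketch`, the feature names, the `MAX_LEN` cap of 2048, the keyword list, and the `path`/`query` input keys are illustrative assumptions, not the features the model was trained on, so its output will not line up with `model_features.pkl`.

```python
import math
from collections import Counter
from urllib.parse import unquote_plus

MAX_LEN = 2048  # assumed cap; the real training value is not documented here
SQL_KEYWORDS = ("union", "select", "drop", "sleep(")  # illustrative only


def shannon_entropy(text: str) -> float:
    """Bits per character of the string's character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features_sketch(request_data: dict) -> dict:
    """Hypothetical extractor: URL-decode, cap length, then log1p-scale counts."""
    raw = request_data.get("path", "") + "?" + request_data.get("query", "")
    decoded = unquote_plus(raw).lower()
    capped = decoded[:MAX_LEN]  # length capping bounds the work per request
    return {
        # log1p compresses large lengths so long benign strings do not dominate.
        "log_length": math.log1p(min(len(decoded), MAX_LEN)),
        "log_special_chars": math.log1p(sum(c in "'\"<>;()" for c in capped)),
        "sql_keyword_hits": sum(kw in capped for kw in SQL_KEYWORDS),
        "traversal_hits": capped.count("../"),
        "entropy": shannon_entropy(capped),
    }


features = extract_features_sketch(
    {"path": "/search", "query": "q=1%27+UNION+SELECT+password+FROM+users--"}
)
print(features)
```

Because of the cap and the log scaling, a 10,000-character query and a 2,048-character query produce the same `log_length`, which is the kind of behavior that keeps string length from dominating the score.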

Performance

Test Case            Threat Probability   Decision
Standard SQLi        99.9%                Block
Path Traversal       90.2%                Block
Chrome User Agent    89.6%                Allow
Complex Safe URL     0.08%                Allow
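
The decisions in the table follow from comparing each probability against the recommended thresholds. A minimal sketch of that mapping is given here; the function name `decide` and the three-way split are assumptions, using the 0.90 and 0.75 values from Intended Use and the strict `>` comparison from the usage snippet.

```python
BLOCK_THRESHOLD = 0.90
LOG_THRESHOLD = 0.75


def decide(probability: float) -> str:
    """Map a threat probability to 'block', 'log', or 'allow'."""
    if probability > BLOCK_THRESHOLD:
        return "block"
    if probability > LOG_THRESHOLD:
        return "log"  # request is still allowed through, but an alert is raised
    return "allow"


# Probabilities taken from the table above.
cases = {
    "Standard SQLi": 0.999,
    "Path Traversal": 0.902,
    "Chrome User Agent": 0.896,
    "Complex Safe URL": 0.0008,
}
for name, p in cases.items():
    print(f"{name}: {decide(p)}")
```

Under this split, the Chrome User Agent case (89.6%) is not blocked but would land in the log/alert band, since it exceeds 0.75.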

Limitations

Koda-WAF is a stateless model: it scores each request in isolation, so it cannot detect multi-step attacks (such as brute force) or volumetric DDoS. Use it as one layer of a defense-in-depth strategy.
