Koda-WAF v1.0

Koda-WAF is a high-performance machine learning model for Web Application Firewall (WAF) use. It classifies HTTP requests as Benign (0) or Malicious (1), with a specific focus on reducing false positives in modern, complex web traffic (JSON bodies, long User-Agent strings, and nested query parameters).

Model Description

Koda-WAF uses XGBoost (gradient-boosted decision trees) to analyze the "intent" of a request rather than just its "structure." Unlike traditional regex-based WAFs, which struggle with complex strings, Koda-WAF uses 18+ engineered features to identify patterns associated with:

SQL Injection (SQLi)
Cross-Site Scripting (XSS)
Path Traversal and File Inclusion (LFI/RFI)

Key Features

Anti-Overfitting Logic: Uses L2 Regularization and shallow trees to prevent the "Long String = Bad" bias.
UA-Agnostic: Designed to ignore the length of User-Agent strings, preventing blocks on modern browsers.
JSON-Aware: Trained on high-entropy benign JSON payloads to ensure API traffic isn't accidentally blocked.
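
To make "high-entropy" concrete, the sketch below computes character-level Shannon entropy for a string. It is illustrative only: the name shannon_entropy and the sample payload are hypothetical, and this card does not state whether entropy is one of the model's engineered features.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A repetitive string has low entropy; a benign JSON body with varied
# keys and values scores much higher without being malicious.
low = shannon_entropy("aaaaaaaa")
json_body = '{"user_id": 48213, "token": "f3A9xQ", "tags": ["a1", "b2"]}'
high = shannon_entropy(json_body)
```

A classifier that treated high entropy alone as a sign of attack would flag payloads like json_body, which is the failure mode that training on benign JSON is meant to avoid.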

Intended Use

Deployment: Best used as a secondary filter in an Nginx/OpenResty Lua module or a FastAPI middleware.
Recommended Thresholds:
- Block Mode: 0.90 (High confidence required)
- Log/Alert Mode: 0.75 (Early warning)
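
One way to apply the two recommended thresholds is as tiers of a single decision function. The following is a minimal sketch under that assumption; the name decide and the three return labels are hypothetical, and the strict ">" comparison is chosen to match the usage snippet in this card.

```python
BLOCK_THRESHOLD = 0.90  # Block Mode, from the recommendation above
ALERT_THRESHOLD = 0.75  # Log/Alert Mode, from the recommendation above


def decide(probability: float) -> str:
    """Map a malicious-class probability to 'block', 'alert', or 'allow'."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > BLOCK_THRESHOLD:
        return "block"
    if probability > ALERT_THRESHOLD:
        # Logged for review, but the request is still let through.
        return "alert"
    return "allow"
```

With this tiering, a score between the two thresholds is recorded as an early warning without rejecting the request.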

Training Data

The model was trained on a balanced mixture of:

SynthWAF: Synthetic attack patterns.
AI-WAF-Dataset: Real-world malicious logs.
Manual Noise Injection: 15,000+ custom samples of "safe" technical searches and complex URLs to ensure generalization.

How to Use

Koda-WAF requires specific feature extraction before inference. You must use the matching extract_smart_features logic (including math.log1p scaling and length capping) to get accurate predictions.

import joblib
import pandas as pd

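# Load the trained model and the ordered list of feature column names
model = joblib.load("smart_waf_model.pkl")
cols = joblib.load("model_features.pkl")

# Run your request dictionary through the feature extractor.
# extract_smart_features and request_data are not defined in this snippet:
# supply an extractor that matches the training logic.
features = extract_smart_features(request_data)
# Align columns with the training order; columns the extractor omits become 0
df = pd.DataFrame([features]).reindex(columns=cols, fill_value=0)

# Probability of the Malicious (1) class
probability = model.predict_proba(df)[0][1]
if probability > 0.90:  # Block Mode threshold
    print("🔥 Attack Detected!")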
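
The log1p scaling and length capping mentioned above can be sketched as follows. This is not the actual extract_smart_features: the function name sketch_features, the "url" and "body" keys, the feature names, and the cap of 2048 are all assumptions made for illustration.

```python
import math

MAX_LEN = 2048  # hypothetical cap; the value used in training is not documented here


def sketch_features(request_data: dict) -> dict:
    """Illustrative features only; not the model's actual feature set."""
    url = str(request_data.get("url", ""))
    body = str(request_data.get("body", ""))

    # Length capping: clamp raw lengths so very long inputs cannot dominate.
    url_len = min(len(url), MAX_LEN)
    body_len = min(len(body), MAX_LEN)

    return {
        # log1p(x) = log(1 + x): compresses large lengths and is defined at 0.
        "url_len_log": math.log1p(url_len),
        "body_len_log": math.log1p(body_len),
        # A simple character-count feature of the kind a WAF model might use.
        "quote_count": url.count("'") + body.count("'"),
    }
```

Capping followed by log scaling keeps a long but benign URL numerically close to a short one, which is consistent with the card's goal of avoiding a "Long String = Bad" bias.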

Performance

Test Case	Threat Probability	Decision (Block Mode, 0.90 threshold)
Standard SQLi	99.9%	Block
Path Traversal	90.2%	Block
Chrome User Agent	89.6%	Allow
Complex Safe URL	0.08%	Allow

Limitations

Koda-WAF is a stateless model: it scores each request in isolation, so it cannot detect multi-step attacks (such as brute force) or volumetric DDoS. It should be used as one layer of a defense-in-depth strategy.

Evaluation Results

F1 (self-reported): 0.850