Log Anomaly Detection Models
This repository contains trained models for the Log Anomaly Detection System, which classifies system logs into 7 categories: one normal class and six anomaly types.
Available Models
BERT-based Models
- DANN-BERT (models/DANN-BERT-Log-Anomaly-Detection/) - Domain-Adversarial Neural Network
- LoRA-BERT (models/LoRA-BERT-Log-Anomaly-Detection/) - Low-Rank Adaptation
- Hybrid-BERT (models/Hybrid-BERT-Log-Anomaly-Detection/) - BERT + Template Features
Traditional ML Models
- XGBoost (models/XGBoost-Log-Anomaly-Detection/) - Gradient Boosting Classifier
Model Performance
Model | F1-Score (Macro) | Accuracy | Parameters |
---|---|---|---|
Hybrid-BERT | 92.8% | 94.3% | 110M |
DANN-BERT | 90.3% | 92.1% | 110M |
LoRA-BERT | 88.7% | 90.5% | 1.5M (trainable) |
XGBoost | 88.5% | 91.2% | - |
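The table reports macro F1 next to accuracy, and the two can disagree: XGBoost has higher accuracy than LoRA-BERT but a slightly lower macro F1. Macro F1 is the unweighted mean of the per-class F1 scores, so a rare category counts as much as a frequent one. The evaluation script is not part of this card; the function below is a minimal sketch of the metric itself, and the name `macro_f1` is hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred, num_classes=7):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Convention assumed here: a class with no true or predicted
        # samples contributes an F1 of 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes
```

For labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]` with two classes, accuracy is 0.75 while macro F1 is the mean of 2/3 and 0.8.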
Classification Categories
- Normal (0): Benign operations
- Security Anomaly (1): Authentication failures, unauthorized access
- System Failure (2): Crashes, kernel panics
- Performance Issue (3): Timeouts, slow responses
- Network Anomaly (4): Connection errors, packet loss
- Config Error (5): Misconfigurations, invalid settings
- Hardware Issue (6): Disk failures, memory errors
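When post-processing predictions, it can help to keep the class indices above in one place. The enum below mirrors the list; the names `LogCategory` and `describe` are illustrative and are not taken from the project code.

```python
from enum import IntEnum


class LogCategory(IntEnum):
    """Class indices as listed in this model card."""
    NORMAL = 0
    SECURITY_ANOMALY = 1
    SYSTEM_FAILURE = 2
    PERFORMANCE_ISSUE = 3
    NETWORK_ANOMALY = 4
    CONFIG_ERROR = 5
    HARDWARE_ISSUE = 6


def describe(class_id: int) -> str:
    """Turn a class index into a readable name, e.g. 1 -> 'Security Anomaly'."""
    return LogCategory(class_id).name.replace("_", " ").title()
```

An index outside 0..6 raises `ValueError`, which surfaces a mismatched model head early instead of silently mislabeling logs.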
Usage
Download Models
from huggingface_hub import hf_hub_download
# Download BERT model
model_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models",
    filename="models/Hybrid-BERT-Log-Anomaly-Detection/pytorch_model.pt"
)
# Download XGBoost model
xgb_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models",
    filename="models/XGBoost-Log-Anomaly-Detection/best_mod.pkl"
)
Load and Use Models
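import pickle
import torch
from transformers import AutoTokenizer
# Load BERT model. This assumes the checkpoint is a fully pickled model object
# rather than a state_dict; PyTorch 2.6+ then needs weights_only=False.
# Only unpickle files that come from a source you trust.
model = torch.load(model_path, map_location='cpu', weights_only=False)
model.eval()  # disable dropout for inference
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
# Load XGBoost model
with open(xgb_path, 'rb') as f:
    xgb_model = pickle.load(f)
# Example prediction
log_text = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
inputs = tokenizer(log_text, return_tensors='pt', max_length=128, truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# Convert logits to class probabilities, then pick the most likely class (0..6)
predictions = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1)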
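The prediction step yields a class index, but an alerting pipeline usually also wants a confidence and a yes/no decision. The sketch below does this with plain Python so it does not depend on the model object. It assumes class 0 is Normal, as in the category list; the function name `flag_anomaly` and the 0.5 threshold are hypothetical choices, not values from the project.

```python
import math

NORMAL_CLASS = 0  # index of "Normal" in the category list


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flag_anomaly(logits, min_confidence=0.5):
    """Return (class_id, confidence, flagged) for one row of logits.

    min_confidence is an illustrative threshold, not a tuned value.
    """
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[class_id]
    flagged = class_id != NORMAL_CLASS and confidence >= min_confidence
    return class_id, confidence, flagged
```

With a PyTorch output, one row of logits can be passed as `flag_anomaly(outputs.logits[0].tolist())`.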
Training Data
- Sources: 16 log types (Apache, SSH, Hadoop, HDFS, Linux, Windows, etc.)
- Size: ~32,000 labeled logs
- Classes: 7 categories (Normal plus 6 anomaly types)
- Features: BERT embeddings + template features + statistical features
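This card does not describe how the template features are computed. A common way to obtain a log template is to mask the variable parts of a line (numbers, IP addresses, hex values) so that lines produced by the same log statement collapse to the same string. The following is a sketch of that general idea only, not the project's actual feature pipeline; the mask tokens and the name `to_template` are assumptions.

```python
import re

# Order matters: mask the most specific patterns first so that the
# generic number rule does not split an IP address or hex literal.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable fields in a log line with placeholder tokens."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line
```

Two SSH failures from different hosts and ports then map to the same template, which can be counted or encoded as a categorical feature.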
Related Links
- Main Project: Log Anomaly Detection System
- Live Demo: Frontend Application
- API: Backend API
Citation
@misc{log-anomaly-detection-2024,
  title={Log Anomaly Detection System},
  author={Krishna Sharma},
  year={2024},
  url={https://github.com/krishnasharma4415/log-anomaly-detection}
}
License
MIT License - see LICENSE file for details.