Log Anomaly Detection Models

This repository contains trained models for the Log Anomaly Detection System, which classifies system logs into 7 categories: one normal class and six anomaly types.

🤖 Available Models

BERT-based Models

DANN-BERT (models/DANN-BERT-Log-Anomaly-Detection/) - Domain-Adversarial Neural Network
LoRA-BERT (models/LoRA-BERT-Log-Anomaly-Detection/) - Low-Rank Adaptation
Hybrid-BERT (models/Hybrid-BERT-Log-Anomaly-Detection/) - BERT + Template Features

Traditional ML Models

XGBoost (models/XGBoost-Log-Anomaly-Detection/) - Gradient Boosting Classifier

📊 Model Performance

Model	F1-Score (Macro)	Accuracy	Parameters
Hybrid-BERT	92.8%	94.3%	110M
DANN-BERT	90.3%	92.1%	110M
LoRA-BERT	88.7%	90.5%	1.5M (trainable)
XGBoost	88.5%	91.2%	-
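The table reports both macro F1 and accuracy, and the two can diverge when classes are imbalanced. The sketch below shows how each metric is usually computed from true and predicted class ids. It is an illustration only, not the evaluation script used for these models; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def macro_f1_and_accuracy(y_true, y_pred):
    """Compute macro-averaged F1 and accuracy from integer class ids."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    # Average over every class that appears in the labels or the predictions.
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)

    macro_f1 = sum(f1_scores) / len(classes)
    accuracy = sum(tp.values()) / len(y_true)
    return macro_f1, accuracy

# Toy example (made-up labels): the single class-1 sample is misclassified.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]
macro_f1, accuracy = macro_f1_and_accuracy(y_true, y_pred)
```

In the toy example, one missed sample from a rare class pulls macro F1 well below accuracy, because every class contributes equally to the macro average.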

🎯 Classification Categories

Normal (0): Benign operations
Security Anomaly (1): Authentication failures, unauthorized access
System Failure (2): Crashes, kernel panics
Performance Issue (3): Timeouts, slow responses
Network Anomaly (4): Connection errors, packet loss
Config Error (5): Misconfigurations, invalid settings
Hardware Issue (6): Disk failures, memory errors
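The category ids above can be kept in a small lookup table so that model outputs are reported by name. The sketch below copies the ids and names from the list; the `describe` helper is a hypothetical convenience, not part of the repository.

```python
# Class ids and names copied from the category list above.
ID_TO_LABEL = {
    0: "Normal",
    1: "Security Anomaly",
    2: "System Failure",
    3: "Performance Issue",
    4: "Network Anomaly",
    5: "Config Error",
    6: "Hardware Issue",
}

# Reverse lookup, useful when preparing labels for training or evaluation.
LABEL_TO_ID = {name: idx for idx, name in ID_TO_LABEL.items()}

def describe(class_id: int) -> str:
    """Return 'id: name' for a class id, or 'id: Unknown' if out of range."""
    return f"{class_id}: {ID_TO_LABEL.get(class_id, 'Unknown')}"
```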

🚀 Usage

Download Models

from huggingface_hub import hf_hub_download

# Download BERT model
model_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models",
    filename="models/Hybrid-BERT-Log-Anomaly-Detection/pytorch_model.pt"
)

# Download XGBoost model
xgb_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models", 
    filename="models/XGBoost-Log-Anomaly-Detection/best_mod.pkl"
)

Load and Use Models

import torch
import pickle
from transformers import AutoTokenizer

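# Load BERT model on CPU.
# Assumption: the .pt file stores a fully pickled model object. If it stores a
# state_dict instead, instantiate the model class first and call load_state_dict().
# weights_only=False unpickles arbitrary objects (PyTorch 2.6+ defaults to True),
# so only load checkpoints you trust.
model = torch.load(model_path, map_location='cpu', weights_only=False)
model.eval()  # disable dropout for inference
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')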

# Load XGBoost model
with open(xgb_path, 'rb') as f:
    xgb_model = pickle.load(f)

# Example prediction
log_text = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
inputs = tokenizer(log_text, return_tensors='pt', max_length=128, truncation=True, padding=True)

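with torch.no_grad():
    # Assumes the model accepts the tokenizer outputs alone and returns an
    # object with a .logits attribute of shape (batch, 7).
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)  # class probabilities
    predicted_class = torch.argmax(predictions, dim=-1)  # class id in 0..6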
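The prediction above yields a class id; a readable label with a confidence score is often more useful. The sketch below does that conversion in plain Python, assuming the logits have been converted to a list (for example with `outputs.logits[0].tolist()`). The `top_prediction` helper and its `threshold` parameter are hypothetical and not part of the repository.

```python
import math

# Class names in id order, copied from the Classification Categories section.
LABELS = [
    "Normal", "Security Anomaly", "System Failure", "Performance Issue",
    "Network Anomaly", "Config Error", "Hardware Issue",
]

def top_prediction(logits, threshold=0.5):
    """Return (label, confidence); label is 'Uncertain' below the threshold."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[best] if probs[best] >= threshold else "Uncertain"
    return label, probs[best]

# Made-up logits for illustration; real values come from the model.
label, confidence = top_prediction([0.1, 3.2, 0.0, -1.0, 0.3, 0.2, -0.5])
```

The threshold is a design choice: flagging low-confidence predictions as uncertain lets a reviewer triage them instead of acting on a weak guess.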

📚 Training Data

Sources: 16 log types (Apache, SSH, Hadoop, HDFS, Linux, Windows, etc.)
Size: ~32,000 labeled logs
Classes: 7 categories (Normal plus 6 anomaly types)
Features: BERT embeddings + template features + statistical features
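The feature list mentions template and statistical features without describing how they are derived. One common approach, shown here purely as an assumption and not necessarily what this project does, is to mask variable fields (IP addresses, hex values, numbers) to obtain a log template, and to compute simple per-line statistics. All names in the sketch are hypothetical.

```python
import re

# Order matters: mask IPs and hex values before the generic number pattern.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]

def to_template(line: str) -> str:
    """Replace variable fields with placeholder tokens."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line

def basic_stats(line: str) -> dict:
    """Compute a few simple statistical features for one log line."""
    digits = sum(ch.isdigit() for ch in line)
    return {
        "length": len(line),
        "num_tokens": len(line.split()),
        "digit_ratio": digits / max(len(line), 1),
    }

template = to_template("Apr 15 12:34:56 server sshd[1234]: Failed password for admin")
stats = basic_stats("a 1")
```

Lines that differ only in timestamps, process ids, or addresses collapse to the same template, which gives a classifier a compact categorical signal alongside the text embedding.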

🔗 Related Links

Main Project: Log Anomaly Detection System
Live Demo: Frontend Application
API: Backend API

📄 Citation

@misc{log-anomaly-detection-2024,
  title={Log Anomaly Detection System},
  author={Krishna Sharma},
  year={2024},
  url={https://github.com/krishnasharma4415/log-anomaly-detection}
}

📝 License

MIT License - see LICENSE file for details.
