Log Anomaly Detection Models

This repository contains trained models for the Log Anomaly Detection System, which classifies system logs into 7 categories: normal operation plus six anomaly types.

Available Models

BERT-based Models

  • DANN-BERT (models/DANN-BERT-Log-Anomaly-Detection/) - Domain-Adversarial Neural Network
  • LoRA-BERT (models/LoRA-BERT-Log-Anomaly-Detection/) - Low-Rank Adaptation
  • Hybrid-BERT (models/Hybrid-BERT-Log-Anomaly-Detection/) - BERT + Template Features

Traditional ML Models

  • XGBoost (models/XGBoost-Log-Anomaly-Detection/) - Gradient Boosting Classifier

Model Performance

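| Model       | F1-Score (Macro) | Accuracy | Parameters       |
|-------------|------------------|----------|------------------|
| Hybrid-BERT | 92.8%            | 94.3%    | 110M             |
| DANN-BERT   | 90.3%            | 92.1%    | 110M             |
| LoRA-BERT   | 88.7%            | 90.5%    | 1.5M (trainable) |
| XGBoost     | 88.5%            | 91.2%    | -                |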
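
Macro F1 and accuracy can diverge on imbalanced log data, because macro F1 weights every class equally regardless of how many samples it has. The sketch below shows how the two metrics are computed on made-up toy labels; it is an illustration only and is not the evaluation script used to produce the numbers above.

```python
from collections import Counter

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # A class with no true and no predicted samples scores 0 here.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels (hypothetical): class 0 dominates, class 2 is rare and always missed.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred, num_classes=3):.3f}")
```

In this toy case accuracy stays high while macro F1 drops, since the missed rare class contributes a per-class F1 of zero.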

🎯 Classification Categories

  1. Normal (0): Benign operations
  2. Security Anomaly (1): Authentication failures, unauthorized access
  3. System Failure (2): Crashes, kernel panics
  4. Performance Issue (3): Timeouts, slow responses
  5. Network Anomaly (4): Connection errors, packet loss
  6. Config Error (5): Misconfigurations, invalid settings
  7. Hardware Issue (6): Disk failures, memory errors
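
The class ids above can be kept in one place with an enum, so downstream code does not hard-code integers. This is a minimal sketch; `LogCategory` and `category_name` are hypothetical names introduced here, not part of the released models.

```python
from enum import IntEnum

class LogCategory(IntEnum):
    """Class ids as listed in the Classification Categories section."""
    NORMAL = 0
    SECURITY_ANOMALY = 1
    SYSTEM_FAILURE = 2
    PERFORMANCE_ISSUE = 3
    NETWORK_ANOMALY = 4
    CONFIG_ERROR = 5
    HARDWARE_ISSUE = 6

def category_name(class_id: int) -> str:
    """Return a display name for a class id; raises ValueError for unknown ids."""
    return LogCategory(class_id).name.replace("_", " ").title()

print(category_name(1))
```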

Usage

Download Models

from huggingface_hub import hf_hub_download

# Download BERT model
model_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models",
    filename="models/Hybrid-BERT-Log-Anomaly-Detection/pytorch_model.pt"
)

# Download XGBoost model
xgb_path = hf_hub_download(
    repo_id="krishnas4415/log-anomaly-detection-models", 
    filename="models/XGBoost-Log-Anomaly-Detection/best_mod.pkl"
)

Load and Use Models

import torch
import pickle
from transformers import AutoTokenizer

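# Load BERT model. This assumes the .pt file stores the full pickled model
# object rather than a state_dict. Only load checkpoints you trust.
# weights_only=False is needed on PyTorch 2.6+, where the default became True.
model = torch.load(model_path, map_location="cpu", weights_only=False)
model.eval()  # switch off dropout for inference
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')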

# Load XGBoost model
with open(xgb_path, 'rb') as f:
    xgb_model = pickle.load(f)

# Example prediction
log_text = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
inputs = tokenizer(log_text, return_tensors='pt', max_length=128, truncation=True, padding=True)

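with torch.no_grad():
    outputs = model(**inputs)
    # Assumes the model returns an object with a .logits attribute
    predictions = torch.softmax(outputs.logits, dim=-1)  # class probabilities
    predicted_class = torch.argmax(predictions, dim=-1)  # class id in 0..6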
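
A single predicted class id hides how confident the model is. The sketch below ranks class ids by probability from a list of raw logits. It uses only the standard library so it runs without a checkpoint; the logit values are made up, and `top_k` is a hypothetical helper, not part of this repository.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=3):
    """Return the k most probable (class_id, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up logits for the 7 classes, for illustration only.
example_logits = [0.2, 3.1, -0.5, 0.0, 1.4, -1.0, -0.8]

for class_id, prob in top_k(example_logits):
    print(f"class {class_id}: p = {prob:.3f}")
```

With a loaded model, the input would be something like `outputs.logits[0].tolist()`, assuming the model output exposes `.logits` as in the snippet above.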

Training Data

  • Sources: 16 log types (Apache, SSH, Hadoop, HDFS, Linux, Windows, etc.)
  • Size: ~32,000 labeled logs
  • Classes: 7 categories (Normal plus 6 anomaly types)
  • Features: BERT embeddings + template features + statistical features
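
To give a rough idea of what template and statistical features can look like, the sketch below masks variable fields in a log line to obtain a template and computes a few simple statistics. The masking rules, placeholder names, and chosen statistics are assumptions made for illustration; the feature pipeline actually used to train these models is not described here and may differ.

```python
import re

# Hypothetical masking rules, applied in order (most specific first).
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d{2}:\d{2}:\d{2}\b"), "<TIME>"),
    (re.compile(r"\d+"), "<NUM>"),
]

def to_template(line):
    """Replace variable fields (IPs, hex values, times, numbers) with placeholders."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line

def simple_stats(line):
    """A few illustrative statistical features of a raw log line."""
    tokens = line.split()
    return {
        "num_tokens": len(tokens),
        "num_chars": len(line),
        "digit_ratio": sum(ch.isdigit() for ch in line) / max(len(line), 1),
    }

line = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
print(to_template(line))
print(simple_stats(line))
```

Lines that differ only in their variable fields collapse to the same template, which is what makes a template id usable as a categorical feature.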

Related Links

Citation

@misc{log-anomaly-detection-2024,
  title={Log Anomaly Detection System},
  author={Krishna Sharma},
  year={2024},
  url={https://github.com/krishnasharma4415/log-anomaly-detection}
}

License

MIT License - see LICENSE file for details.
