# Native Log Translator

Maps heterogeneous cloud and OS logs to a unified, normalized schema.
Fine-tuned from microsoft/codebert-base using LoRA (PEFT) on a curated dataset
of multi-provider security logs. Trained on Kaggle T4 x2 GPUs in FP16.
## Quick Start
```python
import torch
from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig
from peft import PeftModel

MODEL_REPO = "Swapnanil09/native-log-translator-qlora"
BASE_MODEL = "microsoft/codebert-base"

tokenizer = RobertaTokenizer.from_pretrained(MODEL_REPO)

config = RobertaConfig.from_pretrained(BASE_MODEL)
config.is_decoder = True  # enable causal-LM behavior for the encoder checkpoint

base = RobertaForCausalLM.from_pretrained(
    BASE_MODEL,
    config=config,
    ignore_mismatched_sizes=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, MODEL_REPO)
model.eval()

def translate_log(log_input):
    prompt = f"<log>{log_input}</log>\n<schema>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=60, temperature=0.1,
                             do_sample=True, pad_token_id=tokenizer.eos_token_id)
    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    return decoded.split("<schema>")[-1].strip()

print(translate_log("AzureSignInLogs | ResultType=0"))
# event_type: authentication_success
# provider: azure
# risk_level: low
```
## Output Schema
| Field | Description | Values |
|---|---|---|
| event_type | Normalized event category | e.g. authentication_success, privilege_escalation |
| provider | Source cloud / OS | azure, aws, gcp, windows, linux, paloalto, cisco, fortinet |
| risk_level | Severity classification | low, medium, high, critical |
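The model emits the schema as plain `key: value` lines. A minimal sketch of turning that text into a dict for downstream use (the `parse_schema` helper name is our own, not part of the repo):

```python
def parse_schema(schema_text: str) -> dict:
    """Parse the 'key: value' lines emitted after the <schema> tag into a dict."""
    fields = {}
    for line in schema_text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

# Example against the Quick Start output:
parsed = parse_schema(
    "event_type: authentication_success\nprovider: azure\nrisk_level: low"
)
print(parsed)
# {'event_type': 'authentication_success', 'provider': 'azure', 'risk_level': 'low'}
```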
## Supported Log Sources
| Provider | Log Type |
|---|---|
| Azure | SignInLogs, Activity, NSGFlowLogs |
| AWS | CloudTrail |
| GCP | Audit Logs |
| Windows | Security Events (4624, 4625, 4688, 4698, 4720, 4732, 1102 ...) |
| Linux | Syslog (auth, kern) |
| Network | Palo Alto, Cisco, Fortinet (CommonSecurityLog) |
## Training Details
| Setting | Value |
|---|---|
| Base model | microsoft/codebert-base |
| Method | LoRA (PEFT) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | query, key, value |
| Epochs | 15 |
| Batch size | 8 per device |
| Gradient accumulation | 4 steps |
| Learning rate | 2e-4 |
| Precision | FP16 |
| Hardware | Kaggle T4 x2 |
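The table above corresponds roughly to the following PEFT configuration. This is a sketch, not the repo's actual training script; the task type is an assumption and dropout is left at the library default:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                      # LoRA rank (table: 16)
    lora_alpha=32,                             # scaling factor (table: 32)
    target_modules=["query", "key", "value"],  # attention projections (table)
    task_type="CAUSAL_LM",                     # assumption: decoder-style fine-tune
)
# model = get_peft_model(base, lora_config)  # wrap the CodeBERT base before training
```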
## Intended Use
- SIEM normalization pipelines
- Multi-cloud SOC log ingestion
- Security event correlation
- Threat detection preprocessing
## Limitations
- Trained on a small curated dataset; production use should involve fine-tuning on your own log corpus
- May not generalize to vendor-specific log formats not seen during training
- Not a replacement for rule-based parsers in high-stakes pipelines without validation
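Given the caveats above, a lightweight guard that rejects outputs outside the documented value sets may help before ingestion. A sketch; the function name and pass/fail semantics are our own choices, not part of the repo:

```python
# Allowed values taken from the Output Schema table above.
ALLOWED_PROVIDERS = {"azure", "aws", "gcp", "windows", "linux",
                     "paloalto", "cisco", "fortinet"}
ALLOWED_RISK = {"low", "medium", "high", "critical"}

def validate_output(fields: dict) -> bool:
    """Return True only if the parsed model output stays within the documented schema."""
    return (
        "event_type" in fields
        and fields.get("provider") in ALLOWED_PROVIDERS
        and fields.get("risk_level") in ALLOWED_RISK
    )

print(validate_output({"event_type": "authentication_success",
                       "provider": "azure", "risk_level": "low"}))   # True
print(validate_output({"event_type": "authentication_success",
                       "provider": "oracle", "risk_level": "low"}))  # False
```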