CodeBERT for Code Vulnerability Detection
Model Summary
This model is a fine-tuned version of microsoft/codebert-base for detecting vulnerabilities in source code. It was fine-tuned on the bigvul dataset. Given a code snippet, the model classifies it as either benign (0) or vulnerable (1).
Model Details
- Developed by: Eun Jung
- Fine-tuned from: microsoft/codebert-base
- Language(s): English (for code comments & metadata), C/C++
- License: MIT
- Task: Code vulnerability detection
- Dataset used: bigvul
- Architecture: Transformer-based sequence classification
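For a two-label sequence classification head, the model returns one logit per class, and the predicted label is the argmax after a softmax, which is what the inference example in this card does. The sketch below reproduces only that final step in plain Python. The logit values are made up, and `classify` and `ID2LABEL` are hypothetical names, not part of the model's API.

```python
import math

# Label ids as documented in this card: 0 = benign, 1 = vulnerable.
ID2LABEL = {0: "benign", 1: "vulnerable"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label_id, label_name, probabilities) for one list of logits."""
    probs = softmax(logits)
    # Index of the highest probability (same as the argmax of the raw logits).
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, ID2LABEL[label_id], probs


# Hypothetical logits, not produced by the real model.
label_id, name, probs = classify([-1.2, 2.3])
print(label_id, name, round(probs[1], 3))
```

Because softmax is monotonic, the argmax is the same whether it is taken over logits or probabilities. The softmax is only needed when you want a confidence score such as P(vulnerable).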
Uses
How to Get Started with the Model
Use the code below to load the model and run inference on a sample code snippet:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer from the base model and the fine-tuned classifier weights
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained("eunJ/codebert_vulnerabilty_detector")
model.eval()  # make sure dropout is disabled for inference

# Sample C code snippet
code_snippet = '''
void process(char *input) {
    char buffer[50];
    strcpy(buffer, input);  // Potential buffer overflow
}
'''

# Tokenize the input (CodeBERT accepts at most 512 tokens; longer code is truncated)
inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True, padding="max_length", max_length=512)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities and pick the most likely label
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_label = torch.argmax(probabilities, dim=-1).item()

# Output the result (0 = benign, 1 = vulnerable)
print("Vulnerable Code" if predicted_label == 1 else "Benign Code")
```
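The example truncates input at 512 tokens (`max_length=512`), CodeBERT's maximum sequence length, so code beyond that point is never seen by the classifier. One possible workaround, which the model author does not describe, is to split long token sequences into overlapping windows, score each window, and flag the input if any window is scored as vulnerable. The sketch below shows only the windowing and aggregation in plain Python. `split_windows`, `aggregate`, the stride value, the 0.5 threshold, and the max-probability rule are all assumptions.

```python
def split_windows(token_ids, window=510, stride=255):
    """Split a list of token ids into overlapping windows.

    window=510 leaves room for the two special tokens (<s> and </s>) that
    bring each chunk up to the 512-token limit; stride is an assumed step size.
    """
    if len(token_ids) <= window:
        return [token_ids]
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # the last window reaches the end of the sequence
        start += stride
    return windows


def aggregate(vuln_probs, threshold=0.5):
    """Flag the whole input if any window's P(vulnerable) reaches the threshold."""
    worst = max(vuln_probs)
    return (1 if worst >= threshold else 0), worst


# Hypothetical token ids standing in for a long function.
chunks = split_windows(list(range(1200)))
print([len(c) for c in chunks])
```

To use this with the model, you could tokenize with `add_special_tokens=False`, wrap each window with `tokenizer.build_inputs_with_special_tokens(chunk)`, and collect P(vulnerable) per window. Fast Hugging Face tokenizers can also produce overlapping chunks directly via `return_overflowing_tokens=True` and a `stride` argument.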
Training Details
Training Data
- Dataset: bigvul
- Classes: 0 (benign), 1 (vulnerable)
- Size: 21,800 code snippets
Metrics
| Metric | Score |
|---|---|
| Accuracy | 99.11% |
| F1 Score | 91.88% |
| Precision | 89.57% |
| Recall | 94.31% |
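F1 is the harmonic mean of precision and recall, so the reported scores can be cross-checked against each other. The snippet below recomputes F1 from the precision and recall in the table. It only checks internal consistency and says nothing about the evaluation split, which the card does not describe.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Metrics table, as fractions.
precision = 0.8957
recall = 0.9431
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints: F1 = 91.88%
```

The recomputed value matches the reported F1 score of 91.88%.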