---
license: apache-2.0
tags:
- injection
- security
- llm
- prompt-injection
---

# Model Card for Vijil Prompt Injection

## Model Details

### Model Description

This model is a fine-tuned version of ModernBERT that classifies prompt-injection prompts, which can manipulate language models into producing unintended outputs.

- **Developed by:** Vijil AI
- **License:** apache-2.0
- **Fine-tuned from:** [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert)

## Uses

Prompt-injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses.
The vijil/mbert-prompt-injection model is designed to improve the security of language model applications by detecting such attacks.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# The tokenizer is loaded from the base ModernBERT checkpoint;
# the classifier weights are loaded from the fine-tuned Vijil checkpoint.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")

# Build a text-classification pipeline; inputs longer than 512 tokens are truncated.
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

print(classifier("this is a prompt-injection prompt"))
```
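
For a single input string, a `text-classification` pipeline returns a list containing one dictionary with a `label` and a `score`. The sketch below shows one way such a result could be turned into a yes/no decision. The label name `LABEL_1` and the 0.5 threshold are assumptions made for illustration, not values documented for this model; the actual label names can be read from `model.config.id2label`.

```python
# Hypothetical label name for the injection class; check model.config.id2label
# for the names this checkpoint actually uses.
INJECTION_LABEL = "LABEL_1"


def is_injection(results, threshold=0.5, injection_label=INJECTION_LABEL):
    """Return True if the top prediction is the injection label with score >= threshold."""
    top = results[0]
    return top["label"] == injection_label and top["score"] >= threshold


# Stand-in for the output of classifier("..."), so the helper runs without the model.
sample = [{"label": "LABEL_1", "score": 0.98}]
print(is_injection(sample))
```

In an application, the threshold would typically be tuned on held-out data to balance missed attacks against false alarms.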

## Training Details

### Training Data

The training data was taken from [wildguardmix/train](https://huggingface.co/datasets/allenai/wildguardmix) and [safe-guard-prompt-injection/train](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection).

### Training Procedure

Supervised fine-tuning on the datasets above.

#### Training Hyperparameters

* learning_rate: 5e-05
* train_batch_size: 32
* eval_batch_size: 32
* optimizer: adamw_torch_fused
* lr_scheduler_type: cosine_with_restarts
* warmup_ratio: 0.1
* num_epochs: 3

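
As a rough sketch, the hyperparameters listed above could be written as keyword arguments for `transformers.TrainingArguments`. The mapping of names is an assumption (for example, that the batch sizes are per-device values), the output directory is a hypothetical placeholder, and argument names can differ between library versions; this is not the authors' training script.

```python
# Assumed mapping of the listed hyperparameters onto TrainingArguments keyword names.
training_kwargs = {
    "output_dir": "mbert-prompt-injection-run",  # hypothetical path, not from the model card
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 32,  # assumes train_batch_size is per device
    "per_device_eval_batch_size": 32,   # assumes eval_batch_size is per device
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine_with_restarts",
    "warmup_ratio": 0.1,
    "num_train_epochs": 3,
}

# With transformers installed, these could be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
for name, value in training_kwargs.items():
    print(f"{name}: {value}")
```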

## Evaluation

* Training Loss: 0.0036
* Validation Loss: 0.209392
* Accuracy: 0.961538
* Precision: 0.958362
* Recall: 0.957055
* F1: 0.957708

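
As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two values reported above. The snippet uses only the numbers from this list.

```python
precision = 0.958362
recall = 0.957055

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # 0.957708, matching the reported F1
```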

#### Testing Data

The test data was taken from [wildguardmix/test](https://huggingface.co/datasets/allenai/wildguardmix) and [safe-guard-prompt-injection/test](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection).

### Results

## Model Card Contact

https://vijil.ai