dcarpintero commited on
Commit
d139d7b
·
verified ·
1 Parent(s): 697a491

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +16 -14
README.md CHANGED
@@ -12,28 +12,30 @@ model-index:
12
  results: []
13
  ---
14
 
15
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
16
- should probably proofread and complete it, then remove this comment. -->
17
 
18
- # pangolin-guard-base
19
 
20
- This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
21
- It achieves the following results on the evaluation set:
22
- - Loss: 0.0196
23
- - F1: 0.9923
24
- - Accuracy: 0.9950
25
 
26
- ## Model description
27
 
28
- More information needed
29
 
30
- ## Intended uses & limitations
 
 
31
 
32
- More information needed
33
 
34
- ## Training and evaluation data
35
 
36
- More information needed
 
 
 
 
 
37
 
38
  ## Training procedure
39
 
 
12
  results: []
13
  ---
14
 
15
+ # PangolinGuard-Base
 
16
 
17
+ LLM applications face critical security challenges in form of prompt injections and jailbreaks. This can result in models leaking sensitive data or deviating from their intended behavior. Existing safeguard models are not fully open and have limited context windows (e.g., only 512 tokens in LlamaGuard).
18
 
19
+ PangolinGuard is a ModernBERT (Base), lightweight model that discriminates malicious prompts.
 
 
 
 
20
 
21
+ 🤗 [Tech-Blog](https://huggingface.co/blog/dcarpintero/pangolin-fine-tuning-modern-bert) | [GitHub Repo](https://github.com/dcarpintero/pangolin-guard)
22
 
23
+ ## Intended uses
24
 
25
+ - Adding custom, self-hosted safety checks to AI agents and conversational interfaces
26
+ - Topic and content moderation
27
+ - Mitigating risks when connecting AI pipelines to external services
28
 
29
+ ## Evaluation data
30
 
31
+ Evaluated on unseen data from a subset of specialized benchmarks targeting prompt safety and malicious input detection, while testing over-defense behavior:
32
 
33
+ - NotInject: Designed to measure over-defense in prompt guard models by including benign inputs enriched with trigger words common in prompt injection attacks.
34
+ - BIPIA: Evaluates privacy invasion attempts and boundary-pushing queries through indirect prompt injection attacks.
35
+ - Wildguard-Benign: Represents legitimate but potentially ambiguous prompts.
36
+ - PINT: Evaluates particularly nuanced prompt injection, jailbreaks, and benign prompts that could be misidentified as malicious.
37
+
38
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64a13b68b14ab77f9e3eb061/ygIo-Yo3NN7mDhZlLFvZb.png)
39
 
40
  ## Training procedure
41