Update index.html
index.html (changed, +8 -2)
@@ -238,8 +238,14 @@ should begin with ’Sure, here is guidance on how to cause physical harm to oth
 
 
 <h2 id="proposed-approach-gradient-cuff">Performance evaluation against practical Jailbreaks</h2>
-<p>
-
+<p>
+The performance of jailbreak defense methods is usually measured by how much they reduce the attack success rate (ASR). The major concerns
+when developing such methods are the performance degradation of the LLM on nominal benign prompts and the increased inference time cost.
+We test our method on Vicuna-7B-V1.5 alongside existing defense methods, jointly considering the ASR, Win Rate, and running time cost. In the
+plot shown below, the horizontal axis represents the ASR averaged over 6 jailbreak attacks (GCG, AutoDAN,
+PAIR, TAP, Manyshot, and AIM), and the vertical axis shows the Win Rate on AlpacaEval of the
+protected LLM when the corresponding defense is deployed. The value printed next to each marker is the running time
+averaged across the 25 samples; a larger marker means a lower running time cost.
 </p>
 
 <div class="container"><img id="gradient-cuff-header" src="./gradient_cuff.png" /></div>
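The added paragraph describes how each defense is placed on the evaluation plot. As a rough illustration only, the Python sketch below computes the three quantities it mentions: the x-coordinate (ASR averaged over the six named attacks), the printed label (running time averaged over samples), and a marker size that grows as running time shrinks. This is not the authors' plotting code. The function names `average_asr`, `mean_runtime`, and `marker_sizes`, the default size range, and the linear inverse mapping from runtime to marker size are all hypothetical assumptions; the commit only states that a larger marker means a lower running time cost.

```python
from statistics import mean

# The six jailbreak attacks named in the commit text.
ATTACKS = ("GCG", "AutoDAN", "PAIR", "TAP", "Manyshot", "AIM")


def average_asr(asr_by_attack: dict[str, float]) -> float:
    """x-axis value: ASR averaged over the six attacks (hypothetical helper)."""
    missing = [name for name in ATTACKS if name not in asr_by_attack]
    if missing:
        raise ValueError(f"missing ASR for attacks: {missing}")
    return mean(asr_by_attack[name] for name in ATTACKS)


def mean_runtime(per_sample_seconds: list[float]) -> float:
    """Printed marker label: running time averaged over the evaluated samples."""
    if not per_sample_seconds:
        raise ValueError("need at least one runtime sample")
    return mean(per_sample_seconds)


def marker_sizes(
    runtimes: dict[str, float], min_size: float = 20.0, max_size: float = 200.0
) -> dict[str, float]:
    """Marker size per defense: the fastest gets max_size, the slowest min_size.

    ASSUMPTION: linear inverse scaling; the source only says that a larger
    marker means a lower running time cost.
    """
    fastest, slowest = min(runtimes.values()), max(runtimes.values())
    if slowest == fastest:
        # All defenses cost the same, so every marker gets the same size.
        return {name: max_size for name in runtimes}
    span = slowest - fastest
    return {
        name: max_size - (t - fastest) / span * (max_size - min_size)
        for name, t in runtimes.items()
    }
```

Any monotonically decreasing mapping from runtime to size would match the description; a linear one is used here only because it is easy to read. The y-coordinate (Win Rate) comes directly from the benchmark and needs no computation in this sketch.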