PEFT
Safetensors

INSAIT-Institute/Zephyr-7B-MixAT

INSAIT logo

This is a model adapter for HuggingFaceH4/zephyr-7b-beta, fine-tuned using the MixAT method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT Github repository.

Use in πŸ€— PEFT and Transformers (Quantized)

First, install the required libraries:

pip install transformers peft bitsandbytes

Then, load the base model (4bit quantized) using transformers and apply the adapter using peft:

from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16"
)

base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config
)

model = PeftModel.from_pretrained(base_model, "INSAIT-Institute/Zephyr-7B-MixAT")

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 20%) compared to prior defenses (ALO-ASR > 50%), while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

MixAT results

Model Sources

Summary

Citation

@article{dekany2025mixat,
  title={MixAT: Combining Continuous and Discrete Adversarial Training for LLMs},
  author={D{\'e}k{\'a}ny, Csaba and Balauca, Stefan and Staab, Robin and Dimitrov, Dimitar I and Vechev, Martin},
  journal={arXiv preprint arXiv:2505.16947},
  year={2025}
}
Downloads last month
2
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for INSAIT-Institute/Zephyr-7B-MixAT

Adapter
(420)
this model

Collection including INSAIT-Institute/Zephyr-7B-MixAT