Fine-tuned model collections built with the "Representation Bending" (REPBEND) approach described in "Representation Bending for Large Language Model Safety"
AI & ML interests
AI Safety & AI Security
Papers
X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates
ObjexMT: Objective Extraction and Metacognitive Calibration for LLM-as-a-Judge under Multi-Turn Jailbreaks