Jing
hij
AI & ML interests
None yet
Recent Activity
authored a paper 3 days ago
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
authored a paper 3 days ago
LLMs Encode Harmfulness and Refusal Separately
authored a paper 3 days ago
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors