Jing

hij

AI & ML interests

None yet

Recent Activity

authored a paper 3 days ago

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

authored a paper 3 days ago

LLMs Encode Harmfulness and Refusal Separately

authored a paper 3 days ago

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

Organizations

hij's models

None public yet