Principled evaluation of mechanistic interpretability methods.
Leaderboard for the Mechanistic Interpretability Benchmark