SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs Paper • 2504.08192 • Published Apr 11 • 4
Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs Paper • 2505.20254 • Published May 26 • 5
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
CoRAG: Collaborative Retrieval-Augmented Generation Paper • 2504.01883 • Published Apr 2 • 10 • 2