Kyle O'Brien PRO
Kyle1668
AI & ML interests
Interpretability, model editing, alignment
Recent Activity
authored
a paper
1 day ago
Composable Interventions for Language Models
authored
a paper
1 day ago
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
authored
a paper
1 day ago
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs