From Data to Behavior: Predicting Unintended Model Behaviors Before Training Paper • 2602.04735 • Published about 16 hours ago • 10
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics Paper • 2602.02343 • Published 3 days ago • 12
Aligning Agentic World Models via Knowledgeable Experience Learning Paper • 2601.13247 • Published 17 days ago • 15
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency Paper • 2601.05905 • Published 27 days ago • 18
How Do Large Language Models Learn Concepts During Continual Pre-Training? Paper • 2601.03570 • Published 29 days ago • 4
Can We Predict Before Executing Machine Learning Agents? Paper • 2601.05930 • Published 27 days ago • 26