Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning Paper • 2304.03916 • Published Apr 8, 2023
Diversity of Thought Improves Reasoning Abilities of Large Language Models Paper • 2310.07088 • Published Oct 11, 2023 • 5
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models Paper • 2404.06209 • Published Apr 9, 2024 • 5
Eureka: Evaluating and Understanding Large Foundation Models Paper • 2409.10566 • Published Sep 13, 2024
BENCHAGENTS: Automated Benchmark Creation with Agent Interaction Paper • 2410.22584 • Published Oct 29, 2024 • 1
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead Paper • 2504.00294 • Published Mar 31, 2025 • 11
Introducing v0.5 of the AI Safety Benchmark from MLCommons Paper • 2404.12241 • Published Apr 18, 2024 • 12
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval Paper • 2310.15511 • Published Oct 24, 2023 • 5
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models Paper • 2309.15098 • Published Sep 26, 2023 • 7