Generalist Foundation Models Are Not Clinical Enough for Hospital Operations Paper • 2511.13703 • Published Nov 17, 2025 • 21
SQuALITY: Building a Long-Document Summarization Dataset the Hard Way Paper • 2205.11465 • Published May 23, 2022 • 1
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Paper • 2206.04615 • Published Jun 9, 2022 • 5
Training Language Models with Language Feedback at Scale Paper • 2303.16755 • Published Mar 28, 2023 • 1
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs Paper • 2309.07311 • Published Sep 13, 2023 • 4
BBQ: A Hand-Built Bias Benchmark for Question Answering Paper • 2110.08193 • Published Oct 15, 2021 • 1
QuALITY: Question Answering with Long Input Texts, Yes! Paper • 2112.08608 • Published Dec 16, 2021 • 3
Improving Code Generation by Training with Natural Language Feedback Paper • 2303.16749 • Published Mar 28, 2023 • 1
EvoPrompting: Language Models for Code-Level Neural Architecture Search Paper • 2302.14838 • Published Feb 28, 2023 • 1
Preference Learning Algorithms Do Not Learn Preference Rankings Paper • 2405.19534 • Published May 29, 2024 • 1