OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 46
FinGPT: Large Generative Models for a Small Language Paper • 2311.05640 • Published Nov 3, 2023 • 32
What Language Model to Train if You Have One Million GPU Hours? Paper • 2210.15424 • Published Oct 27, 2022 • 2
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Paper • 2303.03915 • Published Mar 7, 2023 • 7
Crosslingual Generalization through Multitask Finetuning Paper • 2211.01786 • Published Nov 3, 2022 • 2
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 32
Multitask Prompted Training Enables Zero-Shot Task Generalization Paper • 2110.08207 • Published Oct 15, 2021 • 2