Article LeRobot Community Datasets: The “ImageNet” of Robotics — When and How? 3 days ago • 44
Hugging Face community’s Wikimedia datasets Collection Wikimedia datasets created by the Hugging Face community, not Wikimedia. Sorted by Wikimedia project. • 17 items • Updated Jun 7, 2024 • 11
SwallowMath Collection Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 11 items • Updated 7 days ago • 2
SwallowCode Collection Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 66 items • Updated 7 days ago • 2
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published 26 days ago • 89
EMOVA-Datasets Collection A collection of EMOVA datasets (https://emova-ollm.github.io/) • 6 items • Updated Mar 14 • 2
Article LeRobot goes to driving school: World’s largest open-source self-driving dataset Mar 11 • 79
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Feb 20 • 53