FineWeb: decanting the web for the finest text data at scale 🍷. Generate high-quality web text data for LLM training.
jed351/Traditional-Chinese-Common-Crawl-Filtered • Updated about 3 hours ago • 92.1M • 381 • 22