codelion/gpt-2-70m

@@ -60,9 +60,9 @@ This model demonstrates the effectiveness of careful dataset composition for eff
 The model was trained on **1 billion tokens** with the following composition:
-- **40%** - FinePDFs (400M tokens): High-quality PDF content
 - **30%** - DCLM Baseline (300M tokens): Filtered web content
-- **30%** - FineWeb-Edu (300M tokens): Educational web content
 This 50-30-20 mixing ratio was identified through systematic experimentation as optimal for balanced performance across multiple domains.

 The model was trained on **1 billion tokens** with the following composition:
+- **50%** - FinePDFs (500M tokens): High-quality PDF content
 - **30%** - DCLM Baseline (300M tokens): Filtered web content
+- **20%** - FineWeb-Edu (200M tokens): Educational web content
 This 50-30-20 mixing ratio was identified through systematic experimentation as optimal for balanced performance across multiple domains.
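
The corrected bullets can be checked against the stated 1 billion token total with a few lines of arithmetic. The sketch below is illustrative only: the names `TOTAL_TOKENS`, `MIX_PERCENT`, and `token_budget` are hypothetical and do not come from the model card or its training code. It uses integer percentages so the per-source budgets are exact.

```python
# Hypothetical check of the 50-30-20 mix; names are illustrative only.
TOTAL_TOKENS = 1_000_000_000  # "1 billion tokens" from the model card

# Percent share per source, as listed in the updated bullets.
MIX_PERCENT = {
    "FinePDFs": 50,
    "DCLM Baseline": 30,
    "FineWeb-Edu": 20,
}


def token_budget(total_tokens, mix_percent):
    """Return the token count per source for an integer-percent mix."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mixing percentages must sum to 100")
    return {
        name: total_tokens * percent // 100
        for name, percent in mix_percent.items()
    }


budget = token_budget(TOTAL_TOKENS, MIX_PERCENT)
for name, tokens in budget.items():
    print(f"{name}: {tokens // 1_000_000}M tokens")
```

Running the same function on the removed values (40, 30, 30) also sums to 100, so the old bullets were internally consistent; they only disagreed with the "50-30-20" sentence that follows them, which is what this change resolves.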