dreamerdeo committed on
Commit 5bc9326 · verified · 1 Parent(s): cc8a018

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -36,8 +36,8 @@ Read more details about Sailor2 at https://sea-sailor.github.io/blog/sailor2/.
  <b><font size="+1">📚 Sailor2 Pre-training Dataset </font></b>
  </summary>
 
- - [Sailor2-pretrain-data-stage1](https://huggingface.co/datasets/sailor2/sailor2-pretrain-data-stage1): 500B high quality data for model training
- - [Sailor2-pretrain-data-stage2](https://huggingface.co/datasets/sailor2/sailor2-pretrain-data-stage2): 50B extra high quality data for model annealing
+ - [Sailor2-pretrain-data-stage1](https://huggingface.co/datasets/sailor2/sailor2-pretrain-data-stage1): 450B high quality data for model training
+ - [Sailor2-pretrain-data-stage2](https://huggingface.co/datasets/sailor2/sailor2-pretrain-data-stage2): 60B extra high quality data for model annealing
  - [sea-commoncrawl](https://huggingface.co/datasets/sailor2/sea-commoncrawl): Cleaned and deduplicated commoncrawl
  - [sea-internet](https://huggingface.co/datasets/sailor2/sea-internet): Cleaned multilingual data from Internet Archive
  - [sea-pdf-text](https://huggingface.co/datasets/sailor2/sea-pdf-text): Cleaned pdf data