BigScience Data
non-profit
https://bigscience.huggingface.co
Recent Activity
thomwolf, lvwerra, and craffel each authored a paper about 22 hours ago:
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Team members: 72
bigscience-data's models (8)
bigscience-data/sgpt-bloom-1b7-nli • Sentence Similarity • 2B • Updated Jan 27 • 26 • 11
bigscience-data/tokenizer_alpha_NFKC_250k • Updated Feb 17, 2022
bigscience-data/tokenizer_equal_NFKC_250k • Updated Feb 16, 2022
bigscience-data/tokenizer_alpha_nfkc_24M • Updated Feb 16, 2022
bigscience-data/tokenizer_equal_nfkc_24M • Updated Feb 15, 2022
bigscience-data/tokenizer_equal_weight_NFKC_v1 • Updated Feb 14, 2022
bigscience-data/tokenizer_alpha_weight_NFKC • Updated Feb 14, 2022
bigscience-data/tokenizer_v0 • Updated Feb 8, 2022