Hugging Face
Greg Lindahl
greglindahl
5 followers · 3 following
wumpus
AI & ML interests
None yet
Recent Activity
Published a dataset 4 months ago: commoncrawl/host-index-testing-v2
Authored a paper 7 months ago: Towards Best Practices for Open Datasets for LLM Training
Updated a Space 9 months ago: commoncrawl/README
greglindahl's activity
Published a dataset 4 months ago: commoncrawl/host-index-testing-v2 (Updated Apr 23 • 1)
Authored a paper 7 months ago: Towards Best Practices for Open Datasets for LLM Training (Paper • 2501.08365 • Published Jan 14 • 64)
Updated a Space 9 months ago: README 🌍 (Running): Explore Common Crawl's metadata and experimental datasets
Updated a dataset 10 months ago: commoncrawl/eot2024_hostlevel_logs (Viewer • Updated Oct 9, 2024 • 271k • 2 • 1)
Updated a dataset 11 months ago: commoncrawl/citations (Viewer • Updated Jul 9 • 9.18k • 142)
New activity in commoncrawl/citations 11 months ago: "Upload 2024.jsonl.gz" (#2 opened 11 months ago by greglindahl)
Updated a dataset about 1 year ago: commoncrawl/citations-annotated (Viewer • Updated Jul 8 • 424 • 75)
New activity in commoncrawl/README about 1 year ago: "start a README" (#1 opened about 1 year ago by greglindahl)