
Daniel van Strien PRO

davanstrien

AI & ML interests

Machine Learning Librarian

Recent Activity

updated a dataset about 1 hour ago
librarian-bots/dataset_cards_with_metadata
updated a dataset about 3 hours ago
data-is-better-together/fineweb-c-progress
updated a dataset about 3 hours ago
librarian-bots/model_cards_with_metadata

Organizations

Hugging Face
Notebooks-explorers
Nasjonalbiblioteket AI Lab
Living with Machines
BigScience Workshop
Spaces-explorers
BigScience Catalogue Data
Hacks/Hackers
BigScience: LMs for Historical Texts
flyswot
Webhooks Explorers (BETA)
HuggingFaceM4
Open Access AI Collective
HF Canonical Model Maintainers
BigLAM: BigScience Libraries, Archives and Museums
Hugging Face OSS Metrics
ImageIN
Stable Diffusion Bias Eval
Librarian Bots
Blog-explorers
Hacktoberfest 2023
Hugging Face Smol Models Research
geospatial
HPLT
HF-IA-archiving
2A2I Legacy Models & Datasets
testy
DIBT-for-Klingon
Wikimedia Movement
DIBT-for-Esperanto
Journalists on Hugging Face
PleIAs
Persian AI Community
🤗 FineData
Data Is Better Together
Social Post Explorers
OMOTO AI
academic-datasets
HuggingFaceFW-Dev
Hugging Face Discord Community
UCSF-JHU Opioid Industry Documents Archive
Dataset Tools
PDFPages
dibt-private
Data Is Better Together Contributor
Bluesky Community
Open R1
Reasoning datasets competition

davanstrien's activity

published an article 4 months ago

Explore, Curate and Vector Search Any Hugging Face Dataset with Nomic Atlas

By MaxNomic and 4 others • 30
published an article 5 months ago

FineWeb2-C: Help Build Better Language Models in Your Language

By davanstrien and 5 others • 19
published an article 5 months ago

Open Preference Dataset for Text-to-Image Generation by the 🤗 Community

By davidberenstein1957 and 6 others • 60
published an article 6 months ago

Let's make a generation of amazing image generation models

By burtenshaw and 4 others • 33
published an article 6 months ago

Share your open ML datasets on Hugging Face Hub!

By davanstrien and 3 others • 28
published an article 7 months ago

Scaling AI-based Data Processing with Hugging Face + Dask

By scj13 and 3 others • 30
published an article 11 months ago

Introducing Synthetic Data Workshop: Your Gateway to Easy Synthetic Dataset Creation

By davanstrien • 12
published an article 11 months ago

Data Is Better Together: A Look Back and Forward

By sdiazlor and 2 others • 20
published an article 12 months ago

Synthetic dataset generation techniques: generating custom sentence similarity data

By davanstrien • 16
published an article 12 months ago

Synthetic dataset generation techniques: Self-Instruct

By davanstrien • 16
published an article about 1 year ago

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

By davanstrien • 8
published an article about 1 year ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

By loubnabnl and 2 others • 86
published an article about 1 year ago
published an article over 1 year ago

Extracting Insights from Model Cards Using Open Large Language Models

By davanstrien
published an article over 1 year ago

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

By VictorSanh and 10 others • 31
published an article almost 2 years ago

Huggy Lingo: Using Machine Learning to Improve Language Metadata on the Hugging Face Hub

By davanstrien • 1
published an article almost 2 years ago

The Hugging Face Hub for Galleries, Libraries, Archives and Museums

By davanstrien • 2
published an article almost 2 years ago

Introducing BERTopic Integration with Hugging Face Hub

By davanstrien and 1 other • 9
published an article about 2 years ago