BigLAM: BigScience Libraries, Archives and Museums

non-profit

AI & ML interests

🤗 Hugging Face x 🌸 BigScience initiative to create open-source community resources for LAMs.

📚 BigLAM: Machine Learning for Libraries, Archives, and Museums

BigLAM is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for Libraries, Archives, and Museums (LAMs).

We aim to:

  • 🗃️ Share machine-learning-ready datasets from LAMs via the Hugging Face Hub
  • 🤖 Train and release open-source models for LAM-relevant tasks
  • 🛠️ Develop tools and approaches tailored to LAM use cases

✨ Background

BigLAM began as a datasets hackathon within the BigScience 🌸 project, a large-scale, open NLP collaboration.

Our goal is to make LAM datasets easier to discover and use, in support of researchers, institutions, and ML practitioners who work with cultural heritage data.

📂 What You'll Find

The BigLAM organization hosts:

  • Datasets: image, text, and tabular data from and about libraries, archives, and museums
  • Models: fine-tuned for tasks like:
    • Art/historical image classification
    • Document layout analysis and OCR
    • Metadata quality assessment
    • Named entity recognition in heritage texts
  • Spaces: tools for interactive exploration and demonstration

🧩 Get Involved

We welcome contributions! You can:

  • Use our datasets and models
  • Join the discussion on GitHub
  • Contribute your own tools or data
  • Share your work using BigLAM resources
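
As a starting point for using the datasets, the sketch below lists a few dataset repositories published by an organization on the Hugging Face Hub and streams the first records of one of them. It is a minimal illustration, not an official BigLAM workflow: the organization handle `biglam`, the helper names `org_dataset_ids` and `peek`, and the default `train` split are assumptions, and individual datasets may use different split names or require extra configuration. It relies on `list_datasets` from `huggingface_hub` and `load_dataset` from `datasets`.

```python
from typing import Callable, Iterable, List, Optional


def org_dataset_ids(
    org: str = "biglam",  # assumed Hub handle for the organization
    limit: int = 5,
    lister: Optional[Callable[..., Iterable]] = None,
) -> List[str]:
    """Return up to `limit` dataset repo ids published under `org` on the Hub."""
    if lister is None:
        # Imported lazily so the helper can be exercised offline with a stub.
        from huggingface_hub import list_datasets

        lister = list_datasets
    return [info.id for info in lister(author=org, limit=limit)]


def peek(repo_id: str, split: str = "train", n: int = 3) -> list:
    """Stream the first `n` records of a dataset without a full download.

    The `train` split is an assumption; check the dataset card for the
    splits a given dataset actually provides.
    """
    from datasets import load_dataset

    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(stream.take(n))
```

With network access and both libraries installed, `ids = org_dataset_ids()` followed by `peek(ids[0])` would return a short list of records to inspect. Streaming keeps the first look cheap, which helps with large image or text collections.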

🌍 Why It Matters

Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:

  • Supporting inclusive and responsible AI
  • Helping institutions experiment with ML for access, discovery, and preservation
  • Ensuring that ML systems reflect diverse human knowledge and expression
  • Developing tools and methods that work well with the unique formats, values, and needs of LAMs