This collection contains models, datasets and spaces related to historic language models

BigLAM: BigScience Libraries, Archives and Museums
non-profit
AI & ML interests
🤗 Hugging Face x 🌸 BigScience initiative to create open-source community resources for LAMs.
📚 BigLAM: Machine Learning for Libraries, Archives, and Museums
BigLAM is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for Libraries, Archives, and Museums (LAMs).
We aim to:
- 🗃️ Share machine-learning-ready datasets from LAMs via the Hugging Face Hub
- 🤖 Train and release open-source models for LAM-relevant tasks
- 🛠️ Develop tools and approaches tailored to LAM use cases
✨ Background
BigLAM began as a datasets hackathon within the BigScience 🌸 project, a large-scale, open NLP collaboration.
Our goal: make LAM datasets more discoverable and usable to support researchers, institutions, and ML practitioners working with cultural heritage data.
📂 What You'll Find
The BigLAM organization hosts:
- Datasets: image, text, and tabular data from and about libraries, archives, and museums
- Models: fine-tuned for tasks like:
  - Art/historical image classification
  - Document layout analysis and OCR
  - Metadata quality assessment
  - Named entity recognition in heritage texts
- Spaces: tools for interactive exploration and demonstration
🧩 Get Involved
We welcome contributions! You can:
- Use our datasets and models
- Join the discussion on GitHub
- Contribute your own tools or data
- Share your work using BigLAM resources
🌍 Why It Matters
Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:
- Supporting inclusive and responsible AI
- Helping institutions experiment with ML for access, discovery, and preservation
- Ensuring that ML systems reflect diverse human knowledge and expression
- Developing tools and methods that work well with the unique formats, values, and needs of LAMs
Collections (2)
Historic Newspaper Datasets on the Hub
- The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America (Paper • 2005.01583 • 2)
- bigscience-historical-texts/hipe2020 • 30 • 3
- bigscience-historical-texts/HIPE2020_sent-split • 26
- biglam/bnl_newspapers1841-1879 • 631k • 71 • 2
Models (6)
- biglam/historic-newspaper-illustrations-yolov11 • Object Detection • 10
- biglam/medieval-manuscript-yolov11 • Object Detection • 3
- biglam/detr-resnet-50_fine_tuned_loc-2023 • Object Detection • 11 • 2
- biglam/detr-resnet-50_fine_tuned_nls_chapbooks • Object Detection • 53 • 6
- biglam/cultural_heritage_metadata_accuracy • Text Classification • 5 • 3
- biglam/autotrain-beyond-the-books • Text Classification • 5
Datasets (32)
- biglam/loc_beyond_words • 3.56k • 336 • 9
- biglam/europeana_newspapers • 11.9M • 1.01k • 49
- biglam/european_art • 15.2k • 884 • 16
- biglam/bnl_newspapers1841-1879 • 631k • 71 • 2
- biglam/hmd_newspapers • 3.07M • 2.1k • 9
- biglam/blbooks-parquet • 14M • 13.2k • 7
- biglam/on_the_books • 1.79k • 99 • 1
- biglam/cultural_heritage_metadata_accuracy • 101k • 49 • 4
- biglam/old_bailey_proceedings • 2.64k • 149 • 4
- biglam/atypical_animacy • 594 • 86 • 3