Spaces: impresso-project/README


maudehrmann committed on Jul 12

Commit 4983147 (verified), 1 parent: fa40ddd

Update README

Files changed (1)

README.md +13 -8

@@ -6,13 +6,18 @@ colorTo: indigo
 sdk: static
 pinned: false
 ---
-**Interdisciplinary ML‑powered platform for exploring historical periodical media.**
-- **📚 Corpus**: Aggregates an unprecedented multilingual archive of newspapers and radio across time and borders.
-- **🎯 Vision**: Enables a semantic-enriched workflow for representation, exploration, and historical research across modalities like print and audio.
-- **💡 Outputs**:
-  - Web App & Datalab platforms for exploratory analysis, search and programmatic access
-  - NLP resources: Language identification, OCR quality assessment, Named Entity Recognition, Named Entity Linking, topic models
-  - Historical insights under the theme of media influences.
-- **🧑‍🤝‍🧑 Hugging Face Organization** hosts multilingual NER, NEL, OCR‑quality assessment models, and Spaces for named entity processing
+Hi there 👋 !
+**Impresso - Media Monitoring of the Past** is an interdisciplinary research project that uses machine learning to pursue a paradigm shift in the processing, semantic enrichment, representation, exploration, and study of historical media across modalities and across temporal, linguistic, and national borders.
+We design and develop the 🚀 [Impresso Web App](https://impresso-project.ch/app/) and the 🔬 [Impresso Datalab](https://impresso-project.ch/datalab/) (coming soon), which provide search, exploratory analysis, and programmatic access to an unprecedented corpus of multilingual historical newspaper and radio broadcast collections. Our work sits at the intersection of Natural Language Processing, Design, and History. Learn more on the 🌐 [project website](https://impresso-project.ch).
+This Hugging Face organization hosts models and datasets developed by the project.
+- 🤖 **Impresso models** are specifically tailored for historical, multilingual documents and include language identification, OCR quality assessment, topic inference, NER and NEL.
+- 📚 **Impresso datasets** are curated collections derived from digitized historical media sources, designed to support ML development and evaluation. Datasets are currently in preparation and will soon be released, including an NER and NEL benchmark developed as part of the [HIPE evaluation campaign](https://hipe-eval.github.io/HIPE-2022/), an image type classification dataset (e.g., article vs. advertisement vs. illustration), and more.
+Our contributions aim to foster reuse and reproducibility in historical text analysis by providing documented and diverse assets with clear provenance and, whenever possible, open licenses. Whether you are a researcher, developer, or cultural heritage professional, we hope these resources support your work.
+#### Associated Partners and Funding
+- Impresso is supported by cultural heritage 🏛️ [partners](https://impresso-project.ch/consortium/associated-partners/) who contribute not only their media collections but also their expertise in data curation, management, and research. We are grateful for their collaboration and continued support.
+- The project has received two rounds of funding: first, from 2017 to 2020, by the Swiss National Science Foundation (Grant No. [CRSII5_173719](https://data.snf.ch/grants/grant/173719)); and second, from 2023 to 2027, jointly by the Swiss National Science Foundation (Grant No. [CRSII5_213585](https://data.snf.ch/grants/grant/213585)) and the Luxembourg National Research Fund (Grant No. 17498891).