maudehrmann committed on
Commit 4983147 · verified · 1 Parent(s): fa40ddd

Update README

Files changed (1)
  1. README.md +13 -8
README.md CHANGED
@@ -6,13 +6,18 @@ colorTo: indigo
  sdk: static
  pinned: false
  ---
+ Hi there 👋 !
 
- **Interdisciplinary ML-powered platform for exploring historical periodical media.**
+ **Impresso - Media Monitoring of the Past** is an interdisciplinary research project that uses machine learning to pursue a paradigm shift in the processing, semantic enrichment, representation, exploration and study of historical media across modalities and across temporal, linguistic, and national borders.
 
- - **📚 Corpus**: Aggregates an unprecedented multilingual archive of newspapers and radio across time and borders.
- - **🎯 Vision**: Enables a semantically enriched workflow for representation, exploration, and historical research across modalities like print and audio.
- - **💡 Outputs**:
- - Web App & Datalab platforms for exploratory analysis, search and programmatic access
- - NLP resources: Language identification, OCR quality assessment, Named Entity Recognition, Named Entity Linking, topic models
- - Historical insights under the theme of media influences.
- - **🧑‍🤝‍🧑 Hugging Face Organization** hosts multilingual NER, NEL, OCR-quality assessment models, and Spaces for named entity processing
+ We design and develop the 🚀 [Impresso Web App](https://impresso-project.ch/app/) and the upcoming 🔬 [Impresso Datalab](https://impresso-project.ch/datalab/), providing search, exploratory analysis, and programmatic access to an unprecedented corpus of multilingual historical newspaper and radio broadcast collections. Our work sits at the intersection of Natural Language Processing, Design, and History. Learn more on the 🌐 [project website](https://impresso-project.ch).
+
+ This Hugging Face organization hosts models and datasets developed by the project.
+ - 🤖 **Impresso models** are specifically tailored for historical, multilingual documents and include language identification, OCR quality assessment, topic inference, NER and NEL.
+ - 📚 **Impresso datasets** are curated collections derived from digitized historical media sources, designed to support ML development and evaluation. Datasets are currently in preparation and will soon be released, including an NER and NEL benchmark developed as part of the [HIPE evaluation campaign](https://hipe-eval.github.io/HIPE-2022/), an image type classification dataset (e.g., article vs. advertisement vs. illustration) and more.
+
+ Our contributions aim to foster reuse and reproducibility in historical text analysis by providing documented and diverse assets, with clear provenance and, whenever possible, open licenses. Whether you are a researcher, developer, or cultural heritage professional, we hope these resources support your work.
+
+ #### Associated Partners and Funding
+ - Impresso is supported by cultural heritage 🏛️ [partners](https://impresso-project.ch/consortium/associated-partners/) who contribute not only their media collections but also their expertise in data curation, management, and research. We are grateful for their collaboration and continued support.
+ - The project has received two rounds of funding: first, from 2017 to 2020, by the Swiss National Science Foundation (Grant No. [CRSII5_173719](https://data.snf.ch/grants/grant/173719)); and second, from 2023 to 2027, jointly by the Swiss National Science Foundation (Grant No. [CRSII5_213585](https://data.snf.ch/grants/grant/213585)) and the Luxembourg National Research Fund (Grant No. 17498891).