## Small print
<p style="background-color: #fff9f9; border: 1px solid #ff0000; padding: 10px;">
Warning: This demo is highly experimental and not ready for production use.
</p>
This demo is a proof of concept for visualizing the semantic differences between two text documents.
The two input documents do not need to be written in the same language.
In our paper, we evaluate three simple, unsupervised approaches based on BERT-like encoder models.
This demo implements two of them, `DiffAlign` and `DiffDel`, using the model [ZurichNLP/unsup-simcse-xlm-roberta-base](https://huggingface.co/ZurichNLP/unsup-simcse-xlm-roberta-base). See the model tags for a list of the ~100 supported languages.
- `DiffAlign` aligns the words of the two documents using the cosine similarity between their word embeddings (cf. [SimAlign](http://dx.doi.org/10.18653/v1/2020.findings-emnlp.147), [BERTScore](https://openreview.net/forum?id=SkeHuCVFDr)). Words with low similarity are highlighted.
- `DiffDel` calculates the sentence similarity between the two input documents (cf. [SimCSE](http://dx.doi.org/10.18653/v1/2021.emnlp-main.552)). The algorithm highlights words whose deletion increases the similarity score.
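
The two scoring ideas above can be illustrated with a minimal sketch on toy vectors. This is not the code behind the demo: the function names (`cosine`, `mean_pool`, `diff_align`, `diff_del`) are hypothetical, the hand-written 3-dimensional vectors stand in for the contextual embeddings the encoder would produce, mean pooling is assumed as the sentence representation, and the exact scoring (one minus the best match, and the similarity gain clipped at zero) is an assumption. The linked paper and repository remain the authoritative description.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(vectors):
    """Average a non-empty list of word vectors into one sentence vector (assumed pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def diff_align(emb_a, emb_b):
    """Alignment-style score: 1 - similarity of each word in A to its best match in B."""
    return [1.0 - max(cosine(a, b) for b in emb_b) for a in emb_a]


def diff_del(emb_a, emb_b):
    """Deletion-style score: how much the A-B similarity rises when a word of A is removed."""
    target = mean_pool(emb_b)
    base = cosine(mean_pool(emb_a), target)
    scores = []
    for i in range(len(emb_a)):
        rest = emb_a[:i] + emb_a[i + 1:]
        # Simplification: the remaining vectors are re-pooled, not re-encoded.
        gain = cosine(mean_pool(rest), target) - base if rest else 0.0
        scores.append(max(gain, 0.0))  # only positive effects count
    return scores


# Toy data: document A has one extra word with no counterpart in document B.
emb_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
emb_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

align_scores = diff_align(emb_a, emb_b)
del_scores = diff_del(emb_a, emb_b)
print(align_scores)
print(del_scores)
```

In this toy input, only the third word of document A receives a nonzero score under both functions, because it has no close match in document B and removing it brings the pooled vectors closer together. With a real encoder, deleting a word would also change the contextual embeddings of the remaining words, which this sketch does not model.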
More resources:
- Paper: https://arxiv.org/abs/2305.13303
- Code: https://github.com/ZurichNLP/recognizing-semantic-differences
## Citation
```bibtex
@inproceedings{vamvas-sennrich-2023-rsd,
    title = "Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents",
    author = "Jannis Vamvas and Rico Sennrich",
    month = dec,
    year = "2023",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
}
```