## Small print
<p style="background-color: #fff9f9; border: 1px solid #ff0000; padding: 10px;">
Warning: This demo is highly experimental and not ready for production use.
</p>
This demo is a proof of concept for visualizing the semantic differences between two text documents.
The input documents may or may not be written in the same language.
In our paper, we evaluate three simple, unsupervised approaches based on BERT-like encoder models.
This demo implements the approaches `DiffAlign` and `DiffDel` using the model [ZurichNLP/unsup-simcse-xlm-roberta-base](https://huggingface.co/ZurichNLP/unsup-simcse-xlm-roberta-base). See the model tags for a list of the ~100 supported languages.
- `DiffAlign` aligns the words of the two documents using cosine similarity between the word embeddings (cf. [SimAlign](http://dx.doi.org/10.18653/v1/2020.findings-emnlp.147), [BERTScore](https://openreview.net/forum?id=SkeHuCVFDr)). Words with low similarity are highlighted.
- `DiffDel` calculates sentence similarity between the two input documents (cf. [SimCSE](http://dx.doi.org/10.18653/v1/2021.emnlp-main.552)). The algorithm highlights words whose deletion increases the similarity score.
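
The two scoring ideas can be illustrated with a minimal NumPy sketch. It is not the code used in this demo or in the paper: the function names `diff_align` and `diff_del`, the hand-made 4-dimensional word vectors, and the use of mean pooling as a stand-in for a sentence encoder are all assumptions made for illustration. In practice the vectors would come from an encoder such as the model named above.

```python
import numpy as np


def _unit(x: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def diff_align(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Score each word of A as 1 minus its best cosine match among the words of B."""
    sim = _unit(emb_a) @ _unit(emb_b).T  # (len_a, len_b) cosine similarities
    return 1.0 - sim.max(axis=1)


def diff_del(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Score each word of A by the change in document similarity when it is deleted.

    Assumes A has at least two words, so that a deletion never empties it.
    """
    def doc_sim(a: np.ndarray, b: np.ndarray) -> float:
        # Assumption: mean pooling stands in for a real sentence encoder.
        return float(_unit(a.mean(axis=0)) @ _unit(b.mean(axis=0)))

    base = doc_sim(emb_a, emb_b)
    gains = [
        doc_sim(np.delete(emb_a, i, axis=0), emb_b) - base
        for i in range(len(emb_a))
    ]
    return np.array(gains)


# Hypothetical toy embeddings; a real setup would take them from the encoder.
toy = {
    "the": [1.0, 0.0, 0.0, 0.0],
    "sleeps": [0.0, 1.0, 0.0, 0.0],
    "cat": [0.0, 0.0, 1.0, 0.0],
    "dog": [0.0, 0.0, 0.3, 1.0],
}
words_a = ["the", "cat", "sleeps"]
words_b = ["the", "dog", "sleeps"]
emb_a = np.array([toy[w] for w in words_a])
emb_b = np.array([toy[w] for w in words_b])

align_scores = diff_align(emb_a, emb_b)  # higher means more different
del_gains = diff_del(emb_a, emb_b)       # positive means deletion helps

for word, a_score, d_gain in zip(words_a, align_scores, del_gains):
    print(f"{word:>7}  DiffAlign={a_score:.3f}  DiffDel gain={d_gain:+.3f}")
```

In this toy setup only `cat` receives a non-zero `DiffAlign` score and a positive `DiffDel` gain, so it is the word both sketches would highlight in the first document. Swapping the arguments scores the words of the second document instead.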
More resources:
- Paper: https://arxiv.org/abs/2305.13303
- Code: https://github.com/ZurichNLP/recognizing-semantic-differences
## Citation
```bibtex
@article{vamvas-sennrich-2023-rsd,
title={Towards Unsupervised Recognition of Semantic Differences in Related Documents},
author={Jannis Vamvas and Rico Sennrich},
year={2023},
eprint={2305.13303},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```