---
language:
- ko
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- transformers
---
## PwC-Embedding-expr
We trained **PwC-Embedding-expr** on top of the [multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) embedding model.
To enhance performance in Korean, we applied curated data augmentation to STS datasets and fine-tuned the E5 model with a carefully balanced sampling ratio across datasets.
> ⚠️ This is an experimental model and is under continuous development.
### To-do
- [x] MTEB Leaderboard
- [ ] Technical Report
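### Usage
A minimal usage sketch with sentence-transformers is shown below. The repo id is a placeholder, and we assume the model keeps the base E5-instruct prompt convention (instruction-prefixed queries, plain passages); check the files on this page for the exact convention.

```python
from sentence_transformers import SentenceTransformer

# Placeholder repo id -- replace with the actual model id on the Hub.
model = SentenceTransformer("pwc/PwC-Embedding-expr")

# multilingual-e5-large-instruct expects instruction-prefixed queries;
# we assume this fine-tune keeps the same convention.
def get_detailed_instruct(task_description: str, query: str) -> str:
    return f"Instruct: {task_description}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [get_detailed_instruct(task, "한국의 수도는 어디인가요?")]
passages = [
    "서울은 대한민국의 수도이다.",
    "부산은 대한민국 남동부에 있는 항구 도시이다.",
]

query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

# With normalized embeddings, cosine similarity is a plain dot product.
scores = query_emb @ passage_emb.T
print(scores)  # higher score = more relevant passage
```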
## MTEB
PwC-Embedding_expr was evaluated on the Korean subset of MTEB.
A leaderboard link will be added once it is published.
| Task | PwC-Embedding_expr |
|------------------|--------------------|
| KLUE-STS | 0.88 |
| KLUE-TC | 0.73 |
| Ko-StrategyQA | 0.80 |
| KorSTS | 0.84 |
| MIRACL-Reranking | 0.72 |
| MIRACL-Retrieval | 0.65 |
| **Average** | **0.77** |
## Model
- Base Model: [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
- Model Size: 0.56B
- Embedding Dimension: 1024
- Max Input Tokens: 514
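The listed specs can be sanity-checked from a loaded model (a sketch follows; the repo id is again a placeholder, and the effective `max_seq_length` may be capped at 512 even though the XLM-R position-embedding table has 514 entries):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("pwc/PwC-Embedding-expr")  # placeholder id
print(model.get_sentence_embedding_dimension())  # expected: 1024
print(model.max_seq_length)                      # expected: ~512-514
```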
## Requirements
The model works with the dependencies included in the latest version of [MTEB](https://github.com/embeddings-benchmark/mteb).
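For example, an evaluation sketch with the `mteb` package (the task identifiers below are assumptions taken from the table above and may differ from the exact names registered in your installed mteb version):

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("pwc/PwC-Embedding-expr")  # placeholder id

# Assumed task names -- verify against mteb's task registry.
tasks = mteb.get_tasks(tasks=["KLUE-STS", "KorSTS", "Ko-StrategyQA"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```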
## Citation
TBD (technical report expected September 2025)