---
language:
- ko
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- transformers
---
# PwC-Embedding-expr
We trained PwC-Embedding-expr on top of the multilingual-e5-large-instruct embedding model.
To improve performance on Korean, we applied curated data augmentation to STS datasets and fine-tuned the E5 model with a carefully balanced mixing ratio across datasets.
## To-do
- MTEB Leaderboard
- Technical Report
## MTEB
PwC-Embedding_expr was evaluated on the Korean subset of MTEB.
A leaderboard link will be added once it is published.
| Task | PwC-Embedding_expr | multilingual-e5-large | Max Result |
|---|---|---|---|
| KLUE-STS | 0.88 | 0.83 | 0.90 |
| KLUE-TC | 0.73 | 0.61 | 0.73 |
| Ko-StrategyQA | 0.80 | 0.80 | 0.83 |
| KorSTS | 0.84 | 0.81 | 0.98 |
| MIRACL-Reranking | 0.72 | 0.65 | 0.72 |
| MIRACL-Retrieval | 0.65 | 0.59 | 0.72 |
| Average | 0.77 | 0.71 | 0.81 |
## Model
- Base Model: intfloat/multilingual-e5-large-instruct
- Model Size: 0.56B
- Embedding Dimension: 1024
- Max Input Tokens: 514
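
Below is a minimal usage sketch with sentence-transformers. The repository id and the instruction-prefix convention (inherited from the multilingual-e5-large-instruct base model) are assumptions; adjust them to the published model card.

```python
from sentence_transformers import SentenceTransformer

# Hypothetical repository id; replace with the actual model path once released.
model = SentenceTransformer("PwC-Embedding-expr")

# The base model (multilingual-e5-large-instruct) expects an instruction-prefixed
# query and plain passages; we assume the same convention applies here.
task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [f"Instruct: {task}\nQuery: 한국의 수도는 어디인가요?"]
passages = ["서울은 대한민국의 수도이다.", "부산은 대한민국에서 두 번째로 큰 도시이다."]

query_embeddings = model.encode(queries, normalize_embeddings=True)
passage_embeddings = model.encode(passages, normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized embeddings.
scores = query_embeddings @ passage_embeddings.T
print(scores)  # shape (1, 2); the first passage should score higher
```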
## Requirements
The model runs with the dependencies included in the latest release of MTEB.
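
As a sketch of how the Korean-subset evaluation could be reproduced with the MTEB package (task identifiers are taken from the table above and may differ across MTEB releases):

```python
import mteb
from sentence_transformers import SentenceTransformer

# Hypothetical repository id; replace with the actual model path.
model = SentenceTransformer("PwC-Embedding-expr")

# Korean tasks from the results table; verify the exact identifiers with
# `mteb.get_tasks(languages=["kor"])` for your installed MTEB version.
tasks = mteb.get_tasks(tasks=["KLUE-STS", "KLUE-TC", "Ko-StrategyQA", "KorSTS"])

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/PwC-Embedding-expr")
```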
## Citation
TBD (technical report expected September 2025)