wangyuxin committed
Commit 4495f9a · 1 Parent(s): 8446117
add text classification evaluation
README.md
CHANGED
@@ -1,5 +1,7 @@
# M3E Models

M3E is short for Moka Massive Mixed Embedding

* Moka: this text embedding model was trained and open-sourced by MokaAI; the training script uses [uniem](https://github.com/wangyuxinwhy/uniem/blob/main/scripts/train_m3e.py)
@@ -56,7 +58,28 @@ M3E uses contrastive learning with in-batch negative sampling on sentence-pair datasets for train...

## Evaluation

-

## M3E Datasets

# M3E Models

+[m3e-small](https://huggingface.co/moka-ai/m3e-small) | [m3e-base](https://huggingface.co/moka-ai/m3e-base)
+
M3E is short for Moka Massive Mixed Embedding

* Moka: this text embedding model was trained and open-sourced by MokaAI; the training script uses [uniem](https://github.com/wangyuxinwhy/uniem/blob/main/scripts/train_m3e.py)

## Evaluation

+### Text classification
+
+- Datasets: six open-source text classification datasets hosted on HuggingFace, covering news, e-commerce reviews, stock comments, long texts, and more.
+- Evaluation method: follows the MTEB approach and reports Accuracy.
+- Models evaluated: [text2vec](https://github.com/shibing624/text2vec), m3e-base, m3e-small, openai-ada-002
+- Evaluation script: see the [evaluation script](https://github.com/wangyuxinwhy/uniem/blob/main/mteb-zh/tasks.py)
+
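The evaluation method listed above can be sketched roughly as follows: embed the training and test texts with a frozen encoder, fit a lightweight classifier on the training embeddings, and report Accuracy on the test set. This is a minimal, dependency-free illustration, not the linked evaluation script. The `toy_encode` function, its keyword vocabulary, and the sample reviews are hypothetical, and a nearest-centroid classifier is swapped in for the classifier that MTEB fits. In a real run, an embedding model such as m3e-base would take the place of `toy_encode`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]

# Hypothetical keyword vocabulary for the toy encoder (not part of M3E or MTEB).
VOCAB = ("good", "great", "bad", "poor")


def toy_encode(texts: Sequence[str]) -> List[Vector]:
    """Stand-in for a real embedding model: one keyword count per VOCAB entry."""
    return [[float(t.lower().split().count(w)) for w in VOCAB] for t in texts]


def fit_centroids(embeddings: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Average the training embeddings of each label into one centroid per label."""
    grouped = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids: Dict[int, Vector], embeddings: Sequence[Vector]) -> List[int]:
    """Assign each embedding the label of its nearest centroid (squared Euclidean)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(centroids, key=lambda label: sq_dist(centroids[label], vec))
        for vec in embeddings
    ]


def evaluate_accuracy(
    encode: Callable[[Sequence[str]], List[Vector]],
    train_texts: Sequence[str],
    train_labels: Sequence[int],
    test_texts: Sequence[str],
    test_labels: Sequence[int],
) -> float:
    """Fit on train embeddings, predict on test embeddings, return Accuracy."""
    centroids = fit_centroids(encode(train_texts), train_labels)
    predictions = predict(centroids, encode(test_texts))
    correct = sum(p == y for p, y in zip(predictions, test_labels))
    return correct / len(test_labels)


# Made-up sentiment data (1 = positive, 0 = negative), for illustration only.
train_texts = ["good phone", "great screen", "bad battery", "poor camera"]
train_labels = [1, 1, 0, 0]
test_texts = ["good camera", "poor screen"]
test_labels = [1, 0]

accuracy = evaluate_accuracy(toy_encode, train_texts, train_labels, test_texts, test_labels)
print(f"Accuracy: {accuracy:.4f}")
```

The encoder is passed in as a callable so that the same evaluation loop can compare several models on identical splits, which is how the per-model columns in a results table are normally produced.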
+| Dataset           | text2vec | m3e-small | m3e-base |
+| ----------------- | -------- | --------- | -------- |
+| TNews             | 0.43     | 0.4443    | 0.4827   |
+| JDIphone          | 0.8214   | 0.8293    | 0.8533   |
+| GubaEastmony      | 0.7472   | 0.712     | 0.7621   |
+| TYQSentiment      | 0.6099   | 0.6596    | 0.7188   |
+| StockComSentiment | 0.4307   | 0.4291    | 0.4363   |
+| IFlyTek           | 0.414    | 0.4263    | 0.4409   |
+| Average           | 0.5755   | 0.5834    | 0.6157   |
+
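The Average row appears to be the unweighted mean of the six per-dataset accuracies, rounded to four decimals. The source does not state how it was computed, so that is an assumption, but recomputing from the reported values is consistent with it:

```python
# Per-dataset Accuracy copied from the table, in row order:
# TNews, JDIphone, GubaEastmony, TYQSentiment, StockComSentiment, IFlyTek
scores = {
    "text2vec": [0.43, 0.8214, 0.7472, 0.6099, 0.4307, 0.414],
    "m3e-small": [0.4443, 0.8293, 0.712, 0.6596, 0.4291, 0.4263],
    "m3e-base": [0.4827, 0.8533, 0.7621, 0.7188, 0.4363, 0.4409],
}

# Assumed aggregation: unweighted mean over datasets, rounded to 4 decimals.
averages = {model: round(sum(vals) / len(vals), 4) for model, vals in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.4f}")
```

Under this assumption the script prints 0.5755, 0.5834, and 0.6157, matching the reported Average row. Because the mean is unweighted, each dataset counts equally regardless of its size.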
+The openai-ada-002 model has not been evaluated yet.
+
+### Retrieval and ranking
+
+More tasks are coming soon.

## M3E Datasets
