Update README.md
README.md CHANGED
@@ -167,6 +167,7 @@ model-index:
     source:
       url: https://huggingface.co/spaces/lmsys/mt-bench
 ---
+
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->

@@ -235,9 +236,12 @@ Here's how you can run the model using the `pipeline()` function from 🤗 Trans
 # Install transformers from source - only needed for versions <= v4.34
 # pip install git+https://github.com/huggingface/transformers.git
 # pip install accelerate
+
 import torch
 from transformers import pipeline
+
 pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")
+
 # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
 messages = [
     {
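The snippet in this hunk is cut off at the opening of the first message dict. For reference, a complete version of the same `pipeline()` pattern is sketched below; the message contents and the sampling parameters (`max_new_tokens`, `temperature`, `top_k`, `top_p`) are illustrative choices, not values taken from this diff.

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example conversation; the contents are illustrative.
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Explain direct preference optimization in one sentence."},
]

# Render the conversation with the tokenizer's chat template, keeping the
# prompt as a string and appending the assistant turn for generation.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

Formatting through `apply_chat_template` matters here: the model was fine-tuned on conversations rendered with dedicated turn markers, so hand-concatenated prompt strings generally produce worse generations than the template output.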
@@ -299,6 +303,8 @@ The following hyperparameters were used during training:
 ### Training results

 The table below shows the full set of DPO training metrics:
+
+
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
 | 0.6284 | 0.05 | 100 | 0.6098 | 0.0425 | -0.1872 | 0.7344 | 0.2297 | -258.8416 | -253.8099 | -2.7976 | -2.8234 |
@@ -360,6 +366,7 @@ The table below shows the full set of DPO training metrics:
 | 0.0094 | 2.94 | 5700 | 0.7527 | -4.5542 | -8.3509 | 0.7812 | 3.7967 | -340.4790 | -299.7773 | -2.3062 | -2.3510 |
 | 0.0054 | 2.99 | 5800 | 0.7520 | -4.5169 | -8.3079 | 0.7812 | 3.7911 | -340.0493 | -299.4038 | -2.3081 | -2.3530 |

+
 ### Framework versions

 - Transformers 4.35.0.dev0
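A note on the reward columns in the metrics table above, assuming the standard DPO formulation (the metric names match TRL's DPO trainer): the logged rewards are the policy's implicit rewards against the reference model.

```latex
% Implicit DPO reward; beta is the DPO temperature (an assumption here,
% since its value is not stated in this diff).
\hat{r}_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% Batch-level bookkeeping for chosen (y_w) and rejected (y_l) completions:
\text{rewards/margins}    = \mathbb{E}\big[\hat{r}_\theta(x, y_w) - \hat{r}_\theta(x, y_l)\big]
\text{rewards/accuracies} = \mathbb{E}\big[\mathbf{1}\{\hat{r}_\theta(x, y_w) > \hat{r}_\theta(x, y_l)\}\big]
```

The rows are consistent with this reading: at step 5700, -4.5542 - (-8.3509) = 3.7967, matching the margins column.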
@@ -370,6 +377,7 @@ The table below shows the full set of DPO training metrics:
 ## Citation

 If you find Zephyr-7B-β is useful in your work, please cite it with:
+
 ```
 @misc{tunstall2023zephyr,
       title={Zephyr: Direct Distillation of LM Alignment},
@@ -382,6 +390,7 @@ If you find Zephyr-7B-β is useful in your work, please cite it with:
 ```
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta)
+
 | Metric | Value |
 |-----------------------|---------------------------|
 | Avg. | 52.15 |
@@ -391,4 +400,4 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 | TruthfulQA (0-shot) | 57.45 |
 | Winogrande (5-shot) | 77.74 |
 | GSM8K (5-shot) | 12.74 |
-| DROP (3-shot) | 9.66 |
+| DROP (3-shot) | 9.66 |
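The details dataset linked above holds the raw per-benchmark results behind these numbers. Below is a minimal loading sketch with 🤗 Datasets; the config name and the `latest` split follow the usual layout of Open LLM Leaderboard details repos and are assumptions, so check the dataset page for the exact names.

```python
from datasets import load_dataset

# Hypothetical config name and split - the details-repo layout is an
# assumption; see the dataset page for the exact values.
details = load_dataset(
    "open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta",
    "harness_gsm8k_5",
    split="latest",
)
print(details[0])
```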