Top-performing model has incorrect benchmark results.
The top-performing model on this leaderboard (Metin/Gemma-2-9b-it-TR-DPO-V1) is listed with incorrect benchmark results, which need to be corrected.
The results reported on the model's own Hugging Face page differ from those shown on this leaderboard.
Reference: https://huggingface.co/Metin/Gemma-2-9b-it-TR-DPO-V1/blob/main/README.md
The benchmark results listed on another Turkish fine-tuned model's page are also consistent with those on the Metin/Gemma-2-9b-it-TR-DPO-V1 model page.
Reference: https://huggingface.co/WiroAI/wiroai-turkish-llm-9b#benchmark-scores
I've also tested it myself on Google Colab with an A100 GPU using the truthful_qa-v0.2 benchmark, and my result is consistent with both sources above.
Therefore, the benchmark results for this model need to be updated as soon as possible.