Commit History

Upload from GitHub Actions: Add auto-translated datasets
68a93b5
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #18 from datenlabor-bmz/pr-17
a0d1624
verified

davidpomerenke committed on

Upload from GitHub Actions: Add auto-translated datasets
c790fdb
verified

davidpomerenke committed on

Upload from GitHub Actions: ran full evaluation locally
088f96f
verified

davidpomerenke committed on

Upload from GitHub Actions: minor caching change
b39df3c
verified

davidpomerenke committed on

Upload from GitHub Actions: updated and cleaned up scripts for new eval runs
963cb78
verified

davidpomerenke committed on

Upload from GitHub Actions: Update models.py, models.json, and results.json with latest evaluation data and model additions
8eebb41
verified

davidpomerenke committed on

Upload from GitHub Actions: Add Todos for using existing machine-translated datasets rather than our own ones
56adaa2
verified

davidpomerenke committed on

Upload from GitHub Actions: updated translation functions
8f5ce26
verified

davidpomerenke committed on

Upload from GitHub Actions: import flexibility on backend
b8cbeff
verified

davidpomerenke committed on

Upload from GitHub Actions: fixed import error
0a30811
verified

davidpomerenke committed on

Upload from GitHub Actions: updated frontend and backend to fix bugs
4e8cb1a
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #13 from datenlabor-bmz/jn-dev
80d21cb
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #10 from datenlabor-bmz/jn-dev
c2eeeac
verified

davidpomerenke committed on

Upload from GitHub Actions: updated batch size and delay
02f927b
verified

davidpomerenke committed on

Upload from GitHub Actions: updated workflow settings
e51c770
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #9 from datenlabor-bmz/jn-dev
7c06aef
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #7 from datenlabor-bmz/jn-dev
6878a71
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #6 from datenlabor-bmz/jn-dev
6234f5c
verified

davidpomerenke committed on

Upload from GitHub Actions: Exclude TruthfulQA from proficiency score
3fbff09
verified

davidpomerenke committed on

Upload from GitHub Actions: TruthfulQA translation WIP
fd102e9
verified

davidpomerenke committed on

Upload from GitHub Actions: Scatterplot
353f761
verified

davidpomerenke committed on

Upload from GitHub Actions: Get more results, compute average based on all tasks
98c6811
verified

davidpomerenke committed on

Upload from GitHub Actions: Translate MMLU and evaluate
4c5c136
verified

davidpomerenke committed on

Upload from GitHub Actions: Correlation plot
b0aa389
verified

davidpomerenke committed on

Upload from GitHub Actions: Evaluate on autotranslated GSM dataset
f3a09a2
verified

davidpomerenke committed on

Upload from GitHub Actions: Evaluate Google Translate
338dc9b
verified

davidpomerenke committed on

Upload from GitHub Actions: More models and languages
a73f888
verified

davidpomerenke committed on

Upload from GitHub Actions: Improve UX and style
53d2039
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge remote changes and apply terminology updates: Commercial->closed-source, Open->open-source
ebaf279
verified

davidpomerenke committed on

Upload from GitHub Actions: Use task subset for average score
b1e5b40
verified

davidpomerenke committed on

Upload from GitHub Actions: Evaluate on 40 languages
941d5c5
verified

davidpomerenke committed on

Upload from GitHub Actions: Add math benchmarks
549360a
verified

davidpomerenke committed on

Upload from GitHub Actions: More results
52abc5b
verified

davidpomerenke committed on

Upload from GitHub Actions: Update model ranking fetching
f840423
verified

davidpomerenke committed on

Upload from GitHub Actions: Use FLORES+ via Huggingface
913253a
verified

davidpomerenke committed on

Upload from GitHub Actions: Quick fixes
9c2c019
verified

davidpomerenke committed on

Upload from GitHub Actions: More models
0bd935e
verified

davidpomerenke committed on

Upload from GitHub Actions: Increase n_models
d09b095
verified

davidpomerenke committed on

Upload from GitHub Actions: New results
b311dd5
verified

davidpomerenke committed on

Upload from GitHub Actions: Merge pull request #4 from datenlabor-bmz/jonas-dev
7c6a118
verified

davidpomerenke committed on

Upload from GitHub Actions: Fix vibecoding
75010c2
verified

davidpomerenke committed on

Upload from GitHub Actions: Ugly fix for CI errors
adc94d7
verified

davidpomerenke committed on

Upload from GitHub Actions: Try moving `cache` calls that cause CI issues
bc4afa0
verified

davidpomerenke committed on

Upload from GitHub Actions: Exclude free models from evals
c9e9db6
verified

davidpomerenke committed on

Upload from GitHub Actions: Display N/A scores as such
1e8952a
verified

davidpomerenke committed on

Block gemini-2.5-pro-exp-03-25
092c06a

David Pomerenke committed on

Pass through kwargs
5fa433f

David Pomerenke committed on

Fix dataset loading
c990cb9

David Pomerenke committed on

Temporarily disable classification task
a48ff53

David Pomerenke committed on