Models in Adaptive Length Penalty Paper
Models (9)
RLAIF/reward-model-grpo • 0.8B • 2
RLAIF/llama-3b-open-r1-50k-sft • 4B • 2
RLAIF/sft-external • Text Generation • 8B
RLAIF/sft-llama-3.1-8b-external • Text Generation • 8B
RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64 • Text Generation • 9B
RLAIF/sft-llama8b-prm-800k-correct-only • Text Generation • 8B
RLAIF/22-sequential-temp-0-verifier-no-best-oracle-in-context-train-8 • 8B
RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking • 8B
RLAIF/15-w-error-masking-temp-0-verifier-in-context-train-in-context-inference-8-model • 8B • 2
Datasets (59)
RLAIF/val-grm • 2k • 6
RLAIF/train-grm • 20k • 6
RLAIF/val-policy-filtered • 3.49k • 5
RLAIF/train-policy-filtered • 20k • 6
RLAIF/multi-model-judge-comparison-question-view • 100 • 64
RLAIF/multi-model-judge-comparison-flat • 200 • 59
RLAIF/multi-model-judge-comparison-20250729-154625 • 200 • 46
RLAIF/multi-model-judge-comparison • 200 • 48
RLAIF/genrm-uf-qwen3-4b-angel-judge-qwen-3-4b-jt07-j200-n200-20250729-192837 • 200 • 46
RLAIF/genrm-uf-qwen3-4b-angel-judge-qwen-3-14b-jt07-j200-n200-20250729-192716 • 200 • 43