Active filters: trl
mradermacher/INVIDI_Gemma3_4b_finetunned-GGUF • 4B • Updated • 324 • 1
mradermacher/Qwen2.5-14B-Instruct-ultrafeedback-spin-iter1-RPO-GGUF • 15B • Updated • 259 • 1
limitedonly41/mistral7b_v3_4_categories
hesamation/Qwen3-8B-Base-bnb-4bit-FOL
Sandipan1976/gpt-oss-medical
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter2-RPO • Text Generation • 0.0B • Updated • 8 • 1
prabhakaran-ak7/Llama3-medFinetuned • Text Generation • Updated • 18 • 1
mradermacher/Lumian2-VLR-7B-Thinking-GGUF • 8B • Updated • 1.88k • 1
mradermacher/Lumian2-VLR-7B-Thinking-i1-GGUF • 8B • Updated • 2.1k • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter1-4k • Text Generation • 0.0B • Updated • 27 • 1
mradermacher/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter2-RPO-GGUF • 15B • Updated • 2.07k • 1
Guilherme34/GPT-OSS-UNCENSORED_MAKING-20B • Updated • 3 • 1
Omartificial-Intelligence-Space/gpt-oss-math-ar • Updated • 13 • 1
mradermacher/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter1-4k-GGUF • 15B • Updated • 1.83k • 1
Manishram/medgemma-brain-cancer-finetuned • Text Generation • Updated • 7 • 1
vishprometa/clickhouse-qwen3-1.7b
HamedAdham/llama3-ner-sft-dpo • Updated • 1
lewtun/dummy-trl-model • Reinforcement Learning • Updated • 16 • 1
ybelkada/gpt-neo-125m-detox • Reinforcement Learning • Updated • 128
ybelkada/gpt-neo-125m-detoxified-long-context • Reinforcement Learning • Updated • 2
dshin/flan-t5-ppo • Reinforcement Learning • Updated • 5
SummerSigh/T5-Base-Rule-Of-Thumb-RM • Reinforcement Learning • Updated • 1
dshin/flan-t5-ppo-testing • Reinforcement Learning • Updated • 1 • 1
SummerSigh/T5-Base-EvilPrompterRM • Reinforcement Learning • 0.2B • Updated • 29
dshin/flan-t5-ppo-testing-violation • Reinforcement Learning • Updated • 1
dshin/flan-t5-ppo-user-b • Reinforcement Learning • Updated • 1
dshin/flan-t5-ppo-user-h-use-violation • Reinforcement Learning • Updated • 1
dshin/flan-t5-ppo-user-f-use-violation • Reinforcement Learning • Updated • 1
dshin/flan-t5-ppo-user-e-use-violation • Reinforcement Learning • Updated • 2
dshin/flan-t5-ppo-user-a-use-violation • Reinforcement Learning • Updated • 1