Models: 95,940

Active filters: trl

mradermacher/INVIDI_Gemma3_4b_finetunned-GGUF

4B • Updated 5 days ago • 324 • 1

mradermacher/Qwen2.5-14B-Instruct-ultrafeedback-spin-iter1-RPO-GGUF

15B • Updated 3 days ago • 259 • 1

limitedonly41/mistral7b_v3_4_categories

Updated 5 days ago • 1

hesamation/Qwen3-8B-Base-bnb-4bit-FOL

Updated 5 days ago • 1

Sandipan1976/gpt-oss-medical

Updated 4 days ago • 1

AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter2-RPO

Text Generation • 0.0B • Updated 4 days ago • 8 • 1

prabhakaran-ak7/Llama3-medFinetuned

Text Generation • Updated 3 days ago • 18 • 1

mradermacher/Lumian2-VLR-7B-Thinking-GGUF

8B • Updated 3 days ago • 1.88k • 1

mradermacher/Lumian2-VLR-7B-Thinking-i1-GGUF

8B • Updated 3 days ago • 2.1k • 1

AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter1-4k

Text Generation • 0.0B • Updated 3 days ago • 27 • 1

mradermacher/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter2-RPO-GGUF

15B • Updated 3 days ago • 2.07k • 1

Guilherme34/GPT-OSS-UNCENSORED_MAKING-20B

Updated 3 days ago • 3 • 1

Omartificial-Intelligence-Space/gpt-oss-math-ar

Updated 2 days ago • 13 • 1

mradermacher/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter1-4k-GGUF

15B • Updated 2 days ago • 1.83k • 1

Manishram/medgemma-brain-cancer-finetuned

Text Generation • Updated 2 days ago • 7 • 1

vishprometa/clickhouse-qwen3-1.7b

Updated 1 day ago • 1

HamedAdham/llama3-ner-sft-dpo

Updated about 4 hours ago • 1

lewtun/dummy-trl-model

Reinforcement Learning • Updated Jan 24, 2023 • 16 • 1

ybelkada/gpt-neo-125m-detox

Reinforcement Learning • Updated Feb 17, 2023 • 128

ybelkada/gpt-neo-125m-detoxified-long-context

Reinforcement Learning • Updated Feb 17, 2023 • 2

dshin/flan-t5-ppo

Reinforcement Learning • Updated Mar 11, 2023 • 5

SummerSigh/T5-Base-Rule-Of-Thumb-RM

Reinforcement Learning • Updated Mar 12, 2023 • 1

dshin/flan-t5-ppo-testing

Reinforcement Learning • Updated Mar 12, 2023 • 1 • 1

SummerSigh/T5-Base-EvilPrompterRM

Reinforcement Learning • 0.2B • Updated Mar 18, 2023 • 29

dshin/flan-t5-ppo-testing-violation

Reinforcement Learning • Updated Mar 12, 2023 • 1

dshin/flan-t5-ppo-user-b

Reinforcement Learning • Updated Mar 12, 2023 • 1

dshin/flan-t5-ppo-user-h-use-violation

Reinforcement Learning • Updated Mar 13, 2023 • 1

dshin/flan-t5-ppo-user-f-use-violation

Reinforcement Learning • Updated Mar 13, 2023 • 1

dshin/flan-t5-ppo-user-e-use-violation

Reinforcement Learning • Updated Mar 13, 2023 • 2

dshin/flan-t5-ppo-user-a-use-violation

Reinforcement Learning • Updated Mar 13, 2023 • 1