Ring-lite

🤗 Hugging Face

Introduction

Ring-lite is a lightweight, fully open-sourced MoE (Mixture of Experts) LLM designed for complex reasoning tasks. It is built upon the publicly available Ling-lite-1.5 model, which has 16.8B total parameters with 2.75B activated parameters. We use a joint training pipeline combining knowledge distillation with reinforcement learning, achieving performance comparable to state-of-the-art (SOTA) small-size reasoning models on challenging benchmarks (AIME, LiveCodeBench, and GPQA-Diamond) while activating only one-third of their parameters.

News

[20250704] Ring-lite-0704: we have updated the Ring-lite model; it now supports two distinct reasoning modes: "thinking on" and "thinking off".

Model Downloads

Model       #Total Params   #Activated Params   Context Length   Download
Ring-lite   16.8B           2.75B               128K             🤗 HuggingFace
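
If you prefer to fetch the weights ahead of time rather than at first load, the standard huggingface_hub client can be used; this is a minimal sketch, not part of the official instructions:

from huggingface_hub import snapshot_download

# Download the full model repository (config, tokenizer, safetensors shards)
# into the local Hugging Face cache and return the resolved path.
local_path = snapshot_download(repo_id="inclusionAI/Ring-lite")
print(local_path)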

Evaluation

For a comprehensive evaluation of our reasoning models, we run automated benchmarks that assess performance on math, code, and science tasks.

To compare the performance of Ring-lite-0704 and Ring-lite-0616, we evaluate the two models on a broader range of reasoning and general-purpose benchmarks, including instruction following, function calling, and creative writing.

Dataset                   Ring-lite-0616   Ring-lite-0704
AIME 2024                 76.6             79.0
AIME 2025                 69.1             69.5
LiveCodeBench             60.7             61.4
Codeforces (percentile)   86.5             88.0
GPQA Diamond              61.1             63.2
C-Eval                    59.0             65.4
MMLU-Pro                  60.0             63.0
ArenaHard                 27.8             62.7
IF-Eval                   51.6             54.3
BFCL_Live                 60.1             66.8
Creative Writing          6.7              60.2

More details are reported in our technical report.

Quickstart

🤗 Hugging Face Transformers

The updated Ring-lite model supports two distinct reasoning modes: "thinking on" and "thinking off". These modes are controlled by the enable_thinking parameter of the tokenizer.apply_chat_template() function.

  • When enable_thinking is set to True (or omitted), the model operates in "thinking on" mode, where it generates and outputs the internal reasoning process.
  • When enable_thinking is explicitly set to False, the model runs in "thinking off" mode, skipping the reasoning step entirely and directly producing the final answer.

This feature allows users to choose between detailed reasoning and concise output based on their specific needs.

Here is a code snippet to show you how to use the chat model with transformers:

Thinking on

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ring-lite"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
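# Slice off the prompt tokens so that only the newly generated tokens are decoded.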
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
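
If you would rather stream tokens to the terminal as they are produced instead of decoding at the end, transformers' built-in TextStreamer can be attached to the same generate call; a minimal sketch reusing model, tokenizer, and model_inputs from the snippet above:

from transformers import TextStreamer

# Print decoded text to stdout as tokens are generated; skip_prompt avoids
# echoing the chat-templated input, and skip_special_tokens is forwarded to
# the decoder.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=8192, streamer=streamer)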

Thinking off

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ring-lite"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
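    # The only change from the "thinking on" example: disable the reasoning step.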
    enable_thinking=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
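
In "thinking on" mode, the reasoning trace and the final answer arrive in a single string. Assuming the model delimits its reasoning with <think>...</think> markers, as many open reasoning models do (check the chat template in the tokenizer config for the exact markers, and note that skip_special_tokens=True will strip them if they are registered as special tokens), the two parts can be separated like this:

import re

# Split a "thinking on" response into reasoning trace and final answer.
# The <think>...</think> delimiters are an assumption; verify them against
# the model's chat template before relying on this.
match = re.search(r"<think>(.*?)</think>(.*)", response, re.DOTALL)
if match:
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
else:
    reasoning, answer = "", response.strip()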

Dataset

The training data of Ring-lite is released at Ring-lite-sft-data and Ring-lite-rl-data.
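
Assuming both corpora are published as standard Hugging Face datasets under the inclusionAI organization (the repository ids below are inferred from the names above), they can be loaded with the datasets library:

from datasets import load_dataset

# Repository ids inferred from the dataset names; adjust if the actual
# paths differ.
sft_data = load_dataset("inclusionAI/Ring-lite-sft-data")
rl_data = load_dataset("inclusionAI/Ring-lite-rl-data")
print(sft_data)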

Code

The training code will be released soon.

Deployment

For deployment instructions, please refer to GitHub.
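
As one common serving path, an OpenAI-compatible endpoint can be stood up with vLLM, provided your vLLM build supports this MoE architecture; the sketch below is an illustration under that assumption, not the officially documented procedure:

# Serve (shell): vllm serve inclusionAI/Ring-lite --served-model-name ring-lite
# Then query the OpenAI-compatible endpoint:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="ring-lite",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
)
print(resp.choices[0].message.content)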

License

This code repository is licensed under the MIT License.

Citation

@misc{ringteam2025ringlitescalablereasoningc3postabilized,
      title={Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs}, 
      author={Ling Team},
      year={2025},
      eprint={2506.14731},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.14731}, 
}