base_model:
- Qwen/Qwen2.5-32B-Instruct
datasets:
- liuwenhan/reasonrank_data_sft
- liuwenhan/reasonrank_data_rl
- liuwenhan/reasonrank_data_13k
language:
- en
license: mit
pipeline_tag: text-ranking
library_name: transformers
tags:
- reranking
- reasoning
- qwen
ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability
Introduction
This is the model trained in our paper: ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability (📝arXiv: https://arxiv.org/abs/2508.07050).
Large Language Model (LLM) based listwise ranking has shown superior performance in many passage ranking tasks. With the development of Large Reasoning Models, many studies have demonstrated that step-by-step reasoning at test time helps improve listwise ranking performance. ReasonRank addresses the scarcity of reasoning-intensive training data with an automated reasoning-intensive training data synthesis framework. To empower the listwise reranker with strong reasoning ability, we further propose a two-stage post-training approach: a cold-start supervised fine-tuning (SFT) stage for reasoning pattern learning and a reinforcement learning (RL) stage for further ranking ability enhancement.
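The SFT and RL training sets listed in the metadata above are hosted on the Hugging Face Hub. As a minimal sketch (split names and field layout are assumptions; check the dataset cards for the exact schema), they can be inspected with the datasets library:
from datasets import load_dataset
# Load the cold-start SFT data (reasoning traces) and the RL-stage ranking data.
# Configuration/split names are assumptions; see the dataset cards for specifics.
sft_data = load_dataset("liuwenhan/reasonrank_data_sft")
rl_data = load_dataset("liuwenhan/reasonrank_data_rl")
print(sft_data)
print(rl_data)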
Please refer to our 🧩GitHub repository for detailed usage instructions and code.
Project page: https://brightbenchmark.github.io/
Model Performance
Sample Usage
You can use this model with the transformers library. Here is a basic example to perform inference. Note that the exact prompt construction for ReasonRank is critical for performance and should ideally follow the create_prompt function in the rerank/rank_listwise_os_llm.py file of the original GitHub repository. The example below provides a simplified structure for demonstration.
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch
# Load the model and tokenizer
model_id = "liuwenhan/reasonrank-32B" # Assuming this is the model being documented
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16, # or torch.float16 depending on your GPU and needs
device_map="auto",
trust_remote_code=True # not strictly required for Qwen2.5 (natively supported in transformers), kept for compatibility
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Example query and passages
query = "What is the capital of France?"
passages = [
"Paris is the capital and most populous city of France.",
"London is the capital of England and the United Kingdom.",
"The Eiffel Tower is a famous landmark in Paris.",
"France is a country in Western Europe."
]
# Construct the input messages for Qwen's chat template.
# For ReasonRank's specific prompt structure, refer to the original GitHub repository's
# `rerank/rank_listwise_os_llm.py` file and `add_prefix_prompt`/`add_post_prompt` functions.
# This example uses a general Qwen-like structure for demonstration.
system_prompt = "You are a helpful and intelligent assistant."
user_prefix = f"For the query: '{query}', please rank the following passages from most relevant to least relevant.\n"
passage_list_str = "\n".join([f"[{i+1}] {p}" for i, p in enumerate(passages)])
user_suffix = "\nNow, please generate the reasoning process and the ranked list of passages."
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": f"{user_prefix}{passage_list_str}{user_suffix}"}
]
# Apply the chat template to get the final prompt string
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Tokenize the input
inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
# Generate response
# Use generation_config from the model if available, otherwise define
generation_config = model.generation_config if model.generation_config else GenerationConfig()
generation_config.max_new_tokens = 512
generation_config.do_sample = False # greedy decoding gives deterministic rankings
generation_config.temperature = 1.0 # neutral values; sampling params are ignored when do_sample=False
generation_config.top_p = 1.0
with torch.no_grad():
outputs = model.generate(
**inputs,
generation_config=generation_config
)
# Decode the output
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"Query: {query}\nResponse:\n{response}")
# Expected (simplified) output might look like:
# Response:
# Reasoning: The query asks for the capital of France. Passage [1] directly states "Paris is the capital and most populous city of France."
# This makes it the most relevant. Other passages are less direct or irrelevant.
# Ranked List:
# 1. [1] Paris is the capital and most populous city of France.
# 2. [3] The Eiffel Tower is a famous landmark in Paris.
# 3. [4] France is a country in Western Europe.
# 4. [2] London is the capital of England and the United Kingdom.
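To turn the generated text back into a ranking over the input passages, the bracketed identifiers can be parsed from the model's answer. The helper below is a hypothetical post-processing sketch, not the parsing logic of the official repository (which also handles sliding windows and malformed output):
import re
def parse_ranking(response_text, num_passages):
    # Prefer the explicit "Ranked List:" section if the model emits one; otherwise
    # fall back to scanning the whole response (both choices are heuristics).
    tail = response_text.split("Ranked List:")[-1]
    ids = [int(m) for m in re.findall(r"\[(\d+)\]", tail)]
    order = []
    for i in ids:
        if 1 <= i <= num_passages and i not in order:
            order.append(i)
    # Append any passages the model omitted so the result is a full permutation.
    order += [i for i in range(1, num_passages + 1) if i not in order]
    return order
ranking = parse_ranking(response, len(passages))
print("Ranked passage indices:", ranking)  # e.g. [1, 3, 4, 2]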
Citation
If you find this work helpful, please cite our paper:
@misc{liu2025reasonrankempoweringpassageranking,
title={ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability},
author={Wenhan Liu and Xinyu Ma and Weiwei Sun and Yutao Zhu and Yuchen Li and Dawei Yin and Zhicheng Dou},
year={2025},
eprint={2508.07050},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2508.07050},
}
License
This project is released under the MIT License.
Acknowledgement
The inference code and training implementation build upon RankLLM, Llama Factory, and verl. Our work is based on the Qwen2.5 model series, and we sincerely thank the Qwen team for their outstanding contributions to the open-source community.