How to use castorini/rank_vicuna_7b_v1 with libraries and local apps.

Libraries

How to use castorini/rank_vicuna_7b_v1 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="castorini/rank_vicuna_7b_v1")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("castorini/rank_vicuna_7b_v1")
model = AutoModelForCausalLM.from_pretrained("castorini/rank_vicuna_7b_v1")
```
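RankVicuna is a listwise reranker, so in practice it is prompted with a query and a numbered list of candidate passages rather than free text. The sketch below builds such a prompt. It assumes a Vicuna v1.1-style `USER:`/`ASSISTANT:` conversation format and a RankGPT-style ranking instruction; the wording is an approximation, not the exact template used in rank_llm, and `build_listwise_prompt` is a hypothetical helper. Check the repository for the authoritative prompt.

```python
from typing import List

# Assumption: Vicuna v1.1-style system preamble followed by USER/ASSISTANT turns.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_listwise_prompt(query: str, passages: List[str]) -> str:
    """Hypothetical helper: number passages [1]..[n] and ask for a ranking.

    The instruction text approximates a RankGPT-style listwise prompt;
    the exact template used by rank_llm may differ.
    """
    n = len(passages)
    lines = [
        f"I will provide you with {n} passages, each indicated by a numerical "
        f"identifier []. Rank the passages based on their relevance to the "
        f"search query: {query}.",
        "",
    ]
    # Each candidate gets a 1-based bracketed identifier the model can refer to.
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines += [
        "",
        f"Search Query: {query}.",
        f"Rank the {n} passages above based on their relevance to the search "
        "query. List all passages by identifier in descending order of "
        "relevance, using the format [] > [], e.g., [2] > [1].",
    ]
    user_message = "\n".join(lines)
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"


if __name__ == "__main__":
    print(build_listwise_prompt(
        "what is a reranker",
        ["A reranker reorders candidate documents.", "Paris is in France."],
    ))
```

The resulting string can then be passed to the `pipe` object above, for example `pipe(prompt, max_new_tokens=200, do_sample=False, return_full_text=False)`, so that only the generated ranking is returned.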

Local Apps

vLLM

How to use castorini/rank_vicuna_7b_v1 with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "castorini/rank_vicuna_7b_v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "castorini/rank_vicuna_7b_v1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
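The same OpenAI-compatible `/v1/completions` endpoint can be called from Python using only the standard library. This is a minimal sketch, assuming the vLLM server above is running on its default port 8000; `make_completion_request` and `complete` are hypothetical helpers, and greedy decoding (`temperature` 0) is an assumed choice for reproducible rankings rather than a documented setting.

```python
import json
import urllib.request

# Assumption: the vLLM server started above, listening on the default port.
BASE_URL = "http://localhost:8000"


def make_completion_request(prompt: str, base_url: str = BASE_URL,
                            max_tokens: int = 200) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": "castorini/rank_vicuna_7b_v1",
        "prompt": prompt,
        "max_tokens": max_tokens,
        # Assumed choice: greedy decoding so the same input yields the same ranking.
        "temperature": 0.0,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the text of the first completion choice."""
    req = make_completion_request(prompt, base_url)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Because only the base URL differs, the same helpers should also work against the SGLang server described below by passing `base_url="http://localhost:30000"`.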

SGLang

How to use castorini/rank_vicuna_7b_v1 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "castorini/rank_vicuna_7b_v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "castorini/rank_vicuna_7b_v1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "castorini/rank_vicuna_7b_v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "castorini/rank_vicuna_7b_v1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Docker Model Runner
How to use castorini/rank_vicuna_7b_v1 with Docker Model Runner:
```
docker model run hf.co/castorini/rank_vicuna_7b_v1
```

RankVicuna Model Card

Model Details

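RankVicuna is a listwise document reranker built on Vicuna, a chat assistant trained by fine-tuning Llama 2 on user-shared conversations collected from ShareGPT. Given a query and a list of candidate passages, RankVicuna generates a reordering of those passages by relevance.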

Developed by: Castorini
Model type: An auto-regressive language model based on the transformer architecture
License: Llama 2 Community License Agreement
Finetuned from base model: lmsys/vicuna-7b-v1.5 (Vicuna v1.5, itself fine-tuned from Llama 2)

This specific model is a 7B variant and is trained with data augmentation.
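As a listwise reranker, RankVicuna answers with an ordering of candidate identifiers, assumed here to follow the bracketed format `[2] > [3] > [1]`. Generated text can be malformed, for example by repeating or omitting identifiers, so downstream code should normalize it before reordering candidates. The sketch below is one defensive approach under that assumed format; `apply_permutation` is a hypothetical helper, and rank_llm's own post-processing may differ.

```python
import re
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_permutation(output: str, candidates: Sequence[T]) -> List[T]:
    """Reorder candidates according to a generated ranking such as "[2] > [1]".

    Identifiers are 1-based. Out-of-range and repeated identifiers are ignored;
    candidates the model left out are appended in their original order.
    """
    n = len(candidates)
    seen = set()
    order = []
    # Pull every bracketed integer, in the order the model emitted them.
    for match in re.findall(r"\[(\d+)\]", output):
        idx = int(match) - 1
        if 0 <= idx < n and idx not in seen:
            seen.add(idx)
            order.append(idx)
    # Never drop a candidate: append anything the model omitted.
    order.extend(i for i in range(n) if i not in seen)
    return [candidates[i] for i in order]
```

For example, `apply_permutation("[3] > [1] > [3] > [9]", ["a", "b", "c"])` returns `["c", "a", "b"]`: the repeated `[3]` and the out-of-range `[9]` are dropped, and the omitted `[2]` is appended at the end. Appending omitted candidates rather than discarding them keeps the reranked list the same length as the input.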

Model Sources

Repository: https://github.com/castorini/rank_llm
Paper: https://arxiv.org/abs/2309.15088

Uses

The primary use of RankVicuna is research at the intersection of large language models and retrieval. The primary intended users of the model are researchers and hobbyists in natural language processing and information retrieval.

Training Details

RankVicuna is fine-tuned from lmsys/vicuna-7b-v1.5 using supervised instruction fine-tuning.

Evaluation

RankVicuna is currently evaluated on the TREC 2019 and 2020 Deep Learning Track passage ranking tasks (DL19/DL20). See our paper for more details.
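DL19 and DL20 results are commonly reported as nDCG@10 over graded relevance judgments. To illustrate that metric, here is a minimal single-query sketch using the relevance grade as a linear gain and a `log2(rank + 1)` discount, which is one common formulation. The helper names are hypothetical, and official numbers should come from standard tooling such as trec_eval, whose conventions take precedence over this sketch.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain: sum of gain / log2(rank + 1) over the top k ranks."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query; unjudged documents get gain 0 (assumed convention)."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranking]
    # The ideal ranking places the highest-graded judged documents first.
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For instance, with judgments `{"a": 3, "b": 1}`, the ranking `["a", "b"]` scores 1.0, while the swapped ranking `["b", "a"]` scores about 0.797.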

Papers for castorini/rank_vicuna_7b_v1

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models. arXiv:2309.15088, published Sep 26, 2023.

Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288, published Jul 18, 2023.