Instructions for using X-iZhang/libra-maira-2 with libraries and local apps.

Libraries

How to use X-iZhang/libra-maira-2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="X-iZhang/libra-maira-2", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("X-iZhang/libra-maira-2", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use X-iZhang/libra-maira-2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "X-iZhang/libra-maira-2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "X-iZhang/libra-maira-2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
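
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `post_chat_request` are illustrative and not part of vLLM or this repository, and the sketch assumes the server started above is reachable at `http://localhost:8000`. Whether vLLM can actually serve this checkpoint is not verified here.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat_request(base_url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a server running, `post_chat_request("http://localhost:8000", payload)` should return the decoded response. In the OpenAI-compatible schema the generated text is normally found under `choices[0]["message"]["content"]`. For the SGLang server described further down, only the base URL changes (port 30000).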

Use Docker

docker model run hf.co/X-iZhang/libra-maira-2

SGLang

How to use X-iZhang/libra-maira-2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "X-iZhang/libra-maira-2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "X-iZhang/libra-maira-2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "X-iZhang/libra-maira-2" \
        --host 0.0.0.0 \
        --port 30000

MAIRA-2 (finetuned from Vicuna-7B, RAD-DINO)

MAIRA-2 is a multimodal transformer designed for the generation of grounded or non-grounded radiology reports from chest X-rays. MAIRA-2 has been built for research purposes only and is being shared to facilitate comparison and further research.

📌 Note: For the original model weights, refer to microsoft/maira-2.

📃 Original paper: MAIRA-2: Grounded Radiology Report Generation.

🔬 Experimental Usage in Libra's repo

This model checkpoint is intended for experimental use and can be tested directly within the Libra repository.

For better benchmarking, we recommend using the official test set from X-iZhang/MIMIC-CXR-RRG.

Key Modification

To enable the re-trained vision encoder during inference and to match MAIRA-2's behaviour, which uses feature_maps from the Dinov2Backbone (hidden states with LayerNorm applied) rather than raw hidden_states, apply the following configuration:

"unfreeze_mm_vision_tower": true,
"use_maira_feature_norm": true

This setting is designed specifically for findings-section generation from a single frontal-view chest X-ray.

It is not applicable to grounding tasks or settings involving multiple image inputs.
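
The two flags above can also be set programmatically. The sketch below assumes they belong in the checkpoint's `config.json` (the source does not name the file) and that a local copy of that file is available. `enable_maira_flags` is a hypothetical helper, not part of Libra.

```python
import json
from pathlib import Path

# The two flags named in the configuration snippet above.
MAIRA_FLAGS = {
    "unfreeze_mm_vision_tower": True,
    "use_maira_feature_norm": True,
}


def enable_maira_flags(config_path) -> dict:
    """Set both flags in a JSON config file, keeping every other key unchanged."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(MAIRA_FLAGS)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

The helper rewrites the file in place and returns the updated dictionary, so it is worth keeping a backup of the original config before running it.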

Use-case: Findings generation without grounding

❗️MAIRA-2 requires a strict chat template, which must be provided manually.

# ✅ With clinical instruction
prompt_with_clinical = (
    "Provide a description of the findings in the radiology study in comparison to the prior frontal image. "
    "INDICATION: Dyspnea. TECHNIQUE: PA and lateral views of the chest. COMPARISON: None."
)

# ✅ Without clinical instruction — placeholders (INDICATION, TECHNIQUE, COMPARISON) must still be included
prompt_minimal = (
    "Provide a description of the findings in the radiology study in comparison to the prior frontal image. "
    "INDICATION: None. TECHNIQUE: None. COMPARISON: None."
)

# 🧪 Inference example following the official MAIRA-2 setup
from libra.eval import libra_eval

frontal_image_url = "https://openi.nlm.nih.gov/imgs/512/145/145/CXR145_IM-0290-1001.png"
model_path = "X-iZhang/libra-maira-2"

answer = libra_eval(
    model_path=model_path,
    image_file=[frontal_image_url],
    query=prompt_with_clinical,
    conv_mode="maira_2",
    temperature=0.0,         # Use greedy decoding
    max_new_tokens=300,
)

# ✅ Expected output
print(answer)
# > There is a large right pleural effusion.
# > No pneumothorax is identified.
# > There is no left pleural effusion.
# > There is no focal consolidation.
# > The cardiomediastinal silhouette is within normal limits.
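
Because the INDICATION, TECHNIQUE, and COMPARISON placeholders must always be present, the prompt can be assembled by a small helper that fills any missing field with `None`. This is only a sketch: `build_findings_prompt` is a hypothetical name and not part of the Libra API, and the instruction text is copied from the prompts shown above.

```python
from typing import Optional

# Instruction sentence taken verbatim from the example prompts above.
FINDINGS_INSTRUCTION = (
    "Provide a description of the findings in the radiology study "
    "in comparison to the prior frontal image."
)


def _field(value: Optional[str]) -> str:
    """Return the cleaned field text, or 'None' when the field is missing or blank."""
    text = (value or "").strip().rstrip(".")
    return text if text else "None"


def build_findings_prompt(
    indication: Optional[str] = None,
    technique: Optional[str] = None,
    comparison: Optional[str] = None,
) -> str:
    """Assemble a findings prompt that always contains all three placeholders."""
    return (
        f"{FINDINGS_INSTRUCTION} "
        f"INDICATION: {_field(indication)}. "
        f"TECHNIQUE: {_field(technique)}. "
        f"COMPARISON: {_field(comparison)}."
    )
```

Calling `build_findings_prompt()` with no arguments reproduces `prompt_minimal`, and `build_findings_prompt("Dyspnea", "PA and lateral views of the chest")` reproduces `prompt_with_clinical`. Either result can be passed as `query` to `libra_eval`.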

📚 Learn More

For a deeper dive into the methodology, theoretical insights, and performance benchmarks of the Libra framework, please see the following resources:

🔗 Project Website: Libra v1.0
📄 Paper: arXiv:2411.19378
💻 Code Repository: X-iZhang/Libra (GitHub)
📷 Related Project: CCD (Clinical Change Detection); technical details are given in the accompanying paper.

Disclaimer

This implementation is intended strictly for research and benchmarking purposes. It is not validated for clinical use, and any application in real-world diagnosis or treatment is strongly discouraged.

If any use case is found to violate these intended purposes (e.g., clinical deployment, misleading medical claims), the maintainers reserve the right to remove related code, models, or access permissions without prior notice.