---
library_name: transformers
datasets:
- DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
- lora
- supervised-fine-tuning
- summarization
- qwen3
---
# Qwen3-32B Summarization LoRA Adapter
A LoRA (Low-Rank Adaptation) adapter for the Qwen3-32B model, trained to summarize supplemental information for articles. Training used multi-turn reinforcement learning on rollouts from the DataSeer summarization-votes dataset (human preference data).
## Model Details
### Model Description
This adapter fine-tunes the Qwen3-32B base model for improved summarization capabilities using the LoRA technique.
- **Developed by:** DataSeer
- **Model type:** Causal Language Model (LoRA Adapter)
- **Language:** English
- **Base model:** Qwen/Qwen3-32B
- **Training approach:** Multi-turn RL with LoRA
- **Dataset:** DataSeer/si-summarization-votes-r1-081725
### Model Architecture
- **Base Model:** Qwen3-32B (32.8B parameters)
- **LoRA Configuration:**
- Rank (r): 8
- Alpha: 32
- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0
- **Precision:** bfloat16
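The configuration above corresponds roughly to the following PEFT `LoraConfig`. This is a sketch rather than the actual training script; the `bias` and `task_type` values are assumptions.
```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical reconstruction of the adapter configuration listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumption: bias terms left untouched
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # prints the small trainable-parameter fraction
```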
## Training Details
### Training Data
The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively-voted examples (label=True).
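The filtering step can be reproduced with a snippet along the following lines. The `label` column name comes from the description above; the split name is an assumption.
```python
from datasets import load_dataset

# Load the preference dataset and keep only positively-voted rollouts.
dataset = load_dataset("DataSeer/si-summarization-votes-r1-081725", split="train")
dataset = dataset.filter(lambda example: example["label"] is True)
print(f"Kept {len(dataset)} positively-voted examples")
```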
### Training Configuration
- **Training epochs:** 2
- **Learning rate:** 1e-3 (0.001)
- **Batch size:** 1 per device
- **Gradient accumulation steps:** 8
- **Effective batch size:** 8
- **Learning rate scheduler:** Cosine
- **Optimizer:** AdamW (torch fused)
- **Precision:** bfloat16
- **Gradient checkpointing:** Enabled
- **Max sequence length:** 18,893 tokens
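A hedged sketch of these hyperparameters, expressed as `transformers.TrainingArguments`, is shown below; the actual trainer class and training script are not part of this card, and `output_dir` is a placeholder.
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-32b-summarization-lora",  # placeholder output path
    num_train_epochs=2,
    learning_rate=1e-3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
)
# The 18,893-token max sequence length is applied when tokenizing/packing
# the dataset, not through TrainingArguments.
```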
### Training Results
- **Final training loss:** 0.3414
- **Mean token accuracy:** 88.13%
- **Total training steps:** 62
- **Training runtime:** 37.9 minutes (2,273 seconds)
- **Training samples per second:** 0.216
- **Final learning rate:** 5.77e-6
### Hardware & Performance
- **Hardware:** 8x NVIDIA H100 80GB HBM3
- **Training time:** ~38 minutes
- **Memory optimization:** Gradient checkpointing, bfloat16 precision
## Usage
### Loading the Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-32B",
torch_dtype=torch.bfloat16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
```
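### Generating a Summary
With the adapter loaded, summaries can be produced through the standard `generate` API. The prompt below is only an illustration; the exact prompt format used during training is not documented here.
```python
import torch

document_text = "..."  # replace with the supplemental information to summarize
messages = [
    {"role": "user", "content": f"Summarize the following supplemental information:\n\n{document_text}"}
]

# Build the chat-formatted input and generate deterministically.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)

summary = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(summary)
```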
### Environmental Impact
Training was conducted on high-performance H100 GPUs for approximately 38 minutes, a relatively efficient fine-tuning run: the LoRA approach updates only a small fraction of the model's parameters (well under 1%).