# BCO Trainer
[Models trained with BCO](https://huggingface.co/models?other=bco,trl)
TRL supports Binary Classifier Optimization (BCO).
The [BCO](https://huggingface.co/papers/2404.04656) authors train a binary classifier whose logit serves as a reward, so that the classifier maps {prompt, chosen completion} pairs to 1 and {prompt, rejected completion} pairs to 0.
For a full example, have a look at [`examples/scripts/bco.py`].
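To make the classifier view concrete, here is a minimal sketch of the idea on a batch of unpaired completions: the implicit reward `beta * log(pi_theta / pi_ref)` plays the role of the classifier logit and is trained with binary cross-entropy against the thumbs-up/thumbs-down labels. The function name, arguments, and the fixed `delta` reward shift are illustrative assumptions, not the trainer's internal API.

```py
import torch.nn.functional as F

def bco_loss_sketch(policy_logps, ref_logps, labels, beta=0.1, delta=0.0):
    """Illustrative BCO-style objective for a batch of unpaired completions.

    policy_logps / ref_logps: summed log-probabilities of each completion
    under the policy and the reference model (shape: [batch]).
    labels: 1 for desired (thumbs-up), 0 for undesired (thumbs-down).
    delta: reward shift; the trainer estimates it from reward statistics,
    here it is just a constant for illustration.
    """
    implicit_rewards = beta * (policy_logps - ref_logps)  # classifier logits
    # binary cross-entropy pushes desired pairs toward 1, undesired toward 0
    return F.binary_cross_entropy_with_logits(implicit_rewards - delta, labels.float())
```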
## Expected dataset type
The [`BCOTrainer`] requires an [unpaired preference dataset](dataset_formats#unpaired-preference).
The [`BCOTrainer`] supports both [conversational](dataset_formats#conversational) and [standard](dataset_formats#standard) dataset formats. When provided with a conversational dataset, the trainer will automatically apply the chat template to the dataset.
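For illustration, here is a tiny unpaired preference dataset in the standard format: each row has a `prompt`, a single `completion`, and a boolean `label` marking the completion as desired or undesired (the example rows themselves are made up).

```py
from datasets import Dataset

train_dataset = Dataset.from_dict({
    "prompt": ["The sky is", "The sky is"],
    "completion": [" blue.", " green."],
    "label": [True, False],  # True = desired, False = undesired
})
```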
## Expected model format
The BCO trainer expects a model of type `AutoModelForCausalLM`, in contrast to PPO, which expects an `AutoModelForCausalLMWithValueHead` for the value function.
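For example, both the policy and the reference model can be loaded as plain causal LMs; the checkpoint name below is just a placeholder.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # placeholder checkpoint, pick any causal LM
model = AutoModelForCausalLM.from_pretrained(model_id)
model_ref = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```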
## Using the `BCOTrainer`
For a detailed example, have a look at the `examples/scripts/bco.py` script. At a high level, we need to initialize the `BCOTrainer` with the `model` we wish to train and a reference `ref_model`, which we will use to calculate the implicit rewards of the preferred and rejected responses.
The `beta` hyperparameter scales the implicit reward, and the dataset contains the three entries described above (`prompt`, `completion`, and `label`). Note that `model` and `ref_model` need to have the same architecture (i.e., decoder-only or encoder-decoder).
```py
from trl import BCOConfig, BCOTrainer

training_args = BCOConfig(
    beta=0.1,
)

bco_trainer = BCOTrainer(
    model,
    model_ref,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
```
After this, one can then call:
```py
bco_trainer.train()
```
## Underlying Distribution Matching (UDM)
In practical scenarios, the thumbs-up and thumbs-down datasets are likely to have divergent underlying distributions of prompts.
Consider an LLM deployed for user feedback: if the model excels in writing tasks but underperforms in coding, the thumbs-up dataset will be dominated by writing-related prompts, while the thumbs-down dataset will contain mostly coding-related prompts.
If the prompts in your desired and undesired datasets differ significantly, it is useful to enable UDM.
Choose an embedding model and tokenizer:
```py
from functools import partial

from accelerate import Accelerator
from transformers import AutoModel, AutoTokenizer

embedding_model = AutoModel.from_pretrained(your_model_id)
embedding_tokenizer = AutoTokenizer.from_pretrained(your_model_id)

# customize this function depending on your embedding model
def embed_prompt(input_ids, attention_mask, model):
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    return outputs.last_hidden_state.mean(dim=1)

embedding_model = Accelerator().prepare_model(embedding_model)
embedding_func = partial(embed_prompt, model=embedding_model)
```
Set `prompt_sample_size` to define how many prompts are selected to train the UDM classifier, and start the training with the provided embedding function:
```py
training_args = BCOConfig(
    beta=0.1,
    prompt_sample_size=512,
)

bco_trainer = BCOTrainer(
    model,
    model_ref,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    embedding_func=embedding_func,
    embedding_tokenizer=embedding_tokenizer,
)

bco_trainer.train()
```
### For Mixture of Experts Models: Enabling the auxiliary loss
MoEs are most efficient when the load is roughly equally distributed among experts.
To ensure that we train MoEs similarly during preference-tuning, it is beneficial to add the auxiliary loss from the load balancer to the final loss.
This option is enabled by setting `output_router_logits=True` in the model config (e.g. `MixtralConfig`).
To scale how much the auxiliary loss contributes to the total loss, use the hyperparameter `router_aux_loss_coef=...` (default: `0.001`).
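For instance, with a Mixtral-style checkpoint (the model id below is illustrative), both options can be passed directly when loading the model:

```py
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",  # illustrative MoE checkpoint
    output_router_logits=True,   # add the load-balancing auxiliary loss
    router_aux_loss_coef=0.001,  # weight of the auxiliary loss in the total loss
)
```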
## BCOTrainer

[[autodoc]] BCOTrainer

## BCOConfig

[[autodoc]] BCOConfig