# BCO Trainer
[All models trained with BCO](https://huggingface.co/models?other=bco,trl)
TRL supports Binary Classifier Optimization (BCO).
The [BCO](https://huggingface.co/papers/2404.04656) authors train a binary classifier whose logit serves as a reward, so that the classifier maps {prompt, chosen completion} pairs to 1 and {prompt, rejected completion} pairs to 0.
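Concretely, the classifier's logit is the implicit reward \\( r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_\text{ref}(y \mid x)} \\), trained with a binary cross-entropy objective. The following is a sketch paraphrasing the paper's objective (with \\( \delta \\) denoting the paper's reward-shift term and \\( \sigma \\) the logistic function), not a statement of the exact implementation:

$$
\mathcal{L}_\text{BCO}(\theta) = -\mathbb{E}_{(x, y_w)}\big[\log \sigma\big(r_\theta(x, y_w) - \delta\big)\big] - \mathbb{E}_{(x, y_l)}\big[\log \sigma\big(-(r_\theta(x, y_l) - \delta)\big)\big]
$$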
For a full example, have a look at [`examples/scripts/bco.py`].
## Expected dataset type
The [`BCOTrainer`] requires an [unpaired preference dataset](dataset_formats#unpaired-preference).
The [`BCOTrainer`] supports both [conversational](dataset_formats#conversational) and [standard](dataset_formats#standard) dataset formats. When provided with a conversational dataset, the trainer will automatically apply the chat template to the dataset.
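For illustration, a record in the standard unpaired preference format pairs a prompt and a single completion with a boolean label (a toy example; the field names follow the dataset formats guide):

```py
# toy unpaired preference example (standard format)
example = {
    "prompt": "The sky is",
    "completion": " blue.",
    "label": True,  # True = desirable (thumbs-up), False = undesirable (thumbs-down)
}
```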
## Expected model format
The BCO trainer expects a model of type `AutoModelForCausalLM`, in contrast to PPO, which expects `AutoModelForCausalLMWithValueHead` for the value function.
## Using the `BCOTrainer`
For a detailed example, have a look at the `examples/scripts/bco.py` script. At a high level, we need to initialize the `BCOTrainer` with a `model` we wish to train and a reference `ref_model`, which we will use to calculate the implicit rewards of the preferred and rejected responses.
The `beta` parameter is the hyperparameter of the implicit reward, and the dataset contains the three entries listed above (`prompt`, `completion`, and `label`). Note that the `model` and `ref_model` need to have the same architecture (i.e., decoder-only or encoder-decoder).
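As a minimal setup sketch (the checkpoint and dataset names below are illustrative assumptions, not requirements), the objects used in the next snippet can be created like this:

```py
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# illustrative checkpoint; any causal LM you want to align works here
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
model_ref = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# assumed to be an unpaired preference dataset with prompt/completion/label columns
train_dataset = load_dataset("trl-lib/kto-mix-14k", split="train")
```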
```py
from trl import BCOConfig, BCOTrainer

training_args = BCOConfig(
    beta=0.1,
)

bco_trainer = BCOTrainer(
    model,
    model_ref,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
```
After this, one can call:
```py
bco_trainer.train()
```
## Underlying Distribution Matching (UDM)
In practical scenarios, the thumbs-up and thumbs-down datasets are likely to have divergent underlying distributions of prompts.
Consider an LLM deployed for user feedback: if the model excels in writing tasks but underperforms in coding, the thumbs-up dataset will be dominated by writing-related prompts, while the thumbs-down dataset will contain mostly coding-related prompts.
If the prompts in your desired and undesired datasets differ significantly, it is useful to enable UDM.
Choose an embedding model and tokenizer:
```py
from functools import partial

from accelerate import Accelerator
from transformers import AutoModel, AutoTokenizer

embedding_model = AutoModel.from_pretrained(your_model_id)
embedding_tokenizer = AutoTokenizer.from_pretrained(your_model_id)

# customize this function depending on your embedding model
def embed_prompt(input_ids, attention_mask, model):
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    # mean-pool the last hidden state over the sequence dimension
    return outputs.last_hidden_state.mean(dim=1)

embedding_model = Accelerator().prepare_model(embedding_model)
embedding_func = partial(embed_prompt, model=embedding_model)
```
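As a quick, purely illustrative sanity check of the embedding function (the prompt string is arbitrary, and on GPU the batch may need to be moved to the model's device):

```py
batch = embedding_tokenizer(["Write a haiku about autumn."], padding=True, return_tensors="pt")
prompt_embeddings = embedding_func(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
print(prompt_embeddings.shape)  # (batch_size, hidden_size)
```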
Set `prompt_sample_size` to define how many prompts are selected to train the UDM classifier, and start the training with the provided embedding function:
```py
training_args = BCOConfig(
    beta=0.1,
    prompt_sample_size=512,
)

bco_trainer = BCOTrainer(
    model,
    model_ref,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    embedding_func=embedding_func,
    embedding_tokenizer=embedding_tokenizer,
)

bco_trainer.train()
```
### For Mixture of Experts Models: Enabling the auxiliary loss

MoEs are most efficient when the load is roughly equally distributed between experts.
To ensure that we train MoEs similarly during preference-tuning, it is beneficial to add the auxiliary loss from the load balancer to the final loss.
This option is enabled by setting `output_router_logits=True` in the model config (e.g. `MixtralConfig`).
To scale how much the auxiliary loss contributes to the total loss, use the hyperparameter `router_aux_loss_coef=...` (default: `0.001`).
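For example (the checkpoint below is illustrative; `output_router_logits` and `router_aux_loss_coef` are standard Mixtral config options that can be overridden at load time), the flags can be set when loading the model:

```py
from transformers import AutoModelForCausalLM

# illustrative MoE checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    output_router_logits=True,   # include the load-balancing auxiliary loss in the total loss
    router_aux_loss_coef=0.001,  # weight of the auxiliary loss (default shown)
)
```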
## BCOTrainer

[[autodoc]] BCOTrainer

## BCOConfig

[[autodoc]] BCOConfig