# BCO Trainer
[All models trained with BCO](https://huggingface.co/models?other=bco,trl)
TRL supports Binary Classifier Optimization (BCO).
The [BCO](https://huggingface.co/papers/2404.04656) authors train a binary classifier whose logit serves as a reward, so that the classifier maps {prompt, chosen completion} pairs to 1 and {prompt, rejected completion} pairs to 0.
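Concretely, the classifier's logit is the implicit reward \\( r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_\text{ref}(y \mid x)} \\), trained with a binary cross-entropy objective. The following is a sketch paraphrasing the paper's objective (with \\( \delta \\) denoting the paper's reward-shift term and \\( \sigma \\) the logistic function), not a statement of the exact implementation:

$$
\mathcal{L}_\text{BCO}(\theta) = -\mathbb{E}_{(x, y_w)}\big[\log \sigma\big(r_\theta(x, y_w) - \delta\big)\big] - \mathbb{E}_{(x, y_l)}\big[\log \sigma\big(-(r_\theta(x, y_l) - \delta)\big)\big]
$$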
For a full example, have a look at [`examples/scripts/bco.py`].
## Expected dataset type
The [`BCOTrainer`] requires an [unpaired preference dataset](dataset_formats#unpaired-preference).
The [`BCOTrainer`] supports both [conversational](dataset_formats#conversational) and [standard](dataset_formats#standard) dataset formats. When provided with a conversational dataset, the trainer will automatically apply the chat template to the dataset.
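For illustration, a record in the standard unpaired preference format pairs a prompt and a single completion with a boolean label (a toy example; the field names follow the dataset formats guide):

```py
# toy unpaired preference example (standard format)
example = {
    "prompt": "The sky is",
    "completion": " blue.",
    "label": True,  # True = desirable (thumbs-up), False = undesirable (thumbs-down)
}
```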
## Expected model format
The BCO trainer expects a model of type `AutoModelForCausalLM`, in contrast to PPO, which expects `AutoModelForCausalLMWithValueHead` for the value function.
## Using the `BCOTrainer`
For a detailed example, have a look at the `examples/scripts/bco.py` script. At a high level, we need to initialize the `BCOTrainer` with a `model` we wish to train and a reference `ref_model`, which we will use to calculate the implicit rewards of the preferred and rejected responses.
The `beta` parameter is the hyperparameter of the implicit reward, and the dataset contains the three entries listed above (`prompt`, `completion`, and `label`). Note that the `model` and `ref_model` need to have the same architecture (i.e., decoder-only or encoder-decoder).
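As a minimal setup sketch (the checkpoint and dataset names below are illustrative assumptions, not requirements), the objects used in the next snippet can be created like this:

```py
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# illustrative checkpoint; any causal LM you want to align works here
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
model_ref = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# assumed to be an unpaired preference dataset with prompt/completion/label columns
train_dataset = load_dataset("trl-lib/kto-mix-14k", split="train")
```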
```py
from trl import BCOConfig, BCOTrainer

training_args = BCOConfig(
    beta=0.1,
)

bco_trainer = BCOTrainer(
    model,
    model_ref,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
```
After this, one can call:
```py
bco_trainer.train()
```
## Underlying Distribution Matching (UDM)
In practical scenarios, the thumbs-up and thumbs-down datasets are likely to have divergent underlying distributions of prompts.
Consider an LLM deployed for user feedback: if the model excels in writing tasks but underperforms in coding, the thumbs-up dataset will be dominated by writing-related prompts, while the thumbs-down dataset will contain mostly coding-related prompts.
If the prompts in your desired and undesired datasets differ significantly, it is useful to enable UDM.
Choose an embedding model and tokenizer:
```py
from functools import partial

from accelerate import Accelerator
from transformers import AutoModel, AutoTokenizer

embedding_model = AutoModel.from_pretrained(your_model_id)
embedding_tokenizer = AutoTokenizer.from_pretrained(your_model_id)

# customize this function depending on your embedding model
def embed_prompt(input_ids, attention_mask, model):
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    # mean-pool the last hidden state over the sequence dimension
    return outputs.last_hidden_state.mean(dim=1)

embedding_model = Accelerator().prepare_model(embedding_model)
embedding_func = partial(embed_prompt, model=embedding_model)
```
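As a quick, purely illustrative sanity check of the embedding function (the prompt string is arbitrary, and on GPU the batch may need to be moved to the model's device):

```py
batch = embedding_tokenizer(["Write a haiku about autumn."], padding=True, return_tensors="pt")
prompt_embeddings = embedding_func(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
print(prompt_embeddings.shape)  # (batch_size, hidden_size)
```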
Set `prompt_sample_size` to define how many prompts are selected to train the UDM classifier, and start the training with the provided embedding function:
```py
training_args = BCOConfig(
    beta=0.1,
    prompt_sample_size=512,
)

bco_trainer = BCOTrainer(
    model,
    model_ref,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    embedding_func=embedding_func,
    embedding_tokenizer=embedding_tokenizer,
)

bco_trainer.train()
```
### For Mixture of Experts Models: Enabling the auxiliary loss

MoEs are most efficient when the load is roughly equally distributed between experts.
To ensure that we train MoEs similarly during preference-tuning, it is beneficial to add the auxiliary loss from the load balancer to the final loss.
This option is enabled by setting `output_router_logits=True` in the model config (e.g. `MixtralConfig`).
To scale how much the auxiliary loss contributes to the total loss, use the hyperparameter `router_aux_loss_coef=...` (default: `0.001`).
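For example (the checkpoint below is illustrative; `output_router_logits` and `router_aux_loss_coef` are standard Mixtral config options that can be overridden at load time), the flags can be set when loading the model:

```py
from transformers import AutoModelForCausalLM

# illustrative MoE checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    output_router_logits=True,   # include the load-balancing auxiliary loss in the total loss
    router_aux_loss_coef=0.001,  # weight of the auxiliary loss (default shown)
)
```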
## BCOTrainer

[[autodoc]] BCOTrainer

## BCOConfig

[[autodoc]] BCOConfig