| 
							 | 
						--- | 
					
					
						
						| 
							 | 
						library_name: transformers | 
					
					
						
						| 
							 | 
						license: llama3.1 | 
					
					
						
						| 
							 | 
						base_model: meta-llama/Meta-Llama-3.1-8B-Instruct | 
					
					
						
						| 
							 | 
						tags: | 
					
					
						
						| 
							 | 
						- alignment-handbook | 
					
					
						
						| 
							 | 
						- trl | 
					
					
						
						| 
							 | 
						- dpo | 
					
					
						
						| 
							 | 
						- generated_from_trainer | 
					
					
						
						| 
							 | 
						- trl | 
					
					
						
						| 
							 | 
						- dpo | 
					
					
						
						| 
							 | 
						- generated_from_trainer | 
					
					
						
						| 
							 | 
						datasets: | 
					
					
						
						| 
							 | 
						- HuggingFaceH4/ultrafeedback_binarized | 
					
					
						
						| 
							 | 
						- tanliboy/orca_dpo_pairs | 
					
					
						
						| 
							 | 
						model-index: | 
					
					
						
						| 
							 | 
						- name: lambda-llama-3-8b-ipo-test | 
					
					
						
						| 
							 | 
						  results: [] | 
					
					
						
						| 
							 | 
						--- | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						<!-- This model card has been generated automatically according to the information the Trainer had access to. You | 
					
					
						
						| 
							 | 
						should probably proofread and complete it, then remove this comment. --> | 
					
					
						
						| 
							 | 
						
 | 
					
					
						
						| 
							 | 
						# lambda-llama-3-8b-ipo-test | 
					
					
						
						| 
							 | 
						
 | 
					
					
						
						| 
							 | 
						This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the HuggingFaceH4/ultrafeedback_binarized and the tanliboy/orca_dpo_pairs datasets. | 
					
					
						
						| 
							 | 
						It achieves the following results on the evaluation set: | 
					
					
						
						| 
							 | 
						- Loss: 0.8931 | 
					
					
						
						| 
							 | 
						- Rewards/chosen: -0.3610 | 
					
					
						
						| 
							 | 
						- Rewards/rejected: -0.5883 | 
					
					
						
						| 
							 | 
						- Rewards/accuracies: 0.7922 | 
					
					
						
						| 
							 | 
						- Rewards/margins: 0.2272 | 
					
					
						
						| 
							 | 
						- Logps/rejected: -3.1373 | 
					
					
						
						| 
							 | 
						- Logps/chosen: -2.5334 | 
					
					
						
						| 
							 | 
						- Logits/rejected: -2.9939 | 
					
					
						
						| 
							 | 
						- Logits/chosen: -2.9244 | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						## Model description | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						More information needed | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						## Intended uses & limitations | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						More information needed | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						## Training and evaluation data | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						More information needed | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						## Training procedure | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						### Training hyperparameters | 
					
					
						
						| 
							 | 
						 | 
					
					
						
						| 
							 | 
						The following hyperparameters were used during training: | 
					
					
						
						| 
							 | 
						- learning_rate: 5e-07 | 
					
					
						
						| 
							 | 
						- train_batch_size: 4 | 
					
					
						
						| 
							 | 
						- eval_batch_size: 4 | 
					
					
						
						| 
							 | 
						- seed: 42 | 
					
					
						
						| 
							 | 
						- distributed_type: multi-GPU | 
					
					
						
						| 
							 | 
						- num_devices: 8 | 
					
					
						
						| 
							 | 
						- gradient_accumulation_steps: 4 | 
					
					
						
						| 
							 | 
						- total_train_batch_size: 128 | 
					
					
						
						| 
							 | 
						- total_eval_batch_size: 32 | 
					
					
						
						| 
							 | 
						- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 | 
					
					
						
						| 
							 | 
						- lr_scheduler_type: cosine | 
					
					
						
						| 
							 | 
						- lr_scheduler_warmup_ratio: 0.1 | 
					
					
						
						| 
							 | 
						- num_epochs: 1 | 
					
					
						
						| 
							 | 
						
 | 
					
					
						
						| 
							 | 
						### Training results | 
					
					
						
						| 
							 | 
						
 | 
					
					
						
						| 
							 | 
						| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | | 
					
					
						
						| 
							 | 
						|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:| | 
					
					
						
						| 
							 | 
						| 1.1749        | 0.1744 | 100  | 1.0763          | -0.1732        | -0.3120          | 0.7892             | 0.1388          | -2.4465        | -2.0638      | -2.5676         | -2.5133       | | 
					
					
						
						| 
							 | 
						| 0.9802        | 0.3489 | 200  | 0.9501          | -0.3184        | -0.5302          | 0.8012             | 0.2118          | -2.9922        | -2.4269      | -2.7873         | -2.7230       | | 
					
					
						
						| 
							 | 
						| 0.9548        | 0.5233 | 300  | 0.9136          | -0.3761        | -0.6028          | 0.8163             | 0.2267          | -3.1736        | -2.5710      | -2.8788         | -2.8087       | | 
					
					
						
						| 
							 | 
						| 0.9834        | 0.6978 | 400  | 0.9041          | -0.3384        | -0.5537          | 0.8042             | 0.2153          | -3.0509        | -2.4770      | -2.9371         | -2.8667       | | 
					
					
						
						| 
							 | 
						| 0.9967        | 0.8722 | 500  | 0.8938          | -0.3750        | -0.6076          | 0.7892             | 0.2326          | -3.1855        | -2.5684      | -3.0293         | -2.9592       | | 
					
					
						
						| 
							 | 
						
 | 
					
					
						
						| 
							 | 
						
 | 
					
					
						
						| 
							 | 
						### Framework versions | 
					
					
						
						| 
							 | 
						
 | 
					
					
						
						| 
							 | 
						- Transformers 4.44.2 | 
					
					
						
						| 
							 | 
						- Pytorch 2.4.0+cu121 | 
					
					
						
						| 
							 | 
						- Datasets 2.19.1 | 
					
					
						
						| 
							 | 
						- Tokenizers 0.19.1 | 
					
					
						
						| 
							 | 
						
 |