# Installing Required Libraries!

Install the required libraries: trl, transformers, accelerate, peft, datasets, evaluate, and bitsandbytes.

In [None]:

# Checks if PyTorch is installed and installs it if not.
try:
    import torch
    print("PyTorch is installed!")
except ImportError:
    print("PyTorch is not installed.")
    !pip install -q torch


In [None]:

!pip install -q --upgrade "transformers==4.38.2"
!pip install -q --upgrade "datasets==2.16.1"
!pip install -q --upgrade "accelerate==0.26.1"
!pip install -q --upgrade "evaluate==0.4.1"
!pip install -q --upgrade "bitsandbytes==0.42.0"
!pip install -q --upgrade "trl==0.7.11"
!pip install -q --upgrade "peft==0.8.2"
    

# Load and Prepare the Dataset

The dataset is already formatted in a conversational format, which is supported by [trl](https://huggingface.co/docs/trl/index/), and ready for supervised finetuning.


**Conversational format:**


```json
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```
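
Before training, it can help to confirm that records really follow this shape. The check below is a minimal sketch in plain Python; `VALID_ROLES` and `is_valid_conversation` are hypothetical names introduced for illustration and are not part of trl or datasets.

```python
# Hypothetical helper: checks that a record matches the conversational format.
VALID_ROLES = {"system", "user", "assistant"}

def is_valid_conversation(record):
    """Return True if record looks like {"messages": [{"role": ..., "content": ...}, ...]}."""
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        if message.get("role") not in VALID_ROLES:
            return False
        if not isinstance(message.get("content"), str):
            return False
    return True

sample = {"messages": [
    {"role": "system", "content": "You are..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
]}
print(is_valid_conversation(sample))
print(is_valid_conversation({"text": "not a conversation"}))
```

A check like this could be applied to a handful of rows after loading the dataset to catch format problems early.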


In [None]:

from datasets import load_dataset
    
# Load dataset from the hub
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
    
dataset = dataset.shuffle(seed=42)
    

# Load **mistralai/Mistral-7B-v0.1** for Finetuning


This process involves two key steps:

1. **LLM Quantization:**
    - We first load the selected large language model (LLM).
    - We then use the `bitsandbytes` library to quantize the model, which can significantly reduce its memory footprint.

> **Note:** The memory requirements of the model scale with its size. For instance, a 7B parameter model may require 
a 24GB GPU for fine-tuning. 

2. **Chat Model Preparation:**
    - To train a model for chat/conversational tasks, we need to prepare both the model and its tokenizer.
    
    - This involves adding special tokens to the tokenizer and the model itself. These tokens help the model 
    understand the different roles within a conversation. 
    
    - The **trl** library provides a convenient function called `setup_chat_format` for this purpose. It performs the
    following actions:
    
        * Adds special tokens to the tokenizer, such as `<|im_start|>` and `<|im_end|>`, to mark the beginning and
        end of each message in a conversation.
        
        * Resizes the model's embedding layer to accommodate the new tokens.
        
        * Sets the tokenizer's chat template, which defines the format used to convert input data into a chat-like 
        structure. The default template is `chatml` from OpenAI.
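
To make the effect of the chat template concrete, the following sketch renders a list of messages into ChatML-style text in plain Python. It only approximates what the tokenizer's template produces; `render_chatml` is a hypothetical helper written for illustration, not a trl or transformers function.

```python
# Hypothetical helper: approximates how a ChatML template lays out a conversation.
def render_chatml(messages, add_generation_prompt=False):
    parts = []
    for message in messages:
        # Each message is wrapped in <|im_start|>role ... <|im_end|>
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_chatml(conversation, add_generation_prompt=True))
```

In the notebook itself this formatting is handled by the tokenizer after `setup_chat_format` has been called, so a helper like this is only useful for inspecting what the model sees.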




In [None]:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from trl import setup_chat_format

# Hugging Face model id
model_id = "mistralai/Mistral-7B-v0.1"

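# BitsAndBytesConfig: 8-bit quantization is what is enabled here.
# Note: the bnb_4bit_* options below only take effect when load_in_4bit=True.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True, bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)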

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "right"


# Set chat template to OAI chatML
model, tokenizer = setup_chat_format(model, tokenizer)

    

## Setting LoRA Config

The `SFTTrainer` provides native integration with `peft`, which simplifies efficient tuning of large language models (LLMs) with techniques such as [LoRA](https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms). The only requirement is to create the `LoraConfig` and pass it to the `SFTTrainer`.
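
As a rough illustration of why LoRA is cheap, the sketch below counts the trainable parameters that a rank-`r` adapter adds to a single linear layer. The 4096 x 4096 layer size is an assumption chosen for illustration, and `lora_param_count` is a hypothetical helper; the rank and alpha match the values used in this notebook.

```python
# Hypothetical helper: a LoRA adapter on a (d_out x d_in) weight adds two small
# matrices, A (r x d_in) and B (d_out x r).
def lora_param_count(d_in, d_out, r):
    return r * (d_in + d_out)

d_in, d_out = 4096, 4096   # assumed layer size, for illustration only
r, lora_alpha = 6, 8       # values used in this notebook's LoraConfig

full = d_in * d_out
adapter = lora_param_count(d_in, d_out, r)
print(f"full layer parameters:    {full}")
print(f"adapter parameters:       {adapter}")
print(f"adapter / full:           {adapter / full:.4%}")
print(f"scaling (lora_alpha / r): {lora_alpha / r:.3f}")
```

With `target_modules="all-linear"`, an adapter of this kind is attached to every linear layer, so the total is the sum of this count over those layers.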
    

In [None]:

from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.05,
    r=6,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM"
)
    

## Setting the TrainingArguments

In [None]:

# Installing tensorboard to report the metrics
!pip install -q tensorboard
    

In [None]:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="temp_/LChat-7b",
    num_train_epochs=100,
    per_device_train_batch_size=3,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={'use_reentrant': False},
    optim="adamw_torch_fused",
    logging_steps=10,
    save_strategy='epoch',
    learning_rate=0.075,
    bf16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.1,
    lr_scheduler_type='cosine',
    report_to='tensorboard', 
    max_steps=-1,
    seed=42,
    overwrite_output_dir=True,
    remove_unused_columns=True
)
    

## Setting the Supervised Finetuning Trainer (`SFTTrainer`)
    
The `SFTTrainer` is a wrapper around the `transformers.Trainer` class and inherits all of its attributes and methods.
The trainer takes care of properly initializing the `PeftModel`.
    

In [None]:

from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length=2048,
    tokenizer=tokenizer,
    packing=True,
    dataset_kwargs={'add_special_tokens': False, 'append_concat_token': False}
)


### Starting Training and Saving Model/Tokenizer

We start training by calling the `train()` method on the trainer instance. This runs the training loop for 100 epochs.
A checkpoint is written to the output directory (**'temp_/LChat-7b'**) at the end of every epoch, and
`trainer.save_model()` saves the final adapter weights there. Uploading to the Hub is handled in a later step.
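
The training arguments set earlier combine a per-device batch size of 3 with 2 gradient accumulation steps, a warmup ratio of 0.1, and a cosine schedule. The sketch below shows, in plain Python, the effective batch size per device and the general shape of a linear-warmup-then-cosine-decay schedule. It is an approximation for intuition only: `lr_at_step` is a hypothetical helper, and `total_steps=1000` is an assumed value, since the real step count depends on the packed dataset size.

```python
import math

# Samples contributing to one optimizer step on a single device
per_device_train_batch_size = 3
gradient_accumulation_steps = 2
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(f"effective batch size per device: {effective_batch_size}")

# Hypothetical helper: linear warmup followed by cosine decay to zero.
def lr_at_step(step, total_steps, base_lr=0.075, warmup_ratio=0.1):
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1000  # assumed value, for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, round(lr_at_step(step, total_steps), 5))
```

The learning rate peaks at the end of warmup and then decays smoothly, which is why the logged learning rate in TensorBoard rises first and falls afterwards.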
  
    

In [None]:


model.config.use_cache = False

# start training
trainer.train()

# save the peft model
trainer.save_model()


### Free the GPU Memory to Prepare Merging `LoRA` Adapters with the Base Model


In [None]:


# Free the GPU memory
del model
del trainer
torch.cuda.empty_cache()


## Merging LoRA Adapters into the Original Model

With `LoRA`, we train only the adapters rather than the entire model. Consequently, only the `adapter weights` are
written when the model is saved, not the complete model. To save the entire model for easier use with
Text Generation Inference, we can fold the adapter weights into the base model weights using the `merge_and_unload`
method and then save the result with the `save_pretrained` method. The result is a standalone model that is
ready for inference.
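
Conceptually, merging adds the scaled low-rank update to each adapted weight matrix, W' = W + (alpha / r) * B A. The toy example below illustrates that arithmetic with small nested lists in plain Python. It is a sketch of the idea, not the peft implementation, and `matmul` and `merge_lora` are hypothetical helpers.

```python
# Hypothetical helpers illustrating the merge arithmetic on tiny matrices.
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) for W (d_out x d_in), B (d_out x r), A (r x d_in)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1, 0], [0, 1]]   # toy base weight
A = [[3, 4]]           # r x d_in with r = 1
B = [[1], [2]]         # d_out x r
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)

# The merged weight gives the same output as base weight plus adapter path
x = [[1], [1]]
base_plus_adapter = [[b[0] + 2 * a[0]] for b, a in zip(matmul(W, x), matmul(B, matmul(A, x)))]
print(matmul(merged, x) == base_plus_adapter)
```

After merging, the adapter matrices are no longer needed at inference time, which is what makes the saved model usable without peft.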


In [None]:

import torch
from peft import AutoPeftModelForCausalLM

# Load Peft model on CPU
model = AutoPeftModelForCausalLM.from_pretrained(
    "temp_/LChat-7b",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)
    
# Merge LoRA with the base model and save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("/LChat-7b", safe_serialization=True, max_shard_size="2GB")
tokenizer.save_pretrained("/LChat-7b")


### Copy all result folders from 'temp_/LChat-7b' to '/LChat-7b'

In [None]:

import os
import shutil

source_folder = "temp_/LChat-7b"
destination_folder = "/LChat-7b"
os.makedirs(destination_folder, exist_ok=True)
for item in os.listdir(source_folder):
    item_path = os.path.join(source_folder, item)
    if os.path.isdir(item_path):
        destination_path = os.path.join(destination_folder, item)
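        # dirs_exist_ok=True lets the cell be re-run without failing on existing folders
        shutil.copytree(item_path, destination_path, dirs_exist_ok=True)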


### Generating a model card (README.md)

In [None]:

card = '''
---
license: apache-2.0
tags:
- generated_from_trainer
- mistralai/Mistral
- PyTorch
- transformers
- trl
- peft
- tensorboard
base_model: mistralai/Mistral-7B-v0.1
widget:
  - example_title: Pirate!
    messages:
      - role: system
        content: You are a pirate chatbot who always responds with Arr!
      - role: user
        content: "There's a llama on my lawn, how can I get rid of him?"
    output:
      text: >-
        Arr! 'Tis a puzzlin' matter, me hearty! A llama on yer lawn be a rare
        sight, but I've got a plan that might help ye get rid of 'im. Ye'll need
        to gather some carrots and hay, and then lure the llama away with the
        promise of a tasty treat. Once he's gone, ye can clean up yer lawn and
        enjoy the peace and quiet once again. But beware, me hearty, for there
        may be more llamas where that one came from! Arr!
model-index:
- name: LChat-7b
  results: []
datasets:
- HuggingFaceH4/ultrachat_200k
language:
- en
pipeline_tag: text-generation
---

# Model Card for LChat-7b:

**LChat-7b** is a language model that is trained to act as a helpful assistant. It is a finetuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) that was trained using `SFTTrainer` on the publicly available dataset [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k).

## Training Procedure:

The training code used to create this model was generated by [Menouar/LLM-FineTuning-Notebook-Generator](https://huggingface.co/spaces/Menouar/LLM-FineTuning-Notebook-Generator).



## Training hyperparameters

The following hyperparameters were used during the training:


'''

with open("/LChat-7b/README.md", "w") as f:
    f.write(card)

# to_dict() serializes enum values to plain values and masks token fields
args_dict = args.to_dict()

with open("/LChat-7b/README.md", "a") as f:
    for k, v in args_dict.items():
        f.write(f"- {k}: {v}\n")


## Login to HF

Replace `HF_TOKEN` with a valid token in order to push **'/LChat-7b'** to `huggingface_hub`.
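
One way to avoid pasting a token into the notebook is to read it from an environment variable. The snippet below is a minimal sketch of that approach; `get_hf_token` is a hypothetical helper, and the variable name `HF_TOKEN` is an assumption that can be changed to whatever your environment uses.

```python
import os

# Hypothetical helper: read the token from the environment instead of hardcoding it.
def get_hf_token(env_var="HF_TOKEN"):
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a valid Hugging Face token."
        )
    return token

# Intended usage (not executed here):
#   from huggingface_hub import login
#   login(token=get_hf_token(), add_to_git_credential=True)
```

Keeping the token out of the notebook source also keeps it out of any copy of the notebook that is shared or pushed to a repository.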

In [None]:

# Install huggingface_hub
!pip install -q huggingface_hub
    
from huggingface_hub import login
    
login(
    token="HF_TOKEN",  # replace with a valid token; never commit a real token
    add_to_git_credential=True
)
    

## Pushing '/LChat-7b' to the Hugging Face account.

In [None]:

from huggingface_hub import HfApi, HfFolder

# Instantiate the HfApi class
api = HfApi()

# Our Hugging Face repository
repo_name = "LChat-7b"

# Create a repository on the Hugging Face Hub
repo = api.create_repo(token=HfFolder.get_token(), repo_type="model", repo_id=repo_name)

api.upload_folder(
    folder_path="/LChat-7b",
    repo_id=repo.repo_id
)
