{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "nc0g2NLpUSGr"
},
"source": [
"# Fine-tune SmolVLM2 on Video Captioning\n",
"In this notebook, we will fine-tune SmolVLM2-500M-Video-Instruct on the Video Feedback dataset. We run full fine-tuning on a Colab A100, but you can squeeze it onto an L4 with QLoRA."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "WIhA1lQ7j0kw",
"outputId": "928f2f4e-6cd8-452b-d621-605550fdd33c"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m163.5/163.5 kB\u001b[0m \u001b[31m5.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
]
}
],
"source": [
"%pip install -q accelerate datasets peft bitsandbytes tensorboard pyav num2words"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FCYgmJtDRElR"
},
"outputs": [],
"source": [
"%pip install -q git+https://github.com/huggingface/transformers.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XyJaqZZ3uYYl"
},
"outputs": [],
"source": [
"%pip install -q flash-attn --no-build-isolation"
]
},
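{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to fit training on a smaller GPU such as an L4, you can load the model in 4-bit and attach LoRA adapters instead of full fine-tuning. The cell below is a minimal sketch using `bitsandbytes` and `peft`; the hyperparameters and the `target_modules` names are assumptions and may need adjusting for SmolVLM2."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from transformers import BitsAndBytesConfig\n",
"from peft import LoraConfig\n",
"\n",
"# 4-bit NF4 quantization config for QLoRA (values here are common starting points, not tuned)\n",
"bnb_config = BitsAndBytesConfig(\n",
"    load_in_4bit=True,\n",
"    bnb_4bit_quant_type=\"nf4\",\n",
"    bnb_4bit_compute_dtype=torch.bfloat16,\n",
")\n",
"\n",
"# LoRA adapters on the attention projections (hypothetical module names; inspect the model to confirm)\n",
"lora_config = LoraConfig(\n",
"    r=8,\n",
"    lora_alpha=8,\n",
"    lora_dropout=0.1,\n",
"    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\"],\n",
"    task_type=\"CAUSAL_LM\",\n",
")"
]
},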
{
"cell_type": "markdown",
"metadata": {
"id": "wAeMA0heVBjT"
},
"source": [
"We will push our model to the Hub, so we need to authenticate ourselves."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17,
"referenced_widgets": [
"112da28d935543069e7a1a2abc22f9f4",
"0d22c009aa584ca1a71e32336a7985e0",
"ad17e30049cb4b5aa4046d94690f87d3",
"e77d3520a2d64f9a840652669c9a0ba1",
"1852745b0de44f4281cea0cbb3508459",
"166c19ec6d9f4455a56a0f146d1c0abc",
"f6362bc7b5b24dd592d35a76a1fbf26b",
"e99fbdfc8a22408a8c728a36c8744b24",
"0fee30c9bf2b4bdfad7a37261f92db64",
"4cd8babc92cc4aeba74d2147f28dee7d",
"a4fbf37fe0fe44cfbf72ca1e82af3467",
"be50e04c5629463eb18d029d045f25b3",
"5490c69c251144c4979e346c66ac1e53",
"44d0e1db5f664b3fb7c146c216566776",
"7af918a10ec745d7a3f4a883dbdc8b6a",
"4156b6897089446984196606ef0d3461",
"cf4b5a9cefe84fd9a4d120ab1da6f3f4",
"484155e67e36453c9d1ebd2ea1768eca",
"48bb89c434284b639f45b5929cf8d1a9",
"0ead4ab9bb7648c69352094bfbcb8800"
]
},
"id": "yKd5xtSGj7cm",
"outputId": "a6e841d8-f2d6-44a8-d44d-c0c244d95f9b"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "112da28d935543069e7a1a2abc22f9f4",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(HTML(value='<center> Copy a token from your Hugging Face\ntokens page and paste it below. Immediately click login after copying\nyour token or it might be stored in plain text in this notebook file.