Instructions for using Teera/Llama-3.2v-COT-Thai with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use Teera/Llama-3.2v-COT-Thai with Transformers:
    # Load model directly
    from transformers import AutoModel
    model = AutoModel.from_pretrained("Teera/Llama-3.2v-COT-Thai", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- Unsloth Studio
How to use Teera/Llama-3.2v-COT-Thai with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for Teera/Llama-3.2v-COT-Thai to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for Teera/Llama-3.2v-COT-Thai to start chatting
Using Hugging Face Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for Teera/Llama-3.2v-COT-Thai to start chatting
Load model with FastModel
    pip install unsloth

    from unsloth import FastModel
    model, tokenizer = FastModel.from_pretrained(
        model_name="Teera/Llama-3.2v-COT-Thai",
        max_seq_length=2048,
    )
Model Card for Teera/Llama-3.2v-COT-Thai
Teera/Llama-3.2v-COT-Thai is a fine-tuned model based on Llama-3.2V-11B-co and inspired by the LLaVA-CoT framework.
That framework was introduced in the paper "LLaVA-CoT: Let Vision Language Models Reason Step-by-Step".
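As described in the LLaVA-CoT paper, responses are organized into four tagged stages (summary, caption, reasoning, conclusion). The sketch below shows one way to split such a response into its stages. It assumes this fine-tune keeps the same `<SUMMARY>`, `<CAPTION>`, `<REASONING>`, and `<CONCLUSION>` tags, which this model card does not confirm. The helper name `split_stages` and the example text are hypothetical.

```python
import re

# Stage tags used by LLaVA-CoT-style responses.
# ASSUMPTION: this fine-tune emits the same tags; the model card does not confirm it.
STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def split_stages(response):
    """Return a dict mapping each stage name found in `response` to its stripped text."""
    stages = {}
    for name in STAGES:
        # Non-greedy match so each stage stops at its own closing tag.
        match = re.search(rf"<{name}>(.*?)</{name}>", response, flags=re.DOTALL)
        if match:
            stages[name] = match.group(1).strip()
    return stages


# Hypothetical response text, written for illustration only (not real model output).
example = (
    "<SUMMARY>Count the apples.</SUMMARY>"
    "<CAPTION>A bowl with three apples.</CAPTION>"
    "<REASONING>Each apple is counted once.</REASONING>"
    "<CONCLUSION>3</CONCLUSION>"
)
print(split_stages(example)["CONCLUSION"])
```

Stages that are missing from a response are simply absent from the returned dict, so callers can check for `"CONCLUSION"` before using it.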
Training Details
Training Data
The model was trained on the LLaVA-CoT-100k dataset, which was preprocessed and translated into Thai.
Training Procedure
The model was fine-tuned with llama-recipes using the following settings. Using the same settings should reproduce our results.
| Parameter | Value |
|---|---|
| FSDP | enabled |
| lr | 1e-4 |
| num_epochs | 1 |
| batch_size_training | 2 |
| use_fast_kernels | True |
| run_validation | False |
| batching_strategy | padding |
| context_length | 4096 |
| gradient_accumulation_steps | 1 |
| gradient_clipping | False |
| gradient_clipping_threshold | 1.0 |
| weight_decay | 0.0 |
| gamma | 0.85 |
| seed | 42 |
| use_fp16 | False |
| mixed_precision | True |
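To make a few of these settings concrete, here is a minimal sketch that derives two quantities from the table. It rests on two assumptions not stated in this card: that `batch_size_training` is a per-GPU batch size, and that `gamma` is a multiplicative learning-rate decay applied once per epoch. The GPU count of 8 is a hypothetical value, since the hardware is not given.

```python
# A subset of the settings from the table; key names follow the table.
TRAIN_CONFIG = {
    "lr": 1e-4,
    "num_epochs": 1,
    "batch_size_training": 2,
    "gradient_accumulation_steps": 1,
    "context_length": 4096,
    "gamma": 0.85,
    "seed": 42,
}


def effective_batch_size(config, num_gpus):
    """Samples per optimizer step, ASSUMING batch_size_training is per GPU."""
    return (
        config["batch_size_training"]
        * config["gradient_accumulation_steps"]
        * num_gpus
    )


def lr_per_epoch(config):
    """Learning rate per epoch, ASSUMING lr is multiplied by gamma after each epoch."""
    return [
        config["lr"] * config["gamma"] ** epoch
        for epoch in range(config["num_epochs"])
    ]


if __name__ == "__main__":
    # num_gpus=8 is hypothetical; the model card does not state the hardware used.
    print(effective_batch_size(TRAIN_CONFIG, num_gpus=8))
    print(lr_per_epoch(TRAIN_CONFIG))
```

Under these assumptions, a single-epoch run never reaches a decay step, so `gamma` would have no effect on the learning rate actually used.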
Bias, Risks, and Limitations
Like other VLMs, the model may generate biased or offensive content because of limitations in its training data. Its performance in areas such as instruction following also still falls short of leading industry models.
Model tree for Teera/Llama-3.2v-COT-Thai
Base model
meta-llama/Llama-3.2-11B-Vision-Instruct