Model Card for Teera/Llama-3.2v-COT-Thai

Teera/Llama-3.2v-COT-Thai is a fine-tuned model based on Xkev/Llama-3.2V-11B-cot, inspired by the LLaVA-CoT framework.

The framework was introduced in the paper "LLaVA-CoT: Let Vision Language Models Reason Step-by-Step".
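
LLaVA-CoT structures each response into four tagged stages: summary, caption, reasoning, and conclusion. As a minimal sketch, assuming this model keeps the same `<SUMMARY>`, `<CAPTION>`, `<REASONING>`, and `<CONCLUSION>` tags after Thai fine-tuning (this card does not confirm it), the stages could be separated from a generated string as shown here. The function name `split_stages` and the sample text are hypothetical.

```python
import re

# Stage tags as described in the LLaVA-CoT paper. Whether the Thai model
# emits them unchanged is an assumption.
STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def split_stages(text: str) -> dict[str, str]:
    """Return the content of each tagged stage found in `text`.

    Stages that are missing or unclosed are left out of the result.
    """
    stages = {}
    for tag in STAGES:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        if match:
            stages[tag.lower()] = match.group(1).strip()
    return stages


# Hypothetical model output, for illustration only.
sample = (
    "<SUMMARY>นับจำนวนแมวในภาพ</SUMMARY>"  # "Count the cats in the image"
    "<CAPTION>ภาพแสดงแมวสองตัวบนโซฟา</CAPTION>"  # "Two cats on a sofa"
    "<REASONING>มีแมวทางซ้ายหนึ่งตัวและทางขวาหนึ่งตัว</REASONING>"  # "One left, one right"
    "<CONCLUSION>2</CONCLUSION>"
)
print(split_stages(sample)["conclusion"])
```

Returning only the stages that were found lets a caller detect a truncated or untagged generation by checking for a missing `conclusion` key.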

Training Details

Training Data

The model is trained on the LLaVA-CoT-100k dataset, preprocessed and translated into Thai.

Training Procedure

The model is fine-tuned with llama-recipes using the following settings. Using the same settings should reproduce our results.

Parameter	Value
FSDP	enabled
lr	1e-4
num_epochs	1
batch_size_training	2
use_fast_kernels	True
run_validation	False
batching_strategy	padding
context_length	4096
gradient_accumulation_steps	1
gradient_clipping	False
gradient_clipping_threshold	1.0
weight_decay	0.0
gamma	0.85
seed	42
use_fp16	False
mixed_precision	True
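
The settings above can be expressed as `--name value` overrides of the kind llama-recipes accepts for its training config. The sketch below only builds and prints an argument list; it launches nothing. Several details are assumptions not stated in this card: that the "FSDP enabled" row corresponds to the `enable_fsdp` option, and that the script name `finetuning.py`, the `torchrun` launcher, and the GPU count match your setup. Model and dataset arguments are omitted because the card does not give them.

```python
import shlex

# Values copied from the table above. "enable_fsdp" is an assumed
# option name for the "FSDP enabled" row.
SETTINGS = {
    "enable_fsdp": True,
    "lr": 1e-4,
    "num_epochs": 1,
    "batch_size_training": 2,
    "use_fast_kernels": True,
    "run_validation": False,
    "batching_strategy": "padding",
    "context_length": 4096,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": False,
    "gradient_clipping_threshold": 1.0,
    "weight_decay": 0.0,
    "gamma": 0.85,
    "seed": 42,
    "use_fp16": False,
    "mixed_precision": True,
}

# Hypothetical values: the card does not say how many GPUs were used
# or which entry-point script was run.
NUM_GPUS = 8
SCRIPT = "finetuning.py"


def to_cli_args(settings: dict) -> list[str]:
    """Flatten a settings dict into ["--name", "value", ...] pairs."""
    args = []
    for name, value in settings.items():
        args += [f"--{name}", str(value)]
    return args


command = ["torchrun", "--nnodes", "1", "--nproc_per_node", str(NUM_GPUS), SCRIPT]
command += to_cli_args(SETTINGS)
print(shlex.join(command))

# Samples per optimizer step, assuming batch_size_training is per process.
effective_batch = (
    SETTINGS["batch_size_training"]
    * SETTINGS["gradient_accumulation_steps"]
    * NUM_GPUS
)
print("effective batch size:", effective_batch)
```

Because `gradient_clipping` is False, the `gradient_clipping_threshold` of 1.0 is presumably unused during training.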

Bias, Risks, and Limitations

Like other VLMs, the model may generate biased or offensive content because of limitations in its training data. Its performance in areas such as instruction following still falls short of leading industry models.


Model tree for Teera/Llama-3.2v-COT-Thai

Base model: meta-llama/Llama-3.2-11B-Vision-Instruct
Fine-tuned from: Xkev/Llama-3.2V-11B-cot
