### ScienceQA
#### Prepare Data
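1. Follow the instructions in the ScienceQA [repo](https://github.com/lupantech/ScienceQA) to set up the dataset.
2. Convert the ScienceQA dataset to LLaVA's conversation-style format. Run the conversion once for each split you need (`train`, `val`, `minival`, `test`, `minitest`).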
```Shell
for split in train val minival test minitest; do
    python scripts/convert_sqa_to_llava.py \
        convert_to_llava \
        --base-dir /path/to/ScienceQA/data/scienceqa \
        --prompt-format "QCM-LEA" \
        --split "$split"
done
```
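After conversion, it can help to confirm which split files were actually written before moving on to training or evaluation. The sketch below assumes the converter writes `llava_<split>_<prompt-format>.json` into `--base-dir`, an assumption inferred from the `llava_test_QCM-LEA.json` question file used in the evaluation step. `check_converted` is a hypothetical helper, not part of the LLaVA scripts.

```shell
#!/usr/bin/env bash
# Sketch only: report which converted ScienceQA split files are present.
# ASSUMPTION: the converter writes llava_<split>_<prompt-format>.json into the base dir.
check_converted() {
    local base_dir="$1" fmt="${2:-QCM-LEA}" missing=0 split f
    for split in train val minival test minitest; do
        f="${base_dir}/llava_${split}_${fmt}.json"
        if [[ -s "$f" ]]; then          # exists and is non-empty
            echo "ok      $f"
        else
            echo "missing $f"
            missing=$((missing + 1))
        fi
    done
    return "$missing"                   # exit status = number of missing splits (0 = all present)
}
```

For example, `check_converted /path/to/ScienceQA/data/scienceqa` prints one `ok` or `missing` line per split and exits non-zero if any split file is absent.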
#### Training
1. Pretraining: download our pretrained projector weights from our [Model Zoo](), or train your own projector weights using [`pretrain.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/pretrain.sh).
2. Finetuning: finetune on ScienceQA using [`finetune_sqa.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/finetune_sqa.sh).
#### Evaluation
1. Multiple-GPU inference: you can run inference in parallel on multiple GPUs and then concatenate the generated `.jsonl` answer files. See our scripts for [batch evaluation](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_batch.sh) and [results gathering](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_gather.sh).
2. Single-GPU inference: run the following two steps.

(a) Generate LLaVA responses on the ScienceQA test set:
```Shell
python -m llava.eval.model_vqa_science \
    --model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
    --question-file /path/to/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
    --image-folder /path/to/ScienceQA/data/scienceqa/images/test \
    --answers-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --conv-mode llava_v1
```
(b) Evaluate the generated responses:
```Shell
python llava/eval/eval_science_qa.py \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --result-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --output-file vqa/results/ScienceQA/test_llava-13b_output.json \
    --output-result vqa/results/ScienceQA/test_llava-13b_result.json
```
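When inference is split across GPUs, the per-GPU `.jsonl` files have to be concatenated into the single file passed to `--result-file`, and it is worth checking that responses still end in a parseable answer before scoring. The sketch below covers both steps under stated assumptions: shards are named `<prefix>_<chunk-idx>.jsonl` (a naming scheme chosen here for illustration), and a final answer appears as "The answer is <LETTER>." (assumed from the QCM-LEA prompt format). `merge_chunks` and `count_parseable` are hypothetical helpers, not part of the LLaVA scripts, which provide their own gathering script.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS: shard naming <prefix>_<i>.jsonl and the
# "The answer is <LETTER>." pattern are illustrative, not taken from LLaVA.

# Merge <prefix>_0.jsonl ... <prefix>_<N-1>.jsonl into one .jsonl file, in chunk order.
merge_chunks() {
    local prefix="$1" num_chunks="$2" out="$3" i part
    : > "$out"                                  # create or truncate the merged file
    for ((i = 0; i < num_chunks; i++)); do
        part="${prefix}_${i}.jsonl"
        if [[ ! -f "$part" ]]; then
            echo "missing shard: $part" >&2
            return 1
        fi
        cat "$part" >> "$out"                   # append this shard
        # If the shard lacks a trailing newline, add one so records never run together.
        if [[ -s "$part" && -n "$(tail -c 1 "$part")" ]]; then
            echo >> "$out"
        fi
    done
}

# Report how many response lines contain a "The answer is <LETTER>." phrase.
count_parseable() {
    local file="$1" total parsed
    total=$(grep -c '' "$file" || true)         # grep -c exits 1 on zero matches; keep its "0"
    parsed=$(grep -c -E 'The answer is [A-Z]\.' "$file" || true)
    echo "parsed ${parsed}/${total}"
}
```

For example, if eight GPUs wrote `vqa/results/ScienceQA/test_llava-13b_0.jsonl` through `..._7.jsonl`, then `merge_chunks vqa/results/ScienceQA/test_llava-13b 8 vqa/results/ScienceQA/test_llava-13b.jsonl` produces the file used in step (b), and `count_parseable` on it gives a quick view of how many responses the evaluator is likely to parse.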
For reference, we provide our prediction files [`test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json) and [`test_sqa_llava_13b_v0.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_13b_v0.json), which you can compare against when reproducing our results or use for more detailed analysis.
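To compare against these files locally, you need the raw file contents rather than the GitHub page that the links point to. The sketch below rewrites a GitHub `blob` URL into the matching `raw.githubusercontent.com` URL based on GitHub's general URL layout; `blob_to_raw` is a hypothetical helper, and it assumes the files are served directly rather than through Git LFS.

```shell
#!/usr/bin/env bash
# Sketch only: turn https://github.com/<owner>/<repo>/blob/<ref>/<path>
# into https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>.
blob_to_raw() {
    local url="$1"
    printf '%s\n' "$url" \
        | sed -E 's#^https://github\.com/([^/]+)/([^/]+)/blob/#https://raw.githubusercontent.com/\1/\2/#'
}
```

For example, `curl -L -o test_sqa_llava_13b_v0.json "$(blob_to_raw <blob URL>)"` downloads one of the reference files, with `<blob URL>` replaced by the link above.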