InternVL3-2B

This version of InternVL3-2B has been converted to run on the Axera NPU using w8a16 quantization.
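
In a w8a16 scheme the weights are stored as 8-bit integers while activations stay in 16-bit floats, roughly halving weight storage relative to fp16 at a small accuracy cost. Below is a minimal numpy sketch of the idea, assuming per-channel symmetric quantization; it is illustrative only, not the actual Pulsar2 quantizer.

```python
import numpy as np

# Illustrative w8a16: weights stored as int8 with a per-row fp16 scale,
# activations kept in fp16. Not the actual Pulsar2 quantizer.
def quantize_w8(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # one scale per output channel
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def matmul_w8a16(x_fp16, q, scale):
    w_fp16 = q.astype(np.float16) * scale                  # dequantize on the fly
    return x_fp16 @ w_fp16.T                               # matmul runs in fp16

w = np.random.randn(64, 128).astype(np.float32)            # fp32 "trained" weights
x = np.random.randn(1, 128).astype(np.float16)             # fp16 activations
q, s = quantize_w8(w)
print(matmul_w8a16(x, q, s).shape)                         # (1, 64)
```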

Compatible with Pulsar2 version: 3.4

Convert tools links:

If you are interested in model conversion, you can try exporting the axmodel from the original repo: https://huggingface.co/OpenGVLab/InternVL3-2B

- How to Convert LLM from Huggingface to axmodel
- AXera NPU HOST LLM Runtime
- AXera NPU AXCL LLM Runtime

Supported Platforms

| Chips  | Image num | Image encoder (448) | TTFT                  | w8a16         |
|--------|-----------|---------------------|-----------------------|---------------|
| AX650N | 0         | 0 ms                | 221 ms (128 tokens)   | 10 tokens/sec |
| AX650N | 1         | 364 ms              | 862 ms (384 tokens)   | 10 tokens/sec |
| AX650N | 4         | 1456 ms             | 4589 ms (1152 tokens) | 10 tokens/sec |
| AX650N | 8         | 2912 ms             | 13904 ms (2176 tokens)| 10 tokens/sec |
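
The TTFT figures track the prefill length: the token counts in parentheses fit 128 text tokens plus 256 vision tokens per 448x448 image (a pattern inferred from the table above, not taken from the model config). A quick check in Python:

```python
# Prefill token counts inferred from the benchmark table:
# 128 text tokens + 256 vision tokens per 448x448 image (assumption).
TEXT_TOKENS = 128
TOKENS_PER_IMAGE = 256

for n in (0, 1, 4, 8):
    print(f"{n} image(s): {TEXT_TOKENS + TOKENS_PER_IMAGE * n} prefill tokens")
# -> 128, 384, 1152, 2176, matching the TTFT column.
```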

How to use

Download all files from this repository to the device.

(base) axera@raspberrypi:~/qtang/huggingface/AXERA-TECH/InternVL3-2B $ tree -L 1
.
β”œβ”€β”€ config.json
β”œβ”€β”€ examples
β”œβ”€β”€ gradio_demo_c_api.py
β”œβ”€β”€ gradio_demo_python_api.py
β”œβ”€β”€ infer.py
β”œβ”€β”€ infer_video.py
β”œβ”€β”€ internvl3_2b_axmodel
β”œβ”€β”€ internvl3_2b_tokenizer
β”œβ”€β”€ internvl3_tokenizer.py
β”œβ”€β”€ llm.py
β”œβ”€β”€ main_api_ax650
β”œβ”€β”€ main_api_axcl_aarch64
β”œβ”€β”€ main_api_axcl_x86
β”œβ”€β”€ main_ax650
β”œβ”€β”€ main_axcl_aarch64
β”œβ”€β”€ main_axcl_x86
β”œβ”€β”€ post_config.json
β”œβ”€β”€ README.md
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ run_internvl_3_2b_448_api_ax650.sh
β”œβ”€β”€ run_internvl_3_2b_448_api_axcl_aarch64.sh
β”œβ”€β”€ run_internvl_3_2b_448_api_axcl_x86.sh
β”œβ”€β”€ run_internvl_3_2b_448_ax650.sh
β”œβ”€β”€ run_internvl_3_2b_448_axcl_aarch64.sh
β”œβ”€β”€ run_internvl_3_2b_448_axcl_x86.sh
└── vit_axmodel

6 directories, 22 files
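
If you would rather fetch the files from the command line, one option is the huggingface_hub CLI (a sketch; assumes `pip install -U huggingface_hub` has been run):

```bash
huggingface-cli download AXERA-TECH/InternVL3-2B --local-dir InternVL3-2B
```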

Python environment requirements

pyaxengine

https://github.com/AXERA-TECH/pyaxengine

wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
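
A quick sanity check that the wheel installed correctly (a bare import is sufficient; the printed message is just illustrative):

```bash
python -c "import axengine; print('axengine import OK')"
```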

Other dependencies

pip install -r requirements.txt

Inference on a Raspberry Pi 5 host using an AXCL EP (such as an M.2 AI card or a HAT AI module)

cd InternVL3-2B
python gradio_demo_python_api.py --hf_model internvl3_2b_tokenizer/ \
                                 --axmodel_path internvl3_2b_axmodel/ \
                                 --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel

[INFO] Available providers:  ['AXCLRTExecutionProvider']
Init InferenceSession:   0%|                                                                                 | 0/28 [00:00<?, ?it/s]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession:   4%|β–ˆβ–ˆβ–ˆβ–                                                                             | 1/28 [00:01<00:43,  1.61s/it]
[INFO] Using provider: AXCLRTExecutionProvider
......
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 28/28 [00:34<00:00,  1.23s/it]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
model load done!
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
  chatbot = gr.Chatbot(height=650)
HTTP ζœεŠ‘εœ°ε€: http://xxx.xxx.xxx.xxx:7860
* Running on local URL:  http://xxx.xxx.xxx.xxx:7860
* To create a public link, set `share=True` in `launch()`.

Access http://xxx.xxx.xxx.xxx:7860 using Chrome or another browser.
