InternVL3-2B

This version of InternVL3-2B has been converted to run on the Axera NPU using w8a16 quantization.
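
In a w8a16 scheme the weights are stored as 8-bit integers while activations stay in 16-bit floats, roughly halving weight storage relative to fp16 at a small accuracy cost. Below is a minimal numpy sketch of the idea, assuming per-channel symmetric quantization; it is illustrative only, not the actual Pulsar2 quantizer.

```python
import numpy as np

# Illustrative w8a16: weights stored as int8 with a per-row fp16 scale,
# activations kept in fp16. Not the actual Pulsar2 quantizer.
def quantize_w8(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # one scale per output channel
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def matmul_w8a16(x_fp16, q, scale):
    w_fp16 = q.astype(np.float16) * scale                  # dequantize on the fly
    return x_fp16 @ w_fp16.T                               # matmul runs in fp16

w = np.random.randn(64, 128).astype(np.float32)            # fp32 "trained" weights
x = np.random.randn(1, 128).astype(np.float16)             # fp16 activations
q, s = quantize_w8(w)
print(matmul_w8a16(x, q, s).shape)                         # (1, 64)
```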

Compatible with Pulsar2 version: 3.4

Convert tools links:

If you are interested in model conversion, you can try exporting the axmodel from the original repo: https://huggingface.co/OpenGVLab/InternVL3-2B

- How to Convert LLM from Huggingface to axmodel
- AXera NPU HOST LLM Runtime
- AXera NPU AXCL LLM Runtime

Supported Platforms

| Chips  | Image num | Image encoder (448) | TTFT                  | w8a16         |
|--------|-----------|---------------------|-----------------------|---------------|
| AX650N | 0         | 0 ms                | 221 ms (128 tokens)   | 10 tokens/sec |
| AX650N | 1         | 364 ms              | 862 ms (384 tokens)   | 10 tokens/sec |
| AX650N | 4         | 1456 ms             | 4589 ms (1152 tokens) | 10 tokens/sec |
| AX650N | 8         | 2912 ms             | 13904 ms (2176 tokens)| 10 tokens/sec |
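
The TTFT figures track the prefill length: the token counts in parentheses fit 128 text tokens plus 256 vision tokens per 448x448 image (a pattern inferred from the table above, not taken from the model config). A quick check in Python:

```python
# Prefill token counts inferred from the benchmark table:
# 128 text tokens + 256 vision tokens per 448x448 image (assumption).
TEXT_TOKENS = 128
TOKENS_PER_IMAGE = 256

for n in (0, 1, 4, 8):
    print(f"{n} image(s): {TEXT_TOKENS + TOKENS_PER_IMAGE * n} prefill tokens")
# -> 128, 384, 1152, 2176, matching the TTFT column.
```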

How to use

Download all files from this repository to the device.

(base) axera@raspberrypi:~/qtang/huggingface/AXERA-TECH/InternVL3-2B $ tree -L 1
.
β”œβ”€β”€ config.json
β”œβ”€β”€ examples
β”œβ”€β”€ gradio_demo_c_api.py
β”œβ”€β”€ gradio_demo_python_api.py
β”œβ”€β”€ infer.py
β”œβ”€β”€ infer_video.py
β”œβ”€β”€ internvl3_2b_axmodel
β”œβ”€β”€ internvl3_2b_tokenizer
β”œβ”€β”€ internvl3_tokenizer.py
β”œβ”€β”€ llm.py
β”œβ”€β”€ main_api_ax650
β”œβ”€β”€ main_api_axcl_aarch64
β”œβ”€β”€ main_api_axcl_x86
β”œβ”€β”€ main_ax650
β”œβ”€β”€ main_axcl_aarch64
β”œβ”€β”€ main_axcl_x86
β”œβ”€β”€ post_config.json
β”œβ”€β”€ README.md
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ run_internvl_3_2b_448_api_ax650.sh
β”œβ”€β”€ run_internvl_3_2b_448_api_axcl_aarch64.sh
β”œβ”€β”€ run_internvl_3_2b_448_api_axcl_x86.sh
β”œβ”€β”€ run_internvl_3_2b_448_ax650.sh
β”œβ”€β”€ run_internvl_3_2b_448_axcl_aarch64.sh
β”œβ”€β”€ run_internvl_3_2b_448_axcl_x86.sh
└── vit_axmodel

6 directories, 22 files
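
If you would rather fetch the files from the command line, one option is the huggingface_hub CLI (a sketch; assumes `pip install -U huggingface_hub` has been run):

```bash
huggingface-cli download AXERA-TECH/InternVL3-2B --local-dir InternVL3-2B
```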

Python environment requirements

pyaxengine

https://github.com/AXERA-TECH/pyaxengine

wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
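
A quick sanity check that the wheel installed correctly (a bare import is sufficient; the printed message is just illustrative):

```bash
python -c "import axengine; print('axengine import OK')"
```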

Other dependencies

pip install -r requirements.txt

Inference on a Raspberry Pi 5 host using an AXCL EP (such as an M.2 AI card or a HAT AI module)

cd InternVL3-2B
python gradio_demo_python_api.py --hf_model internvl3_2b_tokenizer/ \
                                 --axmodel_path internvl3_2b_axmodel/ \
                                 --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel

[INFO] Available providers:  ['AXCLRTExecutionProvider']
Init InferenceSession:   0%|                                                                                 | 0/28 [00:00<?, ?it/s]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession:   4%|β–ˆβ–ˆβ–ˆβ–                                                                             | 1/28 [00:01<00:43,  1.61s/it]
[INFO] Using provider: AXCLRTExecutionProvider
......
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 28/28 [00:34<00:00,  1.23s/it]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
model load done!
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
  chatbot = gr.Chatbot(height=650)
HTTP ζœεŠ‘εœ°ε€: http://xxx.xxx.xxx.xxx:7860
* Running on local URL:  http://xxx.xxx.xxx.xxx:7860
* To create a public link, set `share=True` in `launch()`.

Access http://xxx.xxx.xxx.xxx:7860 using Chrome or another browser.
