This version of InternVL3-2B has been converted to run on the Axera NPU using w8a16 quantization.
Compatible with Pulsar2 version: 3.4
If you are interested in model conversion, you can export the axmodel yourself from the original repo: https://huggingface.co/OpenGVLab/InternVL3-2B
See: How to Convert LLM from Huggingface to axmodel
| Chip | Image count | Image encoder (448×448) | TTFT (prefill) | Decode speed (w8a16) |
|---|---|---|---|---|
| AX650N | 0 | 0 ms | 221 ms (128 tokens) | 10 tokens/s |
| AX650N | 1 | 364 ms | 862 ms (384 tokens) | 10 tokens/s |
| AX650N | 4 | 1456 ms | 4589 ms (1152 tokens) | 10 tokens/s |
| AX650N | 8 | 2912 ms | 13904 ms (2176 tokens) | 10 tokens/s |
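Reading the table: each 448×448 image adds 256 prefill tokens (128 → 384 → 1152 → 2176) and roughly 364 ms of encoder time, and decoding runs at about 10 tokens/s. A back-of-envelope latency estimator built only from these measured points (illustrative, not part of this repo) looks like:

```python
# Measured points from the table above (AX650N, w8a16).
MEASURED_TTFT_MS = {0: 221, 1: 862, 4: 4589, 8: 13904}
ENCODER_MS_PER_IMAGE = 364       # ViT time scales linearly: 364, 1456, 2912 ms
PREFILL_TOKENS_PER_IMAGE = 256   # prefill = 128 text tokens + 256 per image
DECODE_TOKENS_PER_SEC = 10

def estimate_total_ms(n_images: int, n_new_tokens: int) -> float:
    """Rough end-to-end latency for an image count measured in the table."""
    encoder_ms = ENCODER_MS_PER_IMAGE * n_images
    decode_ms = n_new_tokens / DECODE_TOKENS_PER_SEC * 1000
    return encoder_ms + MEASURED_TTFT_MS[n_images] + decode_ms
```

For example, a single-image query generating 100 tokens works out to roughly 364 + 862 + 10000 ≈ 11.2 s end to end, dominated by decode time.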
Download all files from this repository to the device.
```
(base) axera@raspberrypi:~/qtang/huggingface/AXERA-TECH/InternVL3-2B $ tree -L 1
.
├── config.json
├── examples
├── gradio_demo_c_api.py
├── gradio_demo_python_api.py
├── infer.py
├── infer_video.py
├── internvl3_2b_axmodel
├── internvl3_2b_tokenizer
├── internvl3_tokenizer.py
├── llm.py
├── main_api_ax650
├── main_api_axcl_aarch64
├── main_api_axcl_x86
├── main_ax650
├── main_axcl_aarch64
├── main_axcl_x86
├── post_config.json
├── README.md
├── requirements.txt
├── run_internvl_3_2b_448_api_ax650.sh
├── run_internvl_3_2b_448_api_axcl_aarch64.sh
├── run_internvl_3_2b_448_api_axcl_x86.sh
├── run_internvl_3_2b_448_ax650.sh
├── run_internvl_3_2b_448_axcl_aarch64.sh
├── run_internvl_3_2b_448_axcl_x86.sh
└── vit_axmodel

6 directories, 22 files
```
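The ViT encoder in `vit_axmodel` consumes 448×448 images; the authoritative preprocessing lives in `infer.py`. A typical InternVL-style pipeline (a sketch assuming the standard ImageNet normalization, not verified against this repo) looks like:

```python
import numpy as np

# ImageNet mean/std commonly used by InternVL's vision tower
# (an assumption here; check infer.py for the exact values).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_448(rgb_uint8: np.ndarray) -> np.ndarray:
    """HWC uint8 RGB, already resized to 448x448 -> NCHW float32 batch of 1."""
    assert rgb_uint8.shape == (448, 448, 3)
    x = rgb_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.transpose(2, 0, 1)[None]  # shape (1, 3, 448, 448)
```

The resulting tensor is what the 256 image tokens per picture in the benchmark table are computed from.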
Install the pyaxengine Python API (https://github.com/AXERA-TECH/pyaxengine) and the demo dependencies:

```shell
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
pip install -r requirements.txt
```
Launch the Gradio demo:

```shell
cd InternVL3-2B
python gradio_demo_python_api.py --hf_model internvl3_2b_tokenizer/ \
    --axmodel_path internvl3_2b_axmodel/ \
    --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel
```
```
[INFO] Available providers: ['AXCLRTExecutionProvider']
Init InferenceSession:   0%|          | 0/28 [00:00<?, ?it/s]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession:   4%|█         | 1/28 [00:01<00:43, 1.61s/it]
[INFO] Using provider: AXCLRTExecutionProvider
......
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession: 100%|██████████| 28/28 [00:34<00:00, 1.23s/it]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
model load done!
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
chatbot = gr.Chatbot(height=650)
HTTP server address: http://xxx.xxx.xxx.xxx:7860
* Running on local URL:  http://xxx.xxx.xxx.xxx:7860
* To create a public link, set `share=True` in `launch()`.
```

Open http://xxx.xxx.xxx.xxx:7860 in Chrome or another browser.
Base model: OpenGVLab/InternVL3-2B-Pretrained