Yuantao Feng committed
Commit de9c40f · 1 Parent(s): 43c47ef

Add hardware: Khadas VIM3 & update benchmarks (#39)


* add backend TIMVX & target NPU

* update benchmarking results on Khadas VIM3 NPU

* fix wrong column header

* update readme

* re-order column KV3-NPU

* re-order KV3-NPU specs line

* add additional description regarding TIM-VX backend and NPU target for OpenCV DNN

Files changed (2)
  1. README.md +14 -13
  2. benchmark/benchmark.py +5 -4
README.md CHANGED
@@ -14,24 +14,25 @@ Guidelines:
 
 ## Models & Benchmark Results
 
-| Model | Input Size | INTEL-CPU (ms) | RPI-CPU (ms) | JETSON-GPU (ms) | D1-CPU (ms) |
-|-------|------------|----------------|--------------|-----------------|-------------|
-| [YuNet](./models/face_detection_yunet) | 160x120 | 1.45 | 6.22 | 12.18 | 86.69 |
-| [SFace](./models/face_recognition_sface) | 112x112 | 8.65 | 99.20 | 24.88 | --- |
-| [DB-IC15](./models/text_detection_db) | 640x480 | 142.91 | 2835.91 | 208.41 | --- |
-| [DB-TD500](./models/text_detection_db) | 640x480 | 142.91 | 2841.71 | 210.51 | --- |
-| [CRNN-EN](./models/text_recognition_crnn) | 100x32 | 50.21 | 234.32 | 196.15 | --- |
-| [CRNN-CN](./models/text_recognition_crnn) | 100x32 | 73.52 | 322.16 | 239.76 | --- |
-| [PP-ResNet](./models/image_classification_ppresnet) | 224x224 | 56.05 | 602.58 | 98.64 | --- |
-| [PP-HumanSeg](./models/human_segmentation_pphumanseg) | 192x192 | 19.92 | 105.32 | 67.97 | --- |
-| [WeChatQRCode](./models/qrcode_wechatqrcode) | 100x100 | 7.04 | 37.68 | --- | --- |
-| [DaSiamRPN](./models/object_tracking_dasiamrpn) | 1280x720 | 36.15 | 705.48 | 76.82 | --- |
-| [YoutuReID](./models/person_reid_youtureid) | 128x256 | 35.81 | 521.98 | 90.07 | --- |
+| Model | Input Size | INTEL-CPU (ms) | RPI-CPU (ms) | JETSON-GPU (ms) | KV3-NPU (ms) | D1-CPU (ms) |
+|-------|------------|----------------|--------------|-----------------|--------------|-------------|
+| [YuNet](./models/face_detection_yunet) | 160x120 | 1.45 | 6.22 | 12.18 | 4.04 | 86.69 |
+| [SFace](./models/face_recognition_sface) | 112x112 | 8.65 | 99.20 | 24.88 | 46.25 | --- |
+| [DB-IC15](./models/text_detection_db) | 640x480 | 142.91 | 2835.91 | 208.41 | --- | --- |
+| [DB-TD500](./models/text_detection_db) | 640x480 | 142.91 | 2841.71 | 210.51 | --- | --- |
+| [CRNN-EN](./models/text_recognition_crnn) | 100x32 | 50.21 | 234.32 | 196.15 | 125.30 | --- |
+| [CRNN-CN](./models/text_recognition_crnn) | 100x32 | 73.52 | 322.16 | 239.76 | 166.79 | --- |
+| [PP-ResNet](./models/image_classification_ppresnet) | 224x224 | 56.05 | 602.58 | 98.64 | 75.45 | --- |
+| [PP-HumanSeg](./models/human_segmentation_pphumanseg) | 192x192 | 19.92 | 105.32 | 67.97 | 74.77 | --- |
+| [WeChatQRCode](./models/qrcode_wechatqrcode) | 100x100 | 7.04 | 37.68 | --- | --- | --- |
+| [DaSiamRPN](./models/object_tracking_dasiamrpn) | 1280x720 | 36.15 | 705.48 | 76.82 | --- | --- |
+| [YoutuReID](./models/person_reid_youtureid) | 128x256 | 35.81 | 521.98 | 90.07 | 44.61 | --- |
 
 Hardware Setup:
 - `INTEL-CPU`: [Intel Core i7-5930K](https://www.intel.com/content/www/us/en/products/sku/82931/intel-core-i75930k-processor-15m-cache-up-to-3-70-ghz/specifications.html) @ 3.50GHz, 6 cores, 12 threads.
 - `RPI-CPU`: [Raspberry Pi 4B](https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/), Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz.
 - `JETSON-GPU`: [NVIDIA Jetson Nano B01](https://developer.nvidia.com/embedded/jetson-nano-developer-kit), 128-core NVIDIA Maxwell GPU.
+- `KV3-NPU`: [Khadas VIM3](https://www.khadas.com/vim3), 5TOPS performance. Benchmarks are done using **quantized** models. [TIM-VX backend and NPU target support for OpenCV](https://github.com/opencv/opencv/pull/21036) is under review. You will need to compile OpenCV with TIM-VX following [this guide](https://gist.github.com/zihaomu/f040be4901d92e423f227c10dfa37650) to run benchmarks.
 - `D1-CPU`: [Allwinner D1](https://d1.docs.aw-ol.com/en), [Xuantie C906 CPU](https://www.t-head.cn/product/C906?spm=a2ouz.12986968.0.0.7bfc1384auGNPZ) (RISC-V, RVV 0.7.1) @ 1.0GHz, 1 core. YuNet is supported for now. Visit [here](https://github.com/fengyuentau/opencv_zoo_cpp) for more details.
 
 ***Important Notes***:
benchmark/benchmark.py CHANGED
@@ -5,7 +5,6 @@ import yaml
 import numpy as np
 import cv2 as cv
 
-# from ..models import MODELS
 from models import MODELS
 from utils import METRICS, DATALOADERS
 
@@ -61,7 +60,8 @@ class Benchmark:
 # inference_engine=cv.dnn.DNN_BACKEND_INFERENCE_ENGINE,
 opencv=cv.dnn.DNN_BACKEND_OPENCV,
 # vkcom=cv.dnn.DNN_BACKEND_VKCOM,
-cuda=cv.dnn.DNN_BACKEND_CUDA
+cuda=cv.dnn.DNN_BACKEND_CUDA,
+timvx=cv.dnn.DNN_BACKEND_TIMVX
 )
 self._backend = available_backends[backend_id]
 
@@ -75,7 +75,8 @@ class Benchmark:
 # fpga=cv.dnn.DNN_TARGET_FPGA,
 cuda=cv.dnn.DNN_TARGET_CUDA,
 cuda_fp16=cv.dnn.DNN_TARGET_CUDA_FP16,
-# hddl=cv.dnn.DNN_TARGET_HDDL
+# hddl=cv.dnn.DNN_TARGET_HDDL,
+npu=cv.dnn.DNN_TARGET_NPU
 )
 self._target = available_targets[target_id]
 
@@ -120,4 +121,4 @@ if __name__ == '__main__':
 # Run benchmarking
 print('Benchmarking {}:'.format(model.name))
 benchmark.run(model)
-benchmark.printResults()
+benchmark.printResults()
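The benchmark.py change extends two string-keyed lookup tables that map CLI-friendly names to `cv.dnn` enum values. The pattern can be sketched without OpenCV installed (the integer values below are placeholders, not OpenCV's real enum values):

```python
# Placeholder constants standing in for cv.dnn.DNN_BACKEND_* / DNN_TARGET_*;
# the real values come from the installed OpenCV build.
AVAILABLE_BACKENDS = dict(
    opencv=0,
    cuda=1,
    timvx=2,   # added in this commit
)
AVAILABLE_TARGETS = dict(
    cpu=0,
    cuda=1,
    cuda_fp16=2,
    npu=3,     # added in this commit
)

def resolve(backend_id, target_id):
    """Translate string ids (as used by the benchmark script) to enum
    values, raising KeyError for names the build does not support."""
    return AVAILABLE_BACKENDS[backend_id], AVAILABLE_TARGETS[target_id]

print(resolve("timvx", "npu"))  # -> (2, 3)
```

Keeping unsupported entries commented out (as the diff does with `vkcom` and `hddl`) means an unavailable backend fails fast with a `KeyError` instead of silently falling back.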