Yuantao Feng committed
Commit de9c40f
1 Parent(s): 43c47ef
Add hardware: Khadas VIM3 & update benchmarks (#39)
* add backend TIMVX & target NPU
* update benchmarking results on Khadas VIM3 NPU
* fix wrong column header
* update readme
* re-order column KV3-NPU
* re-order KV3-NPU specs line
* add additional description regarding TIM-VX backend and NPU target for OpenCV DNN
- README.md +14 -13
- benchmark/benchmark.py +5 -4
README.md
CHANGED
@@ -14,24 +14,25 @@ Guidelines:
 
 ## Models & Benchmark Results
 
-| Model | Input Size | INTEL-CPU (ms) | RPI-CPU (ms) | JETSON-GPU (ms) | D1-CPU (ms) |
+| Model | Input Size | INTEL-CPU (ms) | RPI-CPU (ms) | JETSON-GPU (ms) | KV3-NPU (ms) | D1-CPU (ms) |
+|-------|------------|----------------|--------------|-----------------|--------------|-------------|
+| [YuNet](./models/face_detection_yunet) | 160x120 | 1.45 | 6.22 | 12.18 | 4.04 | 86.69 |
+| [SFace](./models/face_recognition_sface) | 112x112 | 8.65 | 99.20 | 24.88 | 46.25 | --- |
+| [DB-IC15](./models/text_detection_db) | 640x480 | 142.91 | 2835.91 | 208.41 | --- | --- |
+| [DB-TD500](./models/text_detection_db) | 640x480 | 142.91 | 2841.71 | 210.51 | --- | --- |
+| [CRNN-EN](./models/text_recognition_crnn) | 100x32 | 50.21 | 234.32 | 196.15 | 125.30 | --- |
+| [CRNN-CN](./models/text_recognition_crnn) | 100x32 | 73.52 | 322.16 | 239.76 | 166.79 | --- |
+| [PP-ResNet](./models/image_classification_ppresnet) | 224x224 | 56.05 | 602.58 | 98.64 | 75.45 | --- |
+| [PP-HumanSeg](./models/human_segmentation_pphumanseg) | 192x192 | 19.92 | 105.32 | 67.97 | 74.77 | --- |
+| [WeChatQRCode](./models/qrcode_wechatqrcode) | 100x100 | 7.04 | 37.68 | --- | --- | --- |
+| [DaSiamRPN](./models/object_tracking_dasiamrpn) | 1280x720 | 36.15 | 705.48 | 76.82 | --- | --- |
+| [YoutuReID](./models/person_reid_youtureid) | 128x256 | 35.81 | 521.98 | 90.07 | 44.61 | --- |
 
 Hardware Setup:
 
 - `INTEL-CPU`: [Intel Core i7-5930K](https://www.intel.com/content/www/us/en/products/sku/82931/intel-core-i75930k-processor-15m-cache-up-to-3-70-ghz/specifications.html) @ 3.50GHz, 6 cores, 12 threads.
 - `RPI-CPU`: [Raspberry Pi 4B](https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/), Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz.
 - `JETSON-GPU`: [NVIDIA Jetson Nano B01](https://developer.nvidia.com/embedded/jetson-nano-developer-kit), 128-core NVIDIA Maxwell GPU.
+- `KV3-NPU`: [Khadas VIM3](https://www.khadas.com/vim3), 5 TOPS performance. Benchmarks are done using **quantized** models. [TIM-VX backend and NPU target support for OpenCV](https://github.com/opencv/opencv/pull/21036) is under review. You will need to compile OpenCV with TIM-VX following [this guide](https://gist.github.com/zihaomu/f040be4901d92e423f227c10dfa37650) to run benchmarks.
 - `D1-CPU`: [Allwinner D1](https://d1.docs.aw-ol.com/en), [Xuantie C906 CPU](https://www.t-head.cn/product/C906?spm=a2ouz.12986968.0.0.7bfc1384auGNPZ) (RISC-V, RVV 0.7.1) @ 1.0GHz, 1 core. YuNet is supported for now. Visit [here](https://github.com/fengyuentau/opencv_zoo_cpp) for more details.
 
 ***Important Notes***:
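To give a concrete picture of what the new TIM-VX backend and NPU target mean in practice, here is a minimal sketch (not part of this commit) of routing inference to the VIM3 NPU through OpenCV DNN. It assumes an OpenCV build that already includes the TIM-VX backend from the PR linked above; the quantized model path and input image are placeholders.

```python
import cv2 as cv

# Placeholder paths: any quantized ONNX model from the zoo and a test image.
net = cv.dnn.readNet('face_detection_yunet_quantized.onnx')

# Route inference through the TIM-VX backend to the NPU target added in this commit.
net.setPreferableBackend(cv.dnn.DNN_BACKEND_TIMVX)
net.setPreferableTarget(cv.dnn.DNN_TARGET_NPU)

# Inference itself is unchanged; 160x120 matches YuNet's benchmark input size.
blob = cv.dnn.blobFromImage(cv.imread('sample.jpg'), size=(160, 120))
net.setInput(blob)
out = net.forward()
```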
benchmark/benchmark.py
CHANGED
@@ -5,7 +5,6 @@ import yaml
 import numpy as np
 import cv2 as cv
 
-# from ..models import MODELS
 from models import MODELS
 from utils import METRICS, DATALOADERS
 
@@ -61,7 +60,8 @@ class Benchmark:
             # inference_engine=cv.dnn.DNN_BACKEND_INFERENCE_ENGINE,
             opencv=cv.dnn.DNN_BACKEND_OPENCV,
             # vkcom=cv.dnn.DNN_BACKEND_VKCOM,
-            cuda=cv.dnn.DNN_BACKEND_CUDA
+            cuda=cv.dnn.DNN_BACKEND_CUDA,
+            timvx=cv.dnn.DNN_BACKEND_TIMVX
         )
         self._backend = available_backends[backend_id]
 
@@ -75,7 +75,8 @@ class Benchmark:
             # fpga=cv.dnn.DNN_TARGET_FPGA,
             cuda=cv.dnn.DNN_TARGET_CUDA,
             cuda_fp16=cv.dnn.DNN_TARGET_CUDA_FP16,
-            # hddl=cv.dnn.DNN_TARGET_HDDL
+            # hddl=cv.dnn.DNN_TARGET_HDDL,
+            npu=cv.dnn.DNN_TARGET_NPU
         )
         self._target = available_targets[target_id]
 
@@ -120,4 +121,4 @@ if __name__ == '__main__':
     # Run benchmarking
     print('Benchmarking {}:'.format(model.name))
     benchmark.run(model)
-    benchmark.printResults()
+    benchmark.printResults()
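The two dictionaries above map the string keys used in the benchmark configs to OpenCV DNN enum values. Below is a rough standalone sketch of that lookup, with hypothetical `backend_id`/`target_id` values as they might appear in a Khadas VIM3 config; it assumes an OpenCV version that exposes the TIM-VX and NPU constants.

```python
import cv2 as cv

# String keys (as in the benchmark configs) mapped to OpenCV DNN enums;
# 'timvx' and 'npu' are the entries added by this commit.
available_backends = dict(
    opencv=cv.dnn.DNN_BACKEND_OPENCV,
    cuda=cv.dnn.DNN_BACKEND_CUDA,
    timvx=cv.dnn.DNN_BACKEND_TIMVX,
)
available_targets = dict(
    cpu=cv.dnn.DNN_TARGET_CPU,
    cuda=cv.dnn.DNN_TARGET_CUDA,
    cuda_fp16=cv.dnn.DNN_TARGET_CUDA_FP16,
    npu=cv.dnn.DNN_TARGET_NPU,
)

# Hypothetical config values for a VIM3 NPU run.
backend_id, target_id = 'timvx', 'npu'
backend = available_backends[backend_id]
target = available_targets[target_id]

# The benchmark then applies these to each model's underlying net, e.g. via
# net.setPreferableBackend(backend) and net.setPreferableTarget(target).
print(backend, target)
```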