Yuantao Feng committed
Commit de9c40f · 1 Parent(s): 43c47ef

Add hardware: Khadas VIM3 & update benchmarks (#39)


* add backend TIMVX & target NPU

* update benchmarking results on Khadas VIM3 NPU

* fix wrong column header

* update readme

* re-order column KV3-NPU

* re-order KV3-NPU specs line

* add additional description regarding TIM-VX backend and NPU target for OpenCV DNN

Files changed (2)
  1. README.md +14 -13
  2. benchmark/benchmark.py +5 -4
README.md CHANGED
@@ -14,24 +14,25 @@ Guidelines:
 
 ## Models & Benchmark Results
 
-| Model | Input Size | INTEL-CPU (ms) | RPI-CPU (ms) | JETSON-GPU (ms) | D1-CPU (ms) |
-|-------|------------|----------------|--------------|-----------------|-------------|
-| [YuNet](./models/face_detection_yunet) | 160x120 | 1.45 | 6.22 | 12.18 | 86.69 |
-| [SFace](./models/face_recognition_sface) | 112x112 | 8.65 | 99.20 | 24.88 | --- |
-| [DB-IC15](./models/text_detection_db) | 640x480 | 142.91 | 2835.91 | 208.41 | --- |
-| [DB-TD500](./models/text_detection_db) | 640x480 | 142.91 | 2841.71 | 210.51 | --- |
-| [CRNN-EN](./models/text_recognition_crnn) | 100x32 | 50.21 | 234.32 | 196.15 | --- |
-| [CRNN-CN](./models/text_recognition_crnn) | 100x32 | 73.52 | 322.16 | 239.76 | --- |
-| [PP-ResNet](./models/image_classification_ppresnet) | 224x224 | 56.05 | 602.58 | 98.64 | --- |
-| [PP-HumanSeg](./models/human_segmentation_pphumanseg) | 192x192 | 19.92 | 105.32 | 67.97 | --- |
-| [WeChatQRCode](./models/qrcode_wechatqrcode) | 100x100 | 7.04 | 37.68 | --- | --- |
-| [DaSiamRPN](./models/object_tracking_dasiamrpn) | 1280x720 | 36.15 | 705.48 | 76.82 | --- |
-| [YoutuReID](./models/person_reid_youtureid) | 128x256 | 35.81 | 521.98 | 90.07 | --- |
+| Model | Input Size | INTEL-CPU (ms) | RPI-CPU (ms) | JETSON-GPU (ms) | KV3-NPU (ms) | D1-CPU (ms) |
+|-------|------------|----------------|--------------|-----------------|--------------|-------------|
+| [YuNet](./models/face_detection_yunet) | 160x120 | 1.45 | 6.22 | 12.18 | 4.04 | 86.69 |
+| [SFace](./models/face_recognition_sface) | 112x112 | 8.65 | 99.20 | 24.88 | 46.25 | --- |
+| [DB-IC15](./models/text_detection_db) | 640x480 | 142.91 | 2835.91 | 208.41 | --- | --- |
+| [DB-TD500](./models/text_detection_db) | 640x480 | 142.91 | 2841.71 | 210.51 | --- | --- |
+| [CRNN-EN](./models/text_recognition_crnn) | 100x32 | 50.21 | 234.32 | 196.15 | 125.30 | --- |
+| [CRNN-CN](./models/text_recognition_crnn) | 100x32 | 73.52 | 322.16 | 239.76 | 166.79 | --- |
+| [PP-ResNet](./models/image_classification_ppresnet) | 224x224 | 56.05 | 602.58 | 98.64 | 75.45 | --- |
+| [PP-HumanSeg](./models/human_segmentation_pphumanseg) | 192x192 | 19.92 | 105.32 | 67.97 | 74.77 | --- |
+| [WeChatQRCode](./models/qrcode_wechatqrcode) | 100x100 | 7.04 | 37.68 | --- | --- | --- |
+| [DaSiamRPN](./models/object_tracking_dasiamrpn) | 1280x720 | 36.15 | 705.48 | 76.82 | --- | --- |
+| [YoutuReID](./models/person_reid_youtureid) | 128x256 | 35.81 | 521.98 | 90.07 | 44.61 | --- |
 
 Hardware Setup:
 - `INTEL-CPU`: [Intel Core i7-5930K](https://www.intel.com/content/www/us/en/products/sku/82931/intel-core-i75930k-processor-15m-cache-up-to-3-70-ghz/specifications.html) @ 3.50GHz, 6 cores, 12 threads.
 - `RPI-CPU`: [Raspberry Pi 4B](https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/), Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz.
 - `JETSON-GPU`: [NVIDIA Jetson Nano B01](https://developer.nvidia.com/embedded/jetson-nano-developer-kit), 128-core NVIDIA Maxwell GPU.
+- `KV3-NPU`: [Khadas VIM3](https://www.khadas.com/vim3), 5TOPS performance. Benchmarks are done using **quantized** models. [TIM-VX backend and NPU target support for OpenCV](https://github.com/opencv/opencv/pull/21036) is under review. You will need to compile OpenCV with TIM-VX following [this guide](https://gist.github.com/zihaomu/f040be4901d92e423f227c10dfa37650) to run benchmarks.
 - `D1-CPU`: [Allwinner D1](https://d1.docs.aw-ol.com/en), [Xuantie C906 CPU](https://www.t-head.cn/product/C906?spm=a2ouz.12986968.0.0.7bfc1384auGNPZ) (RISC-V, RVV 0.7.1) @ 1.0GHz, 1 core. YuNet is supported for now. Visit [here](https://github.com/fengyuentau/opencv_zoo_cpp) for more details.
 
 ***Important Notes***:
benchmark/benchmark.py CHANGED
@@ -5,7 +5,6 @@ import yaml
 import numpy as np
 import cv2 as cv
 
-# from ..models import MODELS
 from models import MODELS
 from utils import METRICS, DATALOADERS
 
@@ -61,7 +60,8 @@ class Benchmark:
 # inference_engine=cv.dnn.DNN_BACKEND_INFERENCE_ENGINE,
 opencv=cv.dnn.DNN_BACKEND_OPENCV,
 # vkcom=cv.dnn.DNN_BACKEND_VKCOM,
-cuda=cv.dnn.DNN_BACKEND_CUDA
+cuda=cv.dnn.DNN_BACKEND_CUDA,
+timvx=cv.dnn.DNN_BACKEND_TIMVX
 )
 self._backend = available_backends[backend_id]
 
@@ -75,7 +75,8 @@ class Benchmark:
 # fpga=cv.dnn.DNN_TARGET_FPGA,
 cuda=cv.dnn.DNN_TARGET_CUDA,
 cuda_fp16=cv.dnn.DNN_TARGET_CUDA_FP16,
-# hddl=cv.dnn.DNN_TARGET_HDDL
+# hddl=cv.dnn.DNN_TARGET_HDDL,
+npu=cv.dnn.DNN_TARGET_NPU
 )
 self._target = available_targets[target_id]
 
@@ -120,4 +121,4 @@ if __name__ == '__main__':
 # Run benchmarking
 print('Benchmarking {}:'.format(model.name))
 benchmark.run(model)
-benchmark.printResults()
+benchmark.printResults()
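The benchmark.py change extends two string-keyed lookup tables that map CLI-friendly names to `cv.dnn` enum values. The pattern can be sketched without OpenCV installed (the integer values below are placeholders, not OpenCV's real enum values):

```python
# Placeholder constants standing in for cv.dnn.DNN_BACKEND_* / DNN_TARGET_*;
# the real values come from the installed OpenCV build.
AVAILABLE_BACKENDS = dict(
    opencv=0,
    cuda=1,
    timvx=2,   # added in this commit
)
AVAILABLE_TARGETS = dict(
    cpu=0,
    cuda=1,
    cuda_fp16=2,
    npu=3,     # added in this commit
)

def resolve(backend_id, target_id):
    """Translate string ids (as used by the benchmark script) to enum
    values, raising KeyError for names the build does not support."""
    return AVAILABLE_BACKENDS[backend_id], AVAILABLE_TARGETS[target_id]

print(resolve("timvx", "npu"))  # -> (2, 3)
```

Keeping unsupported entries commented out (as the diff does with `vkcom` and `hddl`) means an unavailable backend fails fast with a `KeyError` instead of silently falling back.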