Yuantao Feng committed
Commit · 83bb178
1 Parent(s): 2ef6bc9

Renaming model files to have more information on architecture, training data and more (#7)

* add suffix of training dataset, arch & upload time to each model
* update DB-IC15 benchmark results
- README.md +10 -4
- benchmark/config/face_detection_yunet.yaml +1 -1
- benchmark/config/face_recognition_sface.yaml +1 -1
- benchmark/config/human_segmentation_pphumanseg.yaml +1 -1
- benchmark/config/image_classification_ppresnet.yaml +1 -1
- benchmark/config/text_detection_db.yaml +1 -1
- benchmark/config/text_recognition_crnn.yaml +1 -1
- models/face_detection_yunet/README.md +4 -0
- models/face_recognition_sface/README.md +5 -5
- models/text_detection_db/README.md +5 -1
- models/text_detection_db/demo.py +1 -1
- models/text_recognition_crnn/README.md +4 -2
- models/text_recognition_crnn/demo.py +1 -1
README.md
CHANGED

@@ -5,6 +5,12 @@ A zoo for models tuned for OpenCV DNN with benchmarks on different platforms.
 Guidelines:
 - To clone this repo, please install [git-lfs](https://git-lfs.github.com/), run `git lfs install` and use `git lfs clone https://github.com/opencv/opencv_zoo`.
 - To run benchmark on your hardware settings, please refer to [benchmark/README](./benchmark/README.md).
+- Understand model filename: `<topic>_<model_name>_<dataset>_<arch>_<upload_time>`
+  - `<topic>`: research topics, such as `face detection` etc.
+  - `<model_name>`: exact model names.
+  - `<dataset>`: (Optional) the dataset that the model is trained with.
+  - `<arch>`: (Optional) the backbone architecture of the model.
+  - `<upload_time>`: the time when the model is uploaded, meaning the latest version of this model unless specified.
 
 ## Models & Benchmarks
 
@@ -16,19 +22,19 @@ Hardware Setup:
 ***Important Notes***:
 - The time data that shown on the following table presents the time elapsed from preprocess (resize is excluded), to a forward pass of a network, and postprocess to get final results.
 - The time data that shown on the following table is the median of 10 runs. Different metrics may be applied to some specific models.
+- Batch size is 1 for all benchmark results.
 - View [benchmark/config](./benchmark/config) for more details on benchmarking different models.
 
-| Model | Input Size | CPU x86_64 (ms) | CPU ARM (ms) | GPU CUDA (ms)
+| Model | Input Size | CPU x86_64 (ms) | CPU ARM (ms) | GPU CUDA (ms) |
 |-------|------------|-----------------|--------------|---------------|
 | [YuNet](./models/face_detection_yunet) | 160x120 | 1.45 | 6.22 | 12.18 |
-| [DB](./models/text_detection_db)
+| [DB-IC15](./models/text_detection_db) | 640x480 | 142.91 | 2835.91 | 208.41 |
+| [DB-TD500](./models/text_detection_db) | 640x480 | 142.91 | 2841.71 | 210.51 |
 | [CRNN](./models/text_recognition_crnn) | 100x32 | 50.21 | 234.32 | 196.15 |
 | [SFace](./models/face_recognition_sface) | 112x112 | 8.65 | 99.20 | 24.88 |
 | [PP-ResNet](./models/image_classification_ppresnet) | 224x224 | 56.05 | 602.58 | 98.64 |
 | [PP-HumanSeg](./models/human_segmentation_pphumanseg) | 192x192 | 19.92 | 105.32 | 67.97 |
 
-*: Batch size is 1.
-
 ## License
 
 OpenCV Zoo is licensed under the [Apache 2.0 license](./LICENSE). Please refer to licenses of different models.
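As a side note on the naming convention introduced in the README hunk above: the fields join with underscores, and `<dataset>` and `<arch>` are simply omitted when they do not apply. Below is a minimal Python sketch, not part of the commit; `build_model_filename` is a hypothetical helper used only to show how the convention composes.

```python
def build_model_filename(topic, model_name, upload_time, dataset=None, arch=None, ext="onnx"):
    """Join the naming-convention fields, skipping the optional ones when absent."""
    parts = [topic, model_name]
    if dataset:                # optional: training dataset, e.g. "TD500"
        parts.append(dataset)
    if arch:                   # optional: backbone architecture, e.g. "resnet18"
        parts.append(arch)
    parts.append(upload_time)  # e.g. "2021sep"
    return "_".join(parts) + "." + ext


if __name__ == "__main__":
    # Reproduces two filenames that appear in this commit.
    print(build_model_filename("text_detection", "DB", "2021sep", dataset="TD500", arch="resnet18"))
    # -> text_detection_DB_TD500_resnet18_2021sep.onnx
    print(build_model_filename("face_detection", "yunet", "2021sep"))
    # -> face_detection_yunet_2021sep.onnx
```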
benchmark/config/face_detection_yunet.yaml
CHANGED

@@ -16,7 +16,7 @@ Benchmark:
 
 Model:
   name: "YuNet"
-  modelPath: "models/face_detection_yunet/
+  modelPath: "models/face_detection_yunet/face_detection_yunet_2021sep.onnx"
   confThreshold: 0.6
   nmsThreshold: 0.3
   topK: 5000
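For reference, the benchmark configs edited in this commit are plain YAML, so the renamed `modelPath` values can be checked with any YAML loader. A minimal sketch, assuming PyYAML is installed and that `Model` sits at the top level as the hunks suggest; the benchmark's own loader may organize things differently.

```python
import yaml  # assumption: PyYAML is installed (pip install pyyaml)

# Load one of the configs touched by this commit and print the renamed model path.
# The top-level "Model" layout is inferred from the hunks above.
with open("benchmark/config/face_detection_yunet.yaml") as f:
    cfg = yaml.safe_load(f)

model_cfg = cfg["Model"]
print(model_cfg["name"])       # YuNet
print(model_cfg["modelPath"])  # models/face_detection_yunet/face_detection_yunet_2021sep.onnx
```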
benchmark/config/face_recognition_sface.yaml
CHANGED

@@ -14,4 +14,4 @@ Benchmark:
 
 Model:
   name: "SFace"
-  modelPath: "models/face_recognition_sface/
+  modelPath: "models/face_recognition_sface/face_recognition_sface_2021sep.onnx"
benchmark/config/human_segmentation_pphumanseg.yaml
CHANGED

@@ -15,4 +15,4 @@ Benchmark:
 
 Model:
   name: "PPHumanSeg"
-  modelPath: "models/human_segmentation_pphumanseg/
+  modelPath: "models/human_segmentation_pphumanseg/human_segmentation_pphumanseg_2021oct.onnx"
benchmark/config/image_classification_ppresnet.yaml
CHANGED

@@ -16,5 +16,5 @@ Benchmark:
 
 Model:
   name: "PPResNet"
-  modelPath: "models/image_classification_ppresnet/
+  modelPath: "models/image_classification_ppresnet/image_classification_ppresnet50_2021oct.onnx"
   labelPath: "models/image_classification_ppresnet/imagenet_labels.txt"
benchmark/config/text_detection_db.yaml
CHANGED

@@ -15,7 +15,7 @@ Benchmark:
 
 Model:
   name: "DB"
-  modelPath: "models/text_detection_db/
+  modelPath: "models/text_detection_db/text_detection_DB_TD500_resnet18_2021sep.onnx"
   binaryThreshold: 0.3
   polygonThreshold: 0.5
   maxCandidates: 200
benchmark/config/text_recognition_crnn.yaml
CHANGED

@@ -14,4 +14,4 @@ Benchmark:
 
 Model:
   name: "CRNN"
-  modelPath: "models/text_recognition_crnn/
+  modelPath: "models/text_recognition_crnn/text_recognition_CRNN_VGG_BiLSTM_CTC_2021sep.onnx"
models/face_detection_yunet/README.md
CHANGED

@@ -2,6 +2,10 @@
 
 YuNet is a light-weight, fast and accurate face detection model, which achieves 0.834(AP_easy), 0.824(AP_medium), 0.708(AP_hard) on the WIDER Face validation set.
 
+Notes:
+- Model source: [here](https://github.com/ShiqiYu/libfacedetection.train/blob/a61a428929148171b488f024b5d6774f93cdbc13/tasks/task1/onnx/yunet.onnx).
+- For details on training this model, please visit https://github.com/ShiqiYu/libfacedetection.train.
+
 ## Demo
 
 Run the following command to try the demo:
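To sanity-check the renamed YuNet file outside the demo, the ONNX can be loaded directly with OpenCV's dnn module. A minimal sketch under the assumptions that the path resolves relative to the repo root and that the 160x120 input size from the benchmark table applies; decoding the raw outputs into boxes is left to the zoo's demo code.

```python
import cv2 as cv
import numpy as np

# Load the renamed YuNet ONNX (path relative to the repo root, as in the benchmark config).
net = cv.dnn.readNet("models/face_detection_yunet/face_detection_yunet_2021sep.onnx")

# Dummy 160x120 frame, matching the benchmark table; real usage would pass a camera frame.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
blob = cv.dnn.blobFromImage(frame, size=(160, 120))
net.setInput(blob)

# Print the shapes of the raw output tensors; box decoding is handled by the demo.
out_names = net.getUnconnectedOutLayersNames()
outs = net.forward(out_names)
for name, out in zip(out_names, outs):
    print(name, out.shape)
```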
models/face_recognition_sface/README.md
CHANGED

@@ -2,15 +2,15 @@
 
 SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition
 
-SFace is contributed by [Yaoyao Zhong](https://github.com/zhongyy/SFace). [face_recognition_sface.onnx](./face_recognition_sface.onnx) is converted from the model from https://github.com/zhongyy/SFace thanks to [Chengrui Wang](https://github.com/crywang).
-
 Note:
--
--
--
+- SFace is contributed by [Yaoyao Zhong](https://github.com/zhongyy/SFace).
+- [face_recognition_sface_2021sep.onnx](./face_recognition_sface_2021sep.onnx) is converted from the model from https://github.com/zhongyy/SFace thanks to [Chengrui Wang](https://github.com/crywang).
+- Support 5-landmark warpping for now (2021sep)
 
 ## Demo
 
+***NOTE***: This demo uses [../face_detection_yunet](../face_detection_yunet) as face detector, which supports 5-landmark detection for now (2021sep).
+
 Run the following command to try the demo:
 ```shell
 # recognize on images
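The note above describes SFace as a recognition model fed with 5-landmark-aligned crops from YuNet. As a rough illustration, not the demo's actual code path, the renamed ONNX can be run with `cv2.dnn` on already-aligned 112x112 crops and the resulting feature vectors compared by cosine similarity. The 112x112 size comes from the benchmark table; the alignment step is assumed to have happened elsewhere, and the default `blobFromImage` preprocessing below may differ from the demo's own normalization.

```python
import cv2 as cv
import numpy as np

net = cv.dnn.readNet("models/face_recognition_sface/face_recognition_sface_2021sep.onnx")

def embed(aligned_face_bgr):
    """Run one aligned 112x112 BGR crop through SFace and return its feature vector."""
    # Note: blobFromImage defaults (no scaling/mean) are an assumption here.
    blob = cv.dnn.blobFromImage(aligned_face_bgr, size=(112, 112))
    net.setInput(blob)
    return net.forward().flatten()

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy crops stand in for two 5-landmark-aligned faces produced by the YuNet-based demo.
face1 = np.zeros((112, 112, 3), dtype=np.uint8)
face2 = np.zeros((112, 112, 3), dtype=np.uint8)
print(cosine_similarity(embed(face1), embed(face2)))
```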
models/text_detection_db/README.md
CHANGED

@@ -2,7 +2,11 @@
 
 Real-time Scene Text Detection with Differentiable Binarization
 
-
+Note:
+- Models source: [here](https://drive.google.com/drive/folders/1qzNCHfUJOS0NEUOIKn69eCtxdlNPpWbq).
+- `IC15` in the filename means the model is trained on [IC15 dataset](https://rrc.cvc.uab.es/?ch=4&com=introduction), which can detect English text instances only.
+- `TD500` in the filename means the model is trained on [TD500 dataset](http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_(MSRA-TD500)), which can detect both English & Chinese instances.
+- Visit https://docs.opencv.org/master/d4/d43/tutorial_dnn_text_spotting.html for more information.
 
 ## Demo
 
models/text_detection_db/demo.py
CHANGED

@@ -21,7 +21,7 @@ def str2bool(v):
 
 parser = argparse.ArgumentParser(description='Real-time Scene Text Detection with Differentiable Binarization (https://arxiv.org/abs/1911.08947).')
 parser.add_argument('--input', '-i', type=str, help='Path to the input image. Omit for using default camera.')
-parser.add_argument('--model', '-m', type=str, default='
+parser.add_argument('--model', '-m', type=str, default='text_detection_DB_TD500_resnet18.onnx', help='Path to the model.')
 parser.add_argument('--width', type=int, default=736,
                     help='Preprocess input image by resizing to a specific width. It should be multiple by 32.')
 parser.add_argument('--height', type=int, default=736,
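Since the demo now defaults to the renamed DB file, here is a rough sketch of driving the same model through OpenCV's high-level `TextDetectionModel_DB` API from the text-spotting tutorial linked in the README. The demo itself uses the zoo's own wrapper, so this is only an alternative illustration: the thresholds come from `benchmark/config/text_detection_db.yaml`, the 736x736 size from the demo defaults, while the scale/mean values follow the OpenCV tutorial and the blank input image is a placeholder.

```python
import cv2 as cv
import numpy as np

# Renamed TD500 model from this commit (path relative to models/text_detection_db/).
detector = cv.dnn_TextDetectionModel_DB("text_detection_DB_TD500_resnet18_2021sep.onnx")

# Post-processing thresholds as in benchmark/config/text_detection_db.yaml.
detector.setBinaryThreshold(0.3)
detector.setPolygonThreshold(0.5)
detector.setMaxCandidates(200)

# Preprocessing: 736x736 input (demo default); scale/mean taken from the OpenCV DB tutorial.
detector.setInputParams(1.0 / 255.0, (736, 736), (122.67891434, 116.66876762, 104.00698793))

# Blank stand-in image; use a real photo for meaningful output.
image = np.full((480, 640, 3), 255, dtype=np.uint8)
quads, confidences = detector.detect(image)
for quad, conf in zip(quads, confidences):
    print(conf, np.asarray(quad).reshape(-1, 2))  # four corner points per detected text region
```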
models/text_recognition_crnn/README.md
CHANGED

@@ -2,11 +2,13 @@
 
 An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
 
-
+Note:
+- Model source: https://docs.opencv.org/4.5.2/d9/d1e/tutorial_dnn_OCR.html.
+- For details on training this model, please visit https://github.com/zihaomu/deep-text-recognition-benchmark, which can only recognize english words.
 
 ## Demo
 
-***NOTE
+***NOTE***: This demo uses [text_detection_db](../text_detection_db) as text detector.
 
 Run the following command to try the demo:
 ```shell
models/text_recognition_crnn/demo.py
CHANGED

@@ -26,7 +26,7 @@ def str2bool(v):
 parser = argparse.ArgumentParser(
     description="An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition (https://arxiv.org/abs/1507.05717)")
 parser.add_argument('--input', '-i', type=str, help='Path to the input image. Omit for using default camera.')
-parser.add_argument('--model', '-m', type=str, default='
+parser.add_argument('--model', '-m', type=str, default='text_recognition_CRNN_VGG_BiLSTM_CTC.onnx', help='Path to the model.')
 parser.add_argument('--width', type=int, default=736,
                     help='The width of input image being sent to the text detector.')
 parser.add_argument('--height', type=int, default=736,
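Analogously, the renamed CRNN file can be exercised through OpenCV's `TextRecognitionModel` API described in the dnn OCR tutorial linked from the CRNN README. This is a sketch rather than the demo's code: the 36-character vocabulary and the 100x32/127.5 preprocessing values follow that tutorial and are assumptions here, and the blank crop stands in for a word region that the DB detector would normally supply.

```python
import cv2 as cv
import numpy as np

# Renamed CRNN model from this commit (path relative to models/text_recognition_crnn/).
recognizer = cv.dnn_TextRecognitionModel("text_recognition_CRNN_VGG_BiLSTM_CTC_2021sep.onnx")
recognizer.setDecodeType("CTC-greedy")

# Assumed 36-character vocabulary (digits + lowercase letters), matching the
# "english words only" note in the README; adjust if the model's charset differs.
recognizer.setVocabulary(list("0123456789abcdefghijklmnopqrstuvwxyz"))

# Preprocessing for the 100x32 input, following the OpenCV OCR tutorial's values.
recognizer.setInputParams(1.0 / 127.5, (100, 32), (127.5, 127.5, 127.5))

# A cropped word image would normally come from the DB text detector used by the demo.
crop = np.zeros((32, 100, 3), dtype=np.uint8)  # blank stand-in
print(recognizer.recognize(crop))
```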