Yiyao Wang committed
Commit 46b1f95 · 1 Parent(s): 0367d6a

Text Recognition: Add script to evaluate text recognition by ICDAR2003 (#71)


* update readme

* add another script

* revise details for this pr

models/text_recognition_crnn/README.md CHANGED
@@ -2,11 +2,24 @@
 
 An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
 
+Results of accuracy evaluation with [tools/eval](../../tools/eval) on different text recognition datasets.
+
+| Model name   | ICDAR03(%) | IIIT5k(%) | CUTE80(%) |
+|--------------|------------|-----------|-----------|
+| CRNN_EN      | 81.66      | 74.33     | 52.78     |
+| CRNN_EN_FP16 | 82.01      | 74.93     | 52.34     |
+| CRNN_CH      | 71.28      | 80.90     | 67.36     |
+| CRNN_CH_FP16 | 78.63      | 80.93     | 67.01     |
+
+\*: 'FP16' stands for 'model quantized into FP16'.
+
 Note:
 - Model source:
   - `text_recognition_CRNN_EN_2021sep.onnx`: https://docs.opencv.org/4.5.2/d9/d1e/tutorial_dnn_OCR.html (CRNN_VGG_BiLSTM_CTC.onnx)
+  - `text_recognition_CRNN_CH_2021sep.onnx`: https://docs.opencv.org/4.x/d4/d43/tutorial_dnn_text_spotting.html (crnn_cs.onnx)
   - `text_recognition_CRNN_CN_2021nov.onnx`: https://docs.opencv.org/4.5.2/d4/d43/tutorial_dnn_text_spotting.html (crnn_cs_CN.onnx)
 - `text_recognition_CRNN_EN_2021sep.onnx` can detect digits (0\~9) and letters (return lowercase letters a\~z) (view `charset_36_EN.txt` for details).
+- `text_recognition_CRNN_CH_2021sep.onnx` can detect digits (0\~9), upper/lower-case letters (a\~z and A\~Z), and some special characters (view `charset_94_CH.txt` for details).
 - `text_recognition_CRNN_CN_2021nov.onnx` can detect digits (0\~9), upper/lower-case letters (a\~z and A\~Z), some Chinese characters and some special characters (view `charset_3944_CN.txt` for details).
 - For details on training this model series, please visit https://github.com/zihaomu/deep-text-recognition-benchmark.
 
@@ -16,6 +29,7 @@ Note:
 - This demo uses [text_detection_db](../text_detection_db) as text detector.
 - Selected model must match with the charset:
   - Try `text_recognition_CRNN_EN_2021sep.onnx` with `charset_36_EN.txt`.
+  - Try `text_recognition_CRNN_CH_2021sep.onnx` with `charset_94_CH.txt`.
   - Try `text_recognition_CRNN_CN_2021sep.onnx` with `charset_3944_CN.txt`.
 
 Run the demo detecting English:
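
The figures in the table above are word-level accuracies. A minimal sketch of that metric, assuming (as the eval dataset classes added below suggest) that a prediction counts as correct only when it equals the ground-truth word, compared lower-cased:

```python
# Word-accuracy sketch matching the numbers above (assumption: a prediction
# is correct only if it equals the ground-truth word, compared lower-cased,
# as the eval dataset classes added in this PR do).
def word_accuracy(predictions, labels):
    assert len(predictions) == len(labels)
    right = sum(p.lower() == l.lower() for p, l in zip(predictions, labels))
    return 100.0 * right / len(labels)

# Example: 2 of 3 words match exactly -> 66.67
print(round(word_accuracy(["hello", "world", "4o"], ["hello", "world", "40"]), 2))
```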
models/text_recognition_crnn/charset_94_CH.txt ADDED
@@ -0,0 +1,94 @@
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+a
+b
+c
+d
+e
+f
+g
+h
+i
+j
+k
+l
+m
+n
+o
+p
+q
+r
+s
+t
+u
+v
+w
+x
+y
+z
+A
+B
+C
+D
+E
+F
+G
+H
+I
+J
+K
+L
+M
+N
+O
+P
+Q
+R
+S
+T
+U
+V
+W
+X
+Y
+Z
+!
+"
+#
+$
+%
+&
+'
+(
+)
+*
++
+,
+-
+.
+/
+:
+;
+<
+=
+>
+?
+@
+[
+\
+]
+^
+_
+`
+{
+|
+}
+~
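
The charset file lists one character per line, in the order the model's output indices use. A minimal loader sketch, assuming the demo consumes charset files this way (the exact parsing in `crnn.py` is not part of this diff):

```python
# Sketch: read a charset file (one character per line) into an index -> char
# table. Assumption: charset_94_CH.txt pairs with
# text_recognition_CRNN_CH_2021sep.onnx and output index i maps to the
# character on line i+1; the real loader in crnn.py may differ in detail.
def load_charset(path):
    with open(path, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]

charset = load_charset('models/text_recognition_crnn/charset_94_CH.txt')
print(len(charset))  # expected: 94
```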
models/text_recognition_crnn/crnn.py CHANGED
@@ -54,7 +54,9 @@ class CRNN:
         rotationMatrix = cv.getPerspectiveTransform(vertices, self._targetVertices)
         cropped = cv.warpPerspective(image, rotationMatrix, self._inputSize)
 
-        if 'CN' in self._model_path:
+        # 'CN' can detect digits (0~9), upper/lower-case letters (a~z and A~Z), some Chinese characters and some special characters
+        # 'CH' can detect digits (0~9), upper/lower-case letters (a~z and A~Z), and some special characters
+        if 'CN' in self._model_path or 'CH' in self._model_path:
             pass
         else:
             cropped = cv.cvtColor(cropped, cv.COLOR_BGR2GRAY)
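
With this change, models whose filename contains 'CN' or 'CH' receive the color crop directly, while the EN model's crop is still converted to grayscale. A usage sketch of the recognizer on a pre-cropped word image, following the full-image bounding-box pattern the new eval datasets use (the import path and the CRNN constructor keywords are assumptions based on the eval.py config later in this PR):

```python
# Sketch: run the CRNN wrapper on an already-cropped word image by passing a
# bounding box that covers the whole image, as the new eval datasets do.
# Assumptions: import path and modelPath/charsetPath keywords mirror the
# eval.py config added in this PR; they are not guaranteed by this diff.
import cv2 as cv
import numpy as np
from models.text_recognition_crnn.crnn import CRNN

model = CRNN(modelPath="text_recognition_CRNN_CH_2021sep.onnx",
             charsetPath="charset_94_CH.txt")
img = cv.imread("word.png")
rbbox = np.array([0, img.shape[0], 0, 0, img.shape[1], 0, img.shape[1], img.shape[0]])
print(model.infer(img, rbbox))
```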
tools/eval/README.md CHANGED
@@ -19,6 +19,8 @@ Supported datasets:
 - [ImageNet](#imagenet)
 - [WIDERFace](#widerface)
 - [LFW](#lfw)
+- [ICDAR2003](#icdar2003)
+- [IIIT5K](#iiit5k)
 
 ## ImageNet
 
@@ -137,4 +139,55 @@ Run evaluation with the following command:
 
 ```shell
 python eval.py -m sface -d lfw -dr /path/to/lfw
+```
+
+## ICDAR2003
+
+### Prepare data
+
+Please visit http://iapr-tc11.org/mediawiki/index.php/ICDAR_2003_Robust_Reading_Competitions to download the ICDAR2003 dataset and the labels.
+
+```shell
+$ tree -L 2 /path/to/icdar
+.
+├── word
+│   ├── 1
+│   │   ├── self
+│   │   ├── ...
+│   │   └── willcooks
+│   ├── ...
+│   └── 12
+└── word.xml
+
+```
+
+### Evaluation
+
+Run evaluation with the following command:
+
+```shell
+python eval.py -m crnn -d icdar -dr /path/to/icdar
+```
+
+### Example
+
+```shell
+# Download the zip file from http://www.iapr-tc11.org/dataset/ICDAR2003_RobustReading/TrialTrain/word.zip
+# Unzip it to /path/to/icdar, then run:
+python eval.py -m crnn -d icdar -dr /path/to/icdar
+```
+
+## IIIT5K
+
+### Prepare data
+
+Please visit https://github.com/cv-small-snails/Text-Recognition-Material to download the IIIT5K dataset and the labels.
+
+### Evaluation
+
+All datasets in lmdb format can be evaluated by this script.<br>
+Run evaluation with the following command:
+
+```shell
+python eval.py -m crnn -d iiit5k -dr /path/to/iiit5k
 ```
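
The IIIT5K loader added below expects an lmdb database with paired `image-<index>` / `label-<index>` keys. A small sketch to sanity-check a downloaded lmdb before running the evaluation (key layout assumed from `tools/eval/datasets/iiit5k.py`):

```python
# Sanity-check a text-recognition lmdb before evaluation. Assumption: keys
# come as 'image-<index>' / 'label-<index>' pairs, the layout the IIIT5K
# loader in this PR relies on.
import lmdb

env = lmdb.open("/path/to/iiit5k", readonly=True, lock=False)
with env.begin() as txn:
    shown = 0
    for key, value in txn.cursor():
        k = key.decode()
        if not k.startswith("image-"):
            continue
        label = txn.get(("label-" + k.split("-")[1]).encode())
        print(k, "->", label.decode() if label else "<missing label>")
        shown += 1
        if shown >= 5:
            break
```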
tools/eval/datasets/__init__.py CHANGED
@@ -1,6 +1,8 @@
 from .imagenet import ImageNet
 from .widerface import WIDERFace
 from .lfw import LFW
+from .icdar import ICDAR
+from .iiit5k import IIIT5K
 
 class Registery:
     def __init__(self, name):
@@ -16,4 +18,6 @@ class Registery:
 DATASETS = Registery("Datasets")
 DATASETS.register(ImageNet)
 DATASETS.register(WIDERFace)
-DATASETS.register(LFW)
+DATASETS.register(LFW)
+DATASETS.register(ICDAR)
+DATASETS.register(IIIT5K)
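
The `Registery` body is not shown in this diff, only its `register()` calls. A rough sketch of the name-keyed pattern those calls imply (illustrative only, not the repository's actual implementation), which is what lets `eval.py` map `-d icdar` / `-d iiit5k` to the classes registered here:

```python
# Illustrative registry sketch -- not the actual Registery class in
# tools/eval/datasets/__init__.py, whose body is omitted from this diff.
class Registery:
    def __init__(self, name):
        self.name = name
        self._dict = {}

    def register(self, item):
        # store classes under their class name, e.g. 'ICDAR' or 'IIIT5K'
        self._dict[item.__name__] = item

    def get(self, key):
        return self._dict[key]
```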
tools/eval/datasets/icdar.py ADDED
@@ -0,0 +1,53 @@
+import os
+import numpy as np
+import cv2 as cv
+import xml.dom.minidom as minidom
+from tqdm import tqdm
+
+class ICDAR:
+    def __init__(self, root):
+        self.root = root
+        self.acc = -1
+        self.inputSize = [100, 32]
+        self.val_label_file = os.path.join(root, "word.xml")
+        self.val_label = self.load_label(self.val_label_file)
+
+    @property
+    def name(self):
+        return self.__class__.__name__
+
+    def load_label(self, label_file):
+        label = list()
+        dom = minidom.getDOMImplementation().createDocument(None, 'Root', None)
+        root = dom.documentElement
+        dom = minidom.parse(self.val_label_file)
+        root = dom.documentElement
+        names = root.getElementsByTagName('image')
+        for name in names:
+            key = os.path.join(self.root, name.getAttribute('file'))
+            value = name.getAttribute('tag').lower()
+            label.append([key, value])
+
+        return label
+
+    def eval(self, model):
+        right_num = 0
+        pbar = tqdm(self.val_label)
+        for fn, label in pbar:
+            pbar.set_description("Evaluating {} with {} val set".format(model.name, self.name))
+
+            img = cv.imread(fn)
+
+            rbbox = np.array([0, img.shape[0], 0, 0, img.shape[1], 0, img.shape[1], img.shape[0]])
+            pred = model.infer(img, rbbox)
+            if label == pred:
+                right_num += 1
+
+        self.acc = right_num/(len(self.val_label) * 1.0)
+
+
+    def get_result(self):
+        return self.acc
+
+    def print_result(self):
+        print("Accuracy: {:.2f}%".format(self.acc*100))
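
A minimal driver sketch for the `ICDAR` class above, exercised with a stub recognizer so the flow (load `word.xml`, infer per word image, report accuracy) is visible end to end; swap the stub for the real CRNN wrapper to reproduce the README numbers (import path is illustrative):

```python
# Drive ICDAR.eval() with a stub recognizer exposing the interface it needs
# (.name and .infer(image, rbbox)); replace the stub with the CRNN wrapper
# for a real run. Import path is illustrative.
from tools.eval.datasets.icdar import ICDAR


class StubRecognizer:
    name = "stub"

    def infer(self, image, rbbox):
        return ""  # never matches, so accuracy prints as 0.00%


dataset = ICDAR(root="/path/to/icdar")  # needs the word/ folders plus word.xml
dataset.eval(StubRecognizer())
dataset.print_result()
```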
tools/eval/datasets/iiit5k.py ADDED
@@ -0,0 +1,55 @@
+import lmdb
+import os
+import numpy as np
+import cv2 as cv
+from tqdm import tqdm
+
+class IIIT5K:
+    def __init__(self, root):
+        self.root = root
+        self.acc = -1
+        self.inputSize = [100, 32]
+
+        self.val_label = self.load_label(self.root)
+
+    @property
+    def name(self):
+        return self.__class__.__name__
+
+    def load_label(self, root):
+        lmdb_file = root
+        lmdb_env = lmdb.open(lmdb_file)
+        lmdb_txn = lmdb_env.begin()
+        lmdb_cursor = lmdb_txn.cursor()
+        label = list()
+        for key, value in lmdb_cursor:
+            image_index = key.decode()
+            if image_index.split('-')[0] == 'image':
+                img = cv.imdecode(np.fromstring(value, np.uint8), 3)
+                label_index = 'label-' + image_index.split('-')[1]
+                value = lmdb_txn.get(label_index.encode()).decode().lower()
+                label.append([img, value])
+            else:
+                break
+        return label
+
+    def eval(self, model):
+        right_num = 0
+        pbar = tqdm(self.val_label)
+        for img, value in pbar:
+            pbar.set_description("Evaluating {} with {} val set".format(model.name, self.name))
+
+
+            rbbox = np.array([0, img.shape[0], 0, 0, img.shape[1], 0, img.shape[1], img.shape[0]])
+            pred = model.infer(img, rbbox).lower()
+            if value == pred:
+                right_num += 1
+
+        self.acc = right_num/(len(self.val_label) * 1.0)
+
+
+    def get_result(self):
+        return self.acc
+
+    def print_result(self):
+        print("Accuracy: {:.2f}%".format(self.acc*100))
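
One note on the decode step above: `np.fromstring` is deprecated for binary buffers. An equivalent, non-deprecated decode (a suggested alternative, not what this commit ships):

```python
# Suggested alternative to the np.fromstring call above; np.frombuffer avoids
# the deprecation warning and cv.IMREAD_COLOR makes the 3-channel intent explicit.
import numpy as np
import cv2 as cv

def decode_lmdb_image(value):
    buf = np.frombuffer(value, dtype=np.uint8)
    return cv.imdecode(buf, cv.IMREAD_COLOR)
```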
tools/eval/eval.py CHANGED
@@ -73,6 +73,11 @@ models = dict(
         name="SFace",
         topic="face_recognition",
         modelPath=os.path.join(root_dir, "models/face_recognition_sface/face_recognition_sface_2021dec-act_int8-wt_int8-quantized.onnx")),
+    crnn=dict(
+        name="CRNN",
+        topic="text_recognition",
+        modelPath=os.path.join(root_dir, "models/text_recognition_crnn/text_recognition_CRNN_EN_2021sep.onnx"),
+        charsetPath=os.path.join(root_dir, "models/text_recognition_crnn/charset_36_EN.txt")),
 )
 
 datasets = dict(
@@ -87,6 +92,12 @@ datasets = dict(
         name="LFW",
         topic="face_recognition",
         target_size=112),
+    icdar=dict(
+        name="ICDAR",
+        topic="text_recognition"),
+    iiit5k=dict(
+        name="IIIT5K",
+        topic="text_recognition"),
 )
 
 def main(args):
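
The new `crnn` entry pins the EN model and its charset. Evaluating the CH model would mean changing both paths together, since the selected model must match its charset (see the recognizer README). A hypothetical variant of that entry:

```python
# Hypothetical 'crnn' entry for evaluating the CH model; modelPath and
# charsetPath must be swapped as a pair (model must match its charset).
crnn=dict(
    name="CRNN",
    topic="text_recognition",
    modelPath=os.path.join(root_dir, "models/text_recognition_crnn/text_recognition_CRNN_CH_2021sep.onnx"),
    charsetPath=os.path.join(root_dir, "models/text_recognition_crnn/charset_94_CH.txt")),
```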