fisherman611 committed on
Commit 8313ba2 · verified · 1 Parent(s): a4a39ab

Update README.md

Files changed (1): README.md (+111 -98)

README.md
---
title: Handwritten Mathematical Expression Recognition
emoji: 🐢
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
---

# **Handwritten Mathematical Expression Recognition**

## **Project Overview**
This project recognizes handwritten mathematical expressions and converts them into LaTeX. The system uses deep learning to process images of handwritten equations, interpret their structure, and generate the corresponding LaTeX code. The primary goal is high accuracy on complex expressions, addressing challenges such as varying handwriting styles and intricate symbol arrangements. The project is built with PyTorch and incorporates neural network architectures tailored to this task.

## **Dataset**

The project uses the **CROHME (Competition on Recognition of Online Handwritten Mathematical Expressions)** dataset, a widely used benchmark for handwritten mathematical expression recognition. The dataset is organized into several subsets, each containing images and their corresponding LaTeX annotations.

Download the pre-split dataset: [CROHME Splitted](https://husteduvn-my.sharepoint.com/:f:/g/personal/thanh_lh225458_sis_hust_edu_vn/EviH0ckuHR9KiXftU5ETkPQBHvEL77YTscIHvfN7LBSrSg?e=CHwNxv), then place it in the `data/` directory.

## **Methods and Models**

### **Preprocessing**
Steps to clean and standardize images (a sketch of this pipeline follows the list):
* Load in grayscale.
* Apply Canny edge detection, then dilate with a $7 \times 13$ kernel to connect edges.
* Crop with an F1-score-based method to focus on the expression.
* Binarize with adaptive thresholding; invert so the background is black if needed.
* Apply median blur (kernel 3) several times to reduce noise.
* Add 5-pixel padding, resize to $128 \times 384$, and pad with black if needed.
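
A minimal, hedged sketch of this pipeline in OpenCV. The function name `preprocess`, the Canny thresholds, and the plain bounding-box crop (standing in for the F1-score method above) are illustrative assumptions, not the repository's actual code:

```python
import cv2
import numpy as np

def preprocess(path: str, target_hw: tuple = (128, 384)) -> np.ndarray:
    # Load in grayscale.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Canny edges, dilated with a 7x13 kernel to connect nearby strokes.
    edges = cv2.Canny(img, 50, 150)
    mask = cv2.dilate(edges, np.ones((7, 13), np.uint8))

    # Crop to the bounding box of the dilated edges (a simple stand-in for
    # the F1-score-based crop described above).
    ys, xs = np.nonzero(mask)
    if len(ys) > 0:
        img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Adaptive thresholding; invert if the background came out white.
    img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 11, 2)
    if img.mean() > 127:
        img = 255 - img

    # Repeated median blur (kernel 3) to suppress speckle noise.
    for _ in range(3):
        img = cv2.medianBlur(img, 3)

    # 5-pixel black padding, then scale into a 128x384 canvas, padding
    # with black to preserve the aspect ratio.
    img = cv2.copyMakeBorder(img, 5, 5, 5, 5, cv2.BORDER_CONSTANT, value=0)
    h, w = img.shape
    scale = min(target_hw[0] / h, target_hw[1] / w)
    img = cv2.resize(img, (max(1, int(w * scale)), max(1, int(h * scale))))
    canvas = np.zeros(target_hw, np.uint8)
    canvas[:img.shape[0], :img.shape[1]] = img
    return canvas
```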

### **Augmentation**
Augmentation to handle handwriting variation (a sketch follows the list):
* Rotate up to 5 degrees with border replication.
* Elastic transform for stroke variation.
* Random morphology: erode or dilate to vary stroke thickness.
* Normalize and convert to a tensor.
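
A hedged sketch of these augmentations (the elastic transform is omitted for brevity; kernel sizes and probabilities are illustrative assumptions, not the repository's values):

```python
import random

import cv2
import numpy as np
import torch

def augment(img: np.ndarray) -> torch.Tensor:
    h, w = img.shape

    # Rotate by up to +/-5 degrees, replicating border pixels.
    angle = random.uniform(-5.0, 5.0)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)

    # Random morphology: erode or dilate to vary stroke thickness.
    kernel = np.ones((2, 2), np.uint8)
    img = cv2.erode(img, kernel) if random.random() < 0.5 else cv2.dilate(img, kernel)

    # Normalize to [0, 1] and convert to a 1 x H x W tensor.
    return torch.from_numpy(img).float().unsqueeze(0) / 255.0
```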

### **Model: Counting-Aware Network (CAN)**

CAN is an end-to-end model for HMER that combines recognition with symbol counting [1] (a PyTorch sketch of the attention step and the training objective follows the list):
* **Backbone:**

  * DenseNet (or ResNet).
  * Takes a grayscale image $H' \times W' \times 1$ and outputs a feature map $\mathcal{F} \in \mathbb{R}^{H \times W \times 684}$, where $H = \frac{H'}{16}$ and $W = \frac{W'}{16}$.

* **Multi-Scale Counting Module (MSCM):**

  * Uses $3 \times 3$ and $5 \times 5$ conv branches for multi-scale features.
  * Channel attention: $$\mathcal{Q} = \sigma(W_1(G(\mathcal{H})) + b_1)$$ $$\mathcal{S} = \mathcal{Q} \otimes g(W_2 \mathcal{Q} + b_2)$$
  * Concatenates the branch features and applies a $1 \times 1$ conv to produce the counting map $$\mathcal{M} \in \mathbb{R}^{H \times W \times C}$$
  * Sum-pooling gives the counting vector $$\mathcal{V}_i = \sum_{p=1}^{H} \sum_{q=1}^{W} \mathcal{M}_{i,pq}$$

* **Counting-Combined Attentional Decoder (CCAD):**

  * A $1 \times 1$ conv maps $\mathcal{F}$ to $\mathcal{T} \in \mathbb{R}^{H \times W \times 512}$ and adds positional encoding.
  * A GRU produces the hidden state $h_t \in \mathbb{R}^{1 \times 256}$; attention weights: $$e_{t,ij} = w^T \tanh(\mathcal{T} + \mathcal{P} + W_a \mathcal{A} + W_h h_t) + b$$ $$\alpha_{t,ij} = \frac{\exp(e_{t,ij})}{\sum_{p=1}^{H} \sum_{q=1}^{W} \exp(e_{t,pq})}$$
  * The context vector $\mathcal{C} \in \mathbb{R}^{1 \times 256}$ predicts the next token: $$p(y_t) = \text{softmax}(w_o^T (W_c \mathcal{C} + W_v \mathcal{V}^f + W_t h_t + W_e E(y_{t-1})) + b_o)$$
  * Beam search (width = 5) at inference.

* **Loss:**

  * Classification loss: $$\mathcal{L}_{\text{cls}} = -\frac{1}{T} \sum_{t=1}^{T} \log p(y_t)$$
  * Counting loss: $$\mathcal{L}_{\text{counting}} = \text{smooth}_{L_1}(\mathcal{V}^f, \hat{\mathcal{V}})$$
  * Total loss: $$\mathcal{L} = \mathcal{L}_{\text{cls}} + \lambda \mathcal{L}_{\text{counting}}$$ with $\lambda = 0.01$.
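
The equations above translate directly to PyTorch. First, a hedged sketch of the CCAD attention step; dimensions follow the notation above, and module names such as `W_a` are illustrative assumptions, not the repository's API:

```python
import torch
import torch.nn as nn

H, W, d, dh = 8, 24, 512, 256
T = torch.rand(H, W, d)   # transformed feature map (positional encoding P folded in)
A = torch.rand(H, W, 1)   # coverage: sum of past attention maps
h_t = torch.rand(dh)      # current GRU hidden state

W_a = nn.Linear(1, d, bias=False)   # projects coverage A
W_h = nn.Linear(dh, d, bias=False)  # projects hidden state h_t
w = nn.Linear(d, 1)                 # scoring vector w^T (...) + b

e = w(torch.tanh(T + W_a(A) + W_h(h_t))).squeeze(-1)  # e_{t,ij}, shape (H, W)
# The softmax normalizes jointly over all H x W positions, hence flatten/view.
alpha = torch.softmax(e.flatten(), dim=0).view(H, W)
# Attention-weighted sum of features; the paper then projects this to 256 dims
# to obtain the context vector C (projection omitted here).
context = (alpha.unsqueeze(-1) * T).sum(dim=(0, 1))
```

And a sketch of the counting vector and the combined objective (`token_logits` and the other tensor names are illustrative; the repository's loss code may differ):

```python
import torch
import torch.nn.functional as F

def counting_vector(M: torch.Tensor) -> torch.Tensor:
    # M is the H x W x C counting map; V_i sums M over all spatial positions.
    return M.sum(dim=(0, 1))

def can_loss(token_logits, targets, pred_counts, true_counts, lam=0.01):
    # L_cls: mean negative log-likelihood of the T ground-truth tokens.
    # token_logits: (T, vocab), targets: (T,)
    l_cls = F.cross_entropy(token_logits, targets)
    # L_counting: smooth-L1 between predicted and ground-truth symbol counts.
    l_cnt = F.smooth_l1_loss(pred_counts, true_counts)
    # Total loss with lambda = 0.01, as above.
    return l_cls + lam * l_cnt
```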

## **Results**
| Model | ExpRate | ExpRate (≤1 error) | ExpRate (≤2 errors) | ExpRate (≤3 errors) |
|-------|---------|--------------------|---------------------|---------------------|
| Customized DenseNet-CAN | 0.4248 | 0.6385 | 0.7313 | 0.8036 |
| Customized ResNet-CAN | 0.4511 | 0.6459 | 0.7288 | 0.7888 |
| Pretrained ResNet-CAN | 0.4240 | 0.6220 | 0.7214 | 0.7888 |
| Pretrained DenseNet-CAN | **0.5316** | **0.7149** | **0.8069** | **0.8521** |

## **Conclusion**
CAN performs well for handwritten mathematical expression recognition on the CROHME dataset, handling complex expressions through its counting module and attention mechanism. Future directions include transformer decoders, synthetic training data, and improved preprocessing for noisy images.

## **Installation**
Clone the repository:
```bash
git clone https://github.com/fisherman611/handwritten-mathematical-expression-recognition.git
```

Navigate to the project directory:
```bash
cd handwritten-mathematical-expression-recognition
```

Install the required dependencies:
```bash
pip install -r requirements.txt
```

## **Download the pretrained model**
Download the pretrained model checkpoints from this [OneDrive link](https://husteduvn-my.sharepoint.com/:f:/g/personal/thanh_lh225458_sis_hust_edu_vn/EvWQqIjJQtNKuQwwH1G8EMkBcRPM8s3msiI7-IBERbve1A?e=6SeGHB).

Place the downloaded checkpoint in the `checkpoints/` directory within the repository.

## **Inference**
Run `app.py` to launch the inference app:
```bash
python app.py
```

## **References**
[1] B. Li, Y. Yuan, D. Liang, X. Liu, Z. Ji, J. Bai, W. Liu, and X. Bai, "When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition," arXiv preprint arXiv:2207.11463, 2022. [Online]. Available: https://arxiv.org/abs/2207.11463

## **License**
This project is licensed under the [MIT License](LICENSE).