---
title: Handwritten Mathematical Expression Recognition
emoji: 🔢
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: mit
---
# **Handwritten Mathematical Expression Recognition**
## **Project Overview**
This project recognizes handwritten mathematical expressions and converts them into LaTeX. A deep learning model processes an image of a handwritten equation, interprets its two-dimensional structure, and generates the corresponding LaTeX code. The goal is high accuracy on complex expressions despite varying handwriting styles and intricate symbol arrangements. The project is built with PyTorch and uses the Counting-Aware Network (CAN) architecture.
## **Dataset**
The project utilizes the **CROHME (Competition on Recognition of Online Handwritten Mathematical Expressions)** dataset, a widely used benchmark for handwritten mathematical expression recognition. The dataset is organized into several subsets, each containing images and their corresponding LaTeX annotations.
Download the split dataset, [CROHME Splitted](https://husteduvn-my.sharepoint.com/:f:/g/personal/thanh_lh225458_sis_hust_edu_vn/EviH0ckuHR9KiXftU5ETkPQBHvEL77YTscIHvfN7LBSrSg?e=CHwNxv), and place it in the `data/` directory.
## **Methods and Models**
### **Preprocessing**
Steps to clean and standardize images:
* Load in grayscale.
* Use Canny edge detection, dilate with $7 \times 13$ kernel to connect edges.
* Crop with F1-score method to focus on the expression.
* Binarize with adaptive thresholding; set background to black if needed.
* Apply median blur (kernel 3) multiple times to reduce noise.
* Add 5-pixel padding, resize to $128 \times 384$, pad with black if needed.
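The last preprocessing step can be illustrated with plain arithmetic. The sketch below assumes that the resize preserves the aspect ratio and that the leftover space is split evenly between opposite sides; the function name `fit_and_pad` and both assumptions are illustrative and may differ from what the repository does.

```python
def fit_and_pad(height, width, target_h=128, target_w=384, margin=5):
    """Return the resized size and the black padding needed to fill the canvas.

    Hypothetical helper: assumes an aspect-preserving resize and centered padding.
    """
    # Step 1: add a fixed margin on every side of the cropped expression.
    height, width = height + 2 * margin, width + 2 * margin
    # Step 2: scale uniformly so the image fits inside the target canvas.
    scale = min(target_h / height, target_w / width)
    new_h = max(1, min(target_h, round(height * scale)))
    new_w = max(1, min(target_w, round(width * scale)))
    # Step 3: split the leftover space into top/bottom and left/right padding.
    pad_h, pad_w = target_h - new_h, target_w - new_w
    top, left = pad_h // 2, pad_w // 2
    return {"size": (new_h, new_w), "pad": (top, pad_h - top, left, pad_w - left)}


print(fit_and_pad(54, 246))
# {'size': (96, 384), 'pad': (16, 16, 0, 0)}
```

A 54 x 246 crop becomes 64 x 256 after the margin, is scaled by 1.5 to 96 x 384, and then receives 16 black rows above and below to reach $128 \times 384$.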
### **Augmentation**
Augmentation to handle handwriting variations:
* Rotate up to 5 degrees, border replication.
* Elastic transform for stroke variations.
* Random morphology: erode or dilate to change stroke thickness.
* Normalize and convert to tensor.
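To show what the random morphology step does to stroke thickness, here is a minimal pure-Python sketch on a tiny binary image. The $3 \times 3$ neighbourhood and the function names `morph` and `random_morphology` are assumptions made for illustration; a real pipeline would more likely call an image library such as OpenCV.

```python
import random


def morph(image, op):
    """Apply one 3x3 erosion or dilation to a binary image (list of rows of 0/1)."""
    h, w = len(image), len(image[0])
    pick = max if op == "dilate" else min
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # Collect the 3x3 neighbourhood, clipping at the image border.
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(pick(window))
        out.append(row)
    return out


def random_morphology(image, rng=random):
    """Randomly thicken (dilate) or thin (erode) white strokes on a black background."""
    return morph(image, rng.choice(["erode", "dilate"]))


dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1
thick = morph(dot, "dilate")
print(sum(map(sum, thick)))   # 9: the single pixel grew into a 3x3 block
```

Dilation thickens strokes and erosion thins them, so choosing one at random per sample exposes the model to a range of pen widths.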
### **Model: Counting-Aware Network (CAN)**
CAN is an end-to-end model for HMER, combining recognition and symbol counting:
* **Backbone:**
  * DenseNet (or ResNet)
  * Takes a grayscale image $H' \times W' \times 1$ and outputs a feature map $\mathcal{F} \in \mathbb{R}^{H \times W \times 684}$, where $H = \frac{H'}{16}$ and $W = \frac{W'}{16}$.
* **Multi-Scale Counting Module (MSCM):**
  * Uses $3 \times 3$ and $5 \times 5$ conv branches for multi-scale features.
  * Channel attention: $$\mathcal{Q} = \sigma(W_1(G(\mathcal{H})) + b_1)$$ $$\mathcal{S} = \mathcal{Q} \otimes g(W_2 \mathcal{Q} + b_2)$$
  * Concatenates the features, then applies a $1 \times 1$ conv to produce the counting map $$\mathcal{M} \in \mathbb{R}^{H \times W \times C}$$
  * Sum-pooling gives the counting vector $$\mathcal{V}_i = \sum_{p=1}^H \sum_{q=1}^W \mathcal{M}_{i,pq}$$
* **Counting-Combined Attentional Decoder (CCAD):**
  * $1 \times 1$ conv on $\mathcal{F}$ to $\mathcal{T} \in \mathbb{R}^{H \times W \times 512}$, adds positional encoding.
  * GRU gives hidden state $h_t \in \mathbb{R}^{1 \times 256}$, attention weights: $$e_{t,ij} = w^T \tanh(\mathcal{T} + \mathcal{P} + W_a \mathcal{A} + W_h h_t) + b$$ $$\alpha_{t,ij} = \frac{\exp(e_{t,ij})}{\sum_{p=1}^H \sum_{q=1}^W \exp(e_{t,pq})}$$
  * Context vector $\mathcal{C} \in \mathbb{R}^{1 \times 256}$, predicts token: $$p(y_t) = \text{softmax}(w_o^T (W_c \mathcal{C} + W_v \mathcal{V}^f + W_t h_t + W_e E(y_{t-1})) + b_o)$$
  * Beam search (width = 5) for inference.
* **Loss:**
  * Classification loss: $$\mathcal{L}_{\text{cls}} = -\frac{1}{T} \sum_{t=1}^T \log(p(y_t))$$
  * Counting loss: $$\mathcal{L}_{\text{counting}} = \text{smooth}_{L_1}(\mathcal{V}^f, \hat{\mathcal{V}})$$
  * Total loss: $$\mathcal{L} = \mathcal{L}_{\text{cls}} + \lambda \mathcal{L}_{\text{counting}}$$ with $\lambda = 0.01$
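The counting vector and the combined loss can be traced on small numbers. The following is a minimal pure-Python sketch, not the repository's PyTorch code. It makes several assumptions: the counting map is stored channel-first as `[C][H][W]`, the target $\hat{\mathcal{V}}$ is the number of occurrences of each symbol in the label, and smooth L1 uses `beta = 1.0` with mean reduction. All function names are hypothetical.

```python
import math
from collections import Counter


def counting_vector(counting_map):
    """Sum-pool a counting map of shape [C][H][W] into a length-C vector."""
    return [sum(sum(row) for row in channel) for channel in counting_map]


def counting_target(tokens, vocab):
    """Assumed ground truth: how often each vocabulary symbol occurs in the label."""
    counts = Counter(tokens)
    return [float(counts[symbol]) for symbol in vocab]


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic below beta, linear above it."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def total_loss(token_probs, pred_counts, true_counts, lam=0.01):
    """L = L_cls + lambda * L_counting, with L_cls the mean negative log-likelihood."""
    l_cls = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return l_cls + lam * smooth_l1(pred_counts, true_counts)


# Two symbol classes on a 2x2 feature grid.
m = [[[0.5, 0.5], [0.0, 1.0]],
     [[0.25, 0.25], [0.25, 0.25]]]
print(counting_vector(m))                                  # [2.0, 1.0]
print(counting_target(["x", "^", "2", "+", "x"], ["x", "+", "2", "^"]))
# [2.0, 1.0, 1.0, 1.0]
```

Because $\lambda = 0.01$, the counting term acts as a light auxiliary signal next to the token-level classification loss.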
## **Results**
| Model | ExpRate | ExpRate-Leq1 | ExpRate-Leq2 | ExpRate-Leq3 |
|-------------|--------|---------|---------|---------|
| Customized DenseNet-CAN | 0.4248 | 0.6385 | 0.7313 | 0.8036 |
| Customized ResNet-CAN | 0.4511 | 0.6459 | 0.7288 | 0.7888 |
| Pretrained ResNet-CAN | 0.424 | 0.622 | 0.7214 | 0.7888 |
| Pretrained DenseNet-CAN | **0.5316** | **0.7149** | **0.8069** | **0.8521** |
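For readers unfamiliar with the column names, the sketch below shows one common way such metrics are computed. It assumes ExpRate is the fraction of predictions that match the reference exactly, and ExpRate-LeqK is the fraction within K token-level edit operations. The evaluation code in this repository may define them differently, and the names `edit_distance` and `exp_rates` are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def exp_rates(predictions, references, max_errors=3):
    """Map k -> fraction of expressions with at most k token errors (k = 0 is ExpRate)."""
    dists = [edit_distance(p.split(), r.split())
             for p, r in zip(predictions, references)]
    return {k: sum(d <= k for d in dists) / len(dists)
            for k in range(max_errors + 1)}


preds = ["x ^ 2", "a + b", "\\frac a b", "1 2 3 4"]
refs = ["x ^ 2", "a - b", "\\frac b a", "5 6 7 8"]
print(exp_rates(preds, refs))
# {0: 0.25, 1: 0.5, 2: 0.75, 3: 0.75}
```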
## **Conclusion**
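On the CROHME dataset, the pretrained DenseNet-CAN performs best, reaching an ExpRate of 0.5316 and an ExpRate-Leq3 of 0.8521. Combining symbol counting with attention-based decoding helps the model handle complex expressions. Possible future work: transformer decoders, synthetic training data, and more robust preprocessing for noisy images.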
## **Installation**
Clone the repository:
```bash
git clone https://github.com/fisherman611/handwritten-mathematical-expression-recognition.git
```
Navigate to the project directory:
```bash
cd handwritten-mathematical-expression-recognition
```
Install the required dependencies:
```bash
pip install -r requirements.txt
```
## **Download the pretrained model**
Download the pretrained model checkpoints from this [OneDrive link](https://husteduvn-my.sharepoint.com/:f:/g/personal/thanh_lh225458_sis_hust_edu_vn/EvWQqIjJQtNKuQwwH1G8EMkBcRPM8s3msiI7-IBERbve1A?e=6SeGHB).
Place the downloaded checkpoint in the `checkpoints/` directory within the repository.
## **Inference**
Run `app.py` to start the Gradio app and run inference:
```bash
python app.py
```
## **References**
[1] B. Li, Y. Yuan, D. Liang, X. Liu, Z. Ji, J. Bai, W. Liu, and X. Bai, "When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition," arXiv preprint arXiv:2207.11463, 2022. [Online]. Available: https://arxiv.org/abs/2207.11463
## **License**
This project is licensed under the [MIT License](LICENSE).