Spaces:

yhshin
/

latex-ocr

Runtime error

App Files Files Community

latex-ocr / article.md

Young Ho Shin

Add examples and article.md

36bccd1 about 3 years ago

preview code

raw

history blame

2.45 kB

	## What's the point of this?

	LaTeX is the de-facto standard markup language for typesetting pretty equations in academic papers.
	It is extremely feature rich and flexible but very verbose.
	This makes it great for typesetting complex equations, but not very convenient for quick note-taking on the fly.

	For example, here's a short equation from [this page](https://en.wikipedia.org/wiki/Quantum_electrodynamics) on Wikipedia about Quantum Electrodynamics
	and the corresponding LaTeX code:

	![Example]( https://wikimedia.org/api/rest_v1/media/math/render/svg/6faab1adbb88a567a52e55b2012e836a011a0675 )

	```
	{\displaystyle {\mathcal {L}}={\bar {\psi }}(i\gamma ^{\mu }D_{\mu }-m)\psi -{\frac {1}{4}}F_{\mu \nu }F^{\mu \nu },}
	```


	This demo is a first step in solving that problem.
	Eventually, you'll be able to take a quick screenshot of an equation from a paper
	and a program built with this model will generate its corresponding LaTeX source code
	so that you can just copy/paste straight into your personal notes.
	No more endless googling obscure LaTeX syntax!

	## How does it work?

	Because this problem involves looking at an image and generating valid LaTeX code,
	the model needs to understand both Computer Vision (CV) and Natural Language Processing (NLP).
	There are some other projects that aim to solve the same problem with some very interesting architectures
	that generally involve some kind of "encoder" that looks at the image and extracts and encodes the information about the equation from the image,
	and a "decoder" that takes that information and translates it into what is hopefully both valid and accurate LaTeX code.

	Examples:
	...

	I chose to tackle this problem with transfer learning.
	The biggest reason for this is computing constraints -
	I don't have unlimited access to GPU hours and wanted training to be reasonably fast, on the order of a couple of hours.
	There are some other benefits to this approach,
	e.g. the architecture is already proven to be robust enough for various applications, so less time spent on trial and error.

	I chose TrOCR, an OCR machine learning model trained by Microsoft on SRIOE data to produce text from receipts.

	<p style='text-align: center'>Made by Young Ho Shin</p>
	<p style='text-align: center'>
	<a href = "mailto: [email protected]">Email</a> \|
	<a href='https://www.github.com/yhshin11'>Github</a> \|
	<a href='https://www.linkedin.com/in/young-ho-shin-3995051b9/'>Linkedin</a>

	</p>