Spaces:
Build error
A newer version of the Streamlit SDK is available:
1.45.1
PDF2Markdown
Demo:(left: input image; right: rendered markdown.)
Extract PDF features by these tasks:
Layout Detection: Using the YOLOv8 model for region detection, such as images, tables, titles, text, etc.;
Formula Detection: Using YOLOv8 for detecting formulas, including inline formulas and isolated formulas;
Formula Recognition: Using UniMERNet for formula recognition;
Table Recognition: Using StructEqTable for table recognition;
Optical Character Recognition: Using PaddleOCR for text recognition;
Convert features to markdown file:
Using simple rules to convert the identified result to markdown (Note: this is a simply convert code and can only support one-column PDFs, see MinerU for more complex situation).
Usage
python project/pdf2markdown/scripts/run_project.py --config project/pdf2markdown/configs/pdf2markdown.yaml