leonarb committed on
Commit 5e55b20 · verified · 1 parent: 3dc75f1

Update README.md

Files changed (1)
  1. README.md +19 -16
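The `+19 -16` summary and the `@@ -1,25 +1,28 @@` hunk header describe the same change from two angles. The hunk covers 25 lines of the old README and 28 of the new one, and the unchanged context lines shared by both sides make up the difference. Below is a minimal stdlib-only Python sketch that parses the header and checks that arithmetic. It assumes, as in this commit, that the whole change sits in a single hunk. `parse_hunk_header` is an illustrative name and is not part of git or any other tool.

```python
import re

# Unified-diff hunk header: @@ -old_start,old_count +new_start,new_count @@
# A count may be omitted, in which case it defaults to 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


old_start, old_count, new_start, new_count = parse_hunk_header("@@ -1,25 +1,28 @@")
added, removed = 19, 16  # from the "+19 -16" summary for README.md

# Context lines appear on both sides of a hunk, so
#   old_count = context + removed  and  new_count = context + added.
context_old = old_count - removed
context_new = new_count - added
print(old_start, old_count, new_start, new_count)  # 1 25 1 28
print(context_old, context_new)  # 9 9
```

With 9 context lines on each side, 9 + 16 = 25 and 9 + 19 = 28. The summary line and the hunk header therefore agree.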
README.md CHANGED
@@ -1,25 +1,28 @@
  ---
- title: Olmocr Demo
- emoji: 😻
- colorFrom: red
- colorTo: red
- sdk: docker
- sdk_version: 5.29.0
+ title: olmOCR Markdown Converter
+ emoji: 📝
+ colorFrom: yellow
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 3.50.2
  app_file: app.py
- pinned: false
+ python_version: 3.11
+ license: mit
  ---

- # PDF to EPUB Converter (olmOCR)
+ # olmOCR Markdown Converter

- This Gradio app converts a PDF into a clean EPUB using the [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview) model. Each PDF page is rendered and processed through OCR, with the first page used as the cover. Metadata (title, author, language) can be entered manually.
+ This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into markdown `.txt` files that retain document structure, headers, and basic math formatting — ready for Calibre/Kindle or downstream parsing.

- ## Features
+ - ✅ Vision + text anchor OCR pipeline (via `olmOCR`)
+ - ✅ Extracts semantic structure via PDF TOC
+ - ✅ Outputs clean `.txt` in markdown format
+ - ✅ Hugging Face **Gradio Space with GPU support**

- - OCR via `olmOCR-7B-0225-preview`
- - First page used as EPUB cover
- - Input for title, author, and language
- - EPUB output for ebook readers
+ ## Example Use

- ## Requirements
+ Upload a scientific paper in PDF and download a markdown `.txt` version with preserved headers and inline structure.

- Already defined in `requirements.txt`:
+ ---
+
+ Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)
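The YAML block between the `---` markers at the top of the README is the Space configuration that Hugging Face reads. This commit switches it from `sdk: docker` to `sdk: gradio` and adds `python_version` and `license`. As a hedged illustration, here is a stdlib-only Python sketch that pulls the flat `key: value` pairs out of the new header. It is not a YAML parser: values stay strings, and nesting, lists, and quoting are ignored. `read_front_matter` is a hypothetical helper name, and real code would normally use a YAML library instead.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Hypothetical helper: parse flat `key: value` lines between leading `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front-matter block at the top of the file
    config = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front-matter block
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            config[key.strip()] = value.strip()
    return config


new_readme = """---
title: olmOCR Markdown Converter
emoji: 📝
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
python_version: 3.11
license: mit
---

# olmOCR Markdown Converter
"""

config = read_front_matter(new_readme)
print(config["sdk"], config["sdk_version"])  # gradio 3.50.2
```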
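The new README says the converter "extracts semantic structure via PDF TOC". The Space's `app.py` is not part of this commit, so the following is only an illustrative sketch of one way a PDF table of contents can become markdown headers. It is not the Space's actual code. The sketch assumes TOC entries shaped as `[level, title, page]` lists, the format PyMuPDF's `Document.get_toc()` returns, together with per-page OCR text. `toc_to_markdown` is a hypothetical name, and the sample TOC and page text are made up.

```python
from collections import defaultdict


def toc_to_markdown(toc: list[list], page_texts: dict[int, str]) -> str:
    """Hypothetical helper: emit a markdown heading per TOC entry, then page text.

    toc        -- entries shaped [level, title, page], with 1-based level and page
    page_texts -- OCR text keyed by 1-based page number
    """
    headings = defaultdict(list)
    for level, title, page in toc:
        # Markdown has heading levels 1..6, so clamp out-of-range TOC levels.
        headings[page].append("#" * min(max(level, 1), 6) + " " + title.strip())

    parts = []
    for page in sorted(page_texts):
        parts.extend(headings.get(page, []))  # headings that start on this page
        parts.append(page_texts[page].strip())
    return "\n\n".join(parts)


toc = [[1, "Introduction", 1], [2, "Related Work", 1], [1, "Method", 2]]
pages = {1: "Text of page one.", 2: "Text of page two."}
print(toc_to_markdown(toc, pages))
```

This places each heading at the top of the page where its section starts, not at its exact position within the page. Headings that point to pages without OCR text are dropped. A fuller pipeline would need to match heading titles inside the OCR text to position them precisely.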