	Update README.md
README.md CHANGED

@@ -1,25 +1,28 @@
 ---
-title: 
-emoji: 
-colorFrom: 
-colorTo: 
-sdk: 
-sdk_version: 
+title: olmOCR Markdown Converter
+emoji: π
+colorFrom: yellow
+colorTo: blue
+sdk: gradio
+sdk_version: 3.50.2
 app_file: app.py
-
+python_version: 3.11
+license: mit
 ---
 
-# 
+# olmOCR Markdown Converter
 
-This 
+This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into markdown `.txt` files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing.
 
-
+- ✅ Vision + text anchor OCR pipeline (via `olmOCR`)
+- ✅ Extracts semantic structure via PDF TOC
+- ✅ Outputs clean `.txt` in markdown format
+- ✅ Hugging Face **Gradio Space with GPU support**
 
-
-- First page used as EPUB cover
-- Input for title, author, and language
-- EPUB output for ebook readers
+## Example Use
 
-
+Upload a scientific paper in PDF and download a markdown `.txt` version with preserved headers and inline structure.
 
-
+---
+
+Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)
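
The upload-and-download flow the new README describes could be wired up roughly as in the sketch below. This is not the Space's actual `app.py`, which the diff does not show. The names `convert_pdf_to_markdown_txt`, `build_demo`, and `run_ocr` are hypothetical, and `run_ocr` is only a stand-in for whatever call drives the olmOCR pipeline. The Gradio part assumes the 3.x API implied by `sdk_version: 3.50.2`.

```python
from pathlib import Path
from typing import Callable


def convert_pdf_to_markdown_txt(pdf_path: str, run_ocr: Callable[[str], str]) -> str:
    """Run an OCR callable on a PDF and save its markdown output as a .txt file.

    `run_ocr` is a hypothetical stand-in for the olmOCR pipeline: it takes a
    PDF path and returns markdown text. Returns the path of the written file.
    """
    source = Path(pdf_path)
    if source.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {source.name!r}")
    markdown = run_ocr(str(source))
    # Write the markdown next to the input, swapping the extension for .txt.
    output = source.with_suffix(".txt")
    output.write_text(markdown, encoding="utf-8")
    return str(output)


def build_demo(run_ocr: Callable[[str], str]):
    """Build the Gradio UI; gradio is imported lazily so the helper above works without it."""
    import gradio as gr  # assumption: gradio 3.x, as pinned by sdk_version

    def handle_upload(uploaded):
        # In gradio 3.x a File input arrives as a temp-file object exposing `.name`.
        return convert_pdf_to_markdown_txt(uploaded.name, run_ocr)

    return gr.Interface(
        fn=handle_upload,
        inputs=gr.File(label="PDF", file_types=[".pdf"]),
        outputs=gr.File(label="Markdown (.txt)"),
        title="olmOCR Markdown Converter",
    )
```

Keeping the conversion helper separate from the UI means it can be exercised with a stub in place of the model, without a GPU or Gradio installed; `build_demo(run_ocr).launch()` would then start the interface.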
