---
title: Pdf2markdown (Flask)
emoji: ๐๏ธ
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
# For Docker Spaces, app_port in README.md tells Hugging Face which internal port your app listens on.
# It should match the port that Gunicorn (or your app server) binds to.
app_port: 7860
---

## PDF to Markdown Converter (Flask Version)

This application converts PDF files (either uploaded or fetched from a URL) into Markdown format.
It extracts text, attempts to format it, identifies tables, and extracts images.
Extracted images are uploaded to a Hugging Face Dataset repository named "pdf-images-extracted" (this name can be configured).

**Important:** For image uploading to work, you **must** set an `HF_TOKEN` with write access to datasets in your Hugging Face Space secrets.
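
As a rough illustration of why the token matters, the sketch below shows one way an extracted image could be pushed to the dataset repository with the `huggingface_hub` client. It is not taken from this app's source: the function names `get_hf_token` and `upload_image`, the `images/` folder inside the repository, and the choice to create the repository under the token owner's account are all assumptions made for the example. Space secrets are exposed to the container as environment variables, which is why the token is read from `os.environ`.

```python
import os
from pathlib import Path

# Repository name taken from this README; the app allows it to be configured.
DATASET_NAME = "pdf-images-extracted"


def get_hf_token() -> str:
    """Read HF_TOKEN from the environment (hypothetical helper)."""
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Space secret with write access to datasets."
        )
    return token


def upload_image(image_path: str) -> str:
    """Upload one image file to the dataset repo and return its URL (hypothetical helper)."""
    # Imported lazily so the token check above works without the dependency installed.
    from huggingface_hub import HfApi

    api = HfApi(token=get_hf_token())
    owner = api.whoami()["name"]  # assumption: the repo lives under the token owner's account
    repo_id = f"{owner}/{DATASET_NAME}"

    # Create the dataset repo on first use; exist_ok avoids an error on later runs.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)

    path_in_repo = f"images/{Path(image_path).name}"  # assumption: images/ subfolder
    api.upload_file(
        path_or_fileobj=image_path,
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
    )
    return f"https://huggingface.co/datasets/{repo_id}/resolve/main/{path_in_repo}"
```

Without a token that has write access, the repository creation and upload calls would be rejected, so checking for the token up front gives a clearer error than a failed upload.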

### Features

- Accepts PDF files uploaded directly.
- Processes PDFs from a publicly accessible URL.
- Extracts plain text and attempts to preserve some of the layout.
- Detects tables and formats them as Markdown.
- Extracts images from the PDF.
- Performs OCR on extracted images so that text inside images is included.
- Uploads extracted images to a Hugging Face Dataset.
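
To make the table feature more concrete, here is a minimal sketch of how detected rows might be rendered as a Markdown table. It assumes the table has already been extracted as a list of rows of cell strings, with `None` for empty cells; the function name `table_to_markdown` is hypothetical, and the app's actual formatting may differ.

```python
from typing import Optional, Sequence


def table_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render rows as a Markdown table, treating the first row as the header."""
    if not rows:
        return ""
    # Pad every row to the widest one so the table stays rectangular.
    width = max(len(row) for row in rows)

    def clean(cell: Optional[str]) -> str:
        # Empty cells become blank; pipes and newlines would break the table syntax.
        text = "" if cell is None else str(cell)
        return text.replace("|", "\\|").replace("\n", " ").strip()

    def render(row: Sequence[Optional[str]]) -> str:
        cells = [clean(c) for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [render(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_to_markdown([["Name", "Qty"], ["Bolt", None], ["Nut", "4"]]))
```

Escaping pipe characters and flattening newlines inside cells keeps a single odd cell from corrupting the rest of the table.
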

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
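
The Space metadata sets `app_port: 7860` and notes that it should match the port Gunicorn binds to. One way to keep the two in agreement is a small Gunicorn configuration file, sketched below. The file name `gunicorn.conf.py` and the `workers` and `timeout` values are illustrative assumptions, not settings taken from this project.

```python
# gunicorn.conf.py (hypothetical file; values other than the port are assumptions)

# Must agree with app_port in the README metadata of the Space.
APP_PORT = 7860

# Listen on all interfaces so the Space's proxy can reach the container.
bind = f"0.0.0.0:{APP_PORT}"

# Illustrative values: PDF parsing and OCR can be slow, so allow long requests.
workers = 2
timeout = 120
```

Gunicorn would then be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` stands in for the actual module and Flask application object.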