# Smoldocling CLI
A command-line interface for processing document images and PDFs with the SmolDocling-256M-preview model.
## Installation
1. Clone this repository
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
## Usage
The CLI processes one or more document images and PDFs in a single run and saves the output as HTML files.
Basic usage:
```bash
python smoldocling_cli.py input1.png input2.jpg input3.pdf
```
Specify an output directory:
```bash
python smoldocling_cli.py -o custom_output input1.png document.pdf
```
### Arguments
- `input_files`: One or more input files (images or PDFs) to process
- `-o, --output-dir`: Output directory for HTML files (default: 'output')
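
The arguments above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the actual source of `smoldocling_cli.py`: the function name `build_parser` and the help strings are assumptions, and only the argument names and the `'output'` default come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="smoldocling_cli.py",
        description="Convert document images and PDFs to HTML.",
    )
    # One or more positional input paths (images or PDFs).
    parser.add_argument("input_files", nargs="+", help="Input images or PDFs")
    # Optional output directory; 'output' is the documented default.
    parser.add_argument(
        "-o", "--output-dir", default="output", help="Directory for HTML output"
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["-o", "custom_output", "input1.png", "document.pdf"])
print(args.input_files, args.output_dir)
```

With `nargs="+"`, at least one input file is required, and `-o` may appear before or after the input files.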
### Example
```bash
python smoldocling_cli.py document1.png document2.pdf -o processed_docs
```
This will:
1. Create a directory called 'processed_docs' if it doesn't exist
2. Process document1.png and generate document1.html
3. Process document2.pdf and generate document2.html (containing all processed pages in a single file)
4. Save all HTML files in the processed_docs directory
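
The mapping from input file to output file in the steps above can be sketched with `pathlib`. The helper name `output_path_for` is hypothetical and the real script may derive paths differently; the sketch only illustrates the documented behavior of creating the directory on demand and swapping the extension for `.html`.

```python
from pathlib import Path


def output_path_for(input_file: str, output_dir: str = "output") -> Path:
    """Map an input file to its HTML output path (hypothetical helper)."""
    out_dir = Path(output_dir)
    # Create the directory if needed; no error if it already exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep the base name and swap the extension for .html.
    return out_dir / (Path(input_file).stem + ".html")
```

Under this mapping, two inputs that share a base name (for example `report.png` and `report.pdf`) would resolve to the same output file, so distinct base names are the safer choice.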
## Notes
- The script creates the output directory automatically if it doesn't exist
- Each input image produces an HTML file with the same base name and an `.html` extension
- Each PDF produces a single HTML file containing all processed pages
- PDF processing is currently limited to the first 3 pages due to model limitations
- A failure on one file does not stop the remaining files from being processed
- Error messages are printed to stderr
- The model is loaded only once, even when multiple files are processed
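
The load-once, per-file error handling, and page-limit behavior described in these notes could look like the sketch below. All names here (`process_all`, `limit_pages`, `load_model`, `process_file`, `MAX_PDF_PAGES`) are assumptions made for illustration; the model loading and conversion steps are passed in as callables because this README does not show how the script implements them.

```python
import sys
from typing import Callable, Iterable, List

MAX_PDF_PAGES = 3  # page cap stated in the notes above


def limit_pages(pages: list, max_pages: int = MAX_PDF_PAGES) -> list:
    """Keep only the first max_pages pages of a PDF."""
    return pages[:max_pages]


def process_all(
    input_files: Iterable[str],
    load_model: Callable[[], object],
    process_file: Callable[[object, str], None],
) -> List[str]:
    """Process every file with one shared model; return the files that failed."""
    model = load_model()  # loaded once, reused for every file
    failed = []
    for path in input_files:
        try:
            process_file(model, path)
        except Exception as exc:
            # Report on stderr and continue with the remaining files.
            print(f"Error processing {path}: {exc}", file=sys.stderr)
            failed.append(path)
    return failed
```

Returning the list of failed files lets a caller choose a non-zero exit code when anything went wrong, while still processing every input.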