Spaces:
Configuration error
Configuration error
| # Docling Serve | |
| Running [Docling](https://github.com/DS4SD/docling) as an API service. | |
| ## Usage | |
| The API provides two endpoints: one for urls, one for files. This is necessary to send files directly in binary format instead of base64-encoded strings. | |
| ### Common parameters | |
| On top of the source of file (see below), both endpoints support the same parameters, which are almost the same as the Docling CLI. | |
| - `from_format` (List[str]): Input format(s) to convert from. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`. Defaults to all formats. | |
| - `to_formats` (List[str]): Output format(s) to convert to. Allowed values: `md`, `json`, `html`, `text`, `doctags`. Defaults to `md`. | |
| - `do_ocr` (bool): If enabled, the bitmap content will be processed using OCR. Defaults to `True`. | |
| - `image_export_mode`: Image export mode for the document (only in case of JSON, Markdown or HTML). Allowed values: embedded, placeholder, referenced. Optional, defaults to `embedded`. | |
| - `force_ocr` (bool): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to `False`. | |
| - `ocr_engine` (str): OCR engine to use. Allowed values: `easyocr`, `tesseract_cli`, `tesseract`, `rapidocr`, `ocrmac`. Defaults to `easyocr`. | |
| - `ocr_lang` (List[str]): List of languages used by the OCR engine. Note that each OCR engine has different values for the language names. Defaults to empty. | |
| - `pdf_backend` (str): PDF backend to use. Allowed values: `pypdfium2`, `dlparse_v1`, `dlparse_v2`. Defaults to `dlparse_v2`. | |
| - `table_mode` (str): Table mode to use. Allowed values: `fast`, `accurate`. Defaults to `fast`. | |
| - `abort_on_error` (bool): If enabled, abort on error. Defaults to false. | |
| - `return_as_file` (boo): If enabled, return the output as a file. Defaults to false. | |
| - `do_table_structure` (bool): If enabled, the table structure will be extracted. Defaults to true. | |
| - `include_images` (bool): If enabled, images will be extracted from the document. Defaults to true. | |
| - `images_scale` (float): Scale factor for images. Defaults to 2.0. | |
| ### URL endpoint | |
| The endpoint is `/v1alpha/convert/source`, listening for POST requests of JSON payloads. | |
| On top of the above parameters, you must send the URL(s) of the document you want process with either the `http_sources` or `file_sources` fields. | |
| The first is fetching URL(s) (optionally using with extra headers), the second allows to provide documents as base64-encoded strings. | |
| No `options` is required, they can be partially or completely omitted. | |
| Simple payload example: | |
| ```json | |
| { | |
| "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}] | |
| } | |
| ``` | |
| <details> | |
| <summary>Complete payload example:</summary> | |
| ```json | |
| { | |
| "options": { | |
| "from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"], | |
| "to_formats": ["md", "json", "html", "text", "doctags"], | |
| "image_export_mode": "placeholder", | |
| "do_ocr": true, | |
| "force_ocr": false, | |
| "ocr_engine": "easyocr", | |
| "ocr_lang": ["en"], | |
| "pdf_backend": "dlparse_v2", | |
| "table_mode": "fast", | |
| "abort_on_error": false, | |
| "return_as_file": false, | |
| }, | |
| "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}] | |
| } | |
| ``` | |
| </details> | |
| <details> | |
| <summary>CURL example:</summary> | |
| ```sh | |
| curl -X 'POST' \ | |
| 'http://localhost:5001/v1alpha/convert/source' \ | |
| -H 'accept: application/json' \ | |
| -H 'Content-Type: application/json' \ | |
| -d '{ | |
| "options": { | |
| "from_formats": [ | |
| "docx", | |
| "pptx", | |
| "html", | |
| "image", | |
| "pdf", | |
| "asciidoc", | |
| "md", | |
| "xlsx" | |
| ], | |
| "to_formats": ["md", "json", "html", "text", "doctags"], | |
| "image_export_mode": "placeholder", | |
| "do_ocr": true, | |
| "force_ocr": false, | |
| "ocr_engine": "easyocr", | |
| "ocr_lang": [ | |
| "fr", | |
| "de", | |
| "es", | |
| "en" | |
| ], | |
| "pdf_backend": "dlparse_v2", | |
| "table_mode": "fast", | |
| "abort_on_error": false, | |
| "return_as_file": false, | |
| "do_table_structure": true, | |
| "include_images": true, | |
| "images_scale": 2, | |
| }, | |
| "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}] | |
| }' | |
| ``` | |
| </details> | |
| <details> | |
| <summary>Python example:</summary> | |
| ```python | |
| import httpx | |
| async_client = httpx.AsyncClient(timeout=60.0) | |
| url = "http://localhost:5001/v1alpha/convert/source" | |
| payload = { | |
| "options": { | |
| "from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"], | |
| "to_formats": ["md", "json", "html", "text", "doctags"], | |
| "image_export_mode": "placeholder", | |
| "do_ocr": True, | |
| "force_ocr": False, | |
| "ocr_engine": "easyocr", | |
| "ocr_lang": "en", | |
| "pdf_backend": "dlparse_v2", | |
| "table_mode": "fast", | |
| "abort_on_error": False, | |
| "return_as_file": False, | |
| }, | |
| "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}] | |
| } | |
| response = await async_client_client.post(url, json=payload) | |
| data = response.json() | |
| ``` | |
| </details> | |
| #### File as base64 | |
| The `file_sources` argument in the endpoint allows to send files as base64-encoded strings. | |
| When your PDF or other file type is too large, encoding it and passing it inline to curl | |
| can lead to an โArgument list too longโ error on some systems. To avoid this, we write | |
| the JSON request body to a file and have curl read from that file. | |
| <details> | |
| <summary>CURL steps:</summary> | |
| ```sh | |
| # 1. Base64-encode the file | |
| B64_DATA=$(base64 -w 0 /path/to/file/pdf-to-convert.pdf) | |
| # 2. Build the JSON with your options | |
| cat <<EOF > /tmp/request_body.json | |
| { | |
| "options": { | |
| }, | |
| "file_sources": [{ | |
| "base64_string": "${B64_DATA}", | |
| "filename": "pdf-to-convert.pdf" | |
| }] | |
| } | |
| EOF | |
| # 3. POST the request to the docling service | |
| curl -X POST "localhost:5001/v1alpha/convert/source" \ | |
| -H "Content-Type: application/json" \ | |
| -d @/tmp/request_body.json | |
| ``` | |
| </details> | |
| ### File endpoint | |
| The endpoint is: `/v1alpha/convert/file`, listening for POST requests of Form payloads (necessary as the files are sent as multipart/form data). You can send one or multiple files. | |
| <details> | |
| <summary>CURL example:</summary> | |
| ```sh | |
| curl -X 'POST' \ | |
| 'http://127.0.0.1:5001/v1alpha/convert/file' \ | |
| -H 'accept: application/json' \ | |
| -H 'Content-Type: multipart/form-data' \ | |
| -F 'ocr_engine=easyocr' \ | |
| -F 'pdf_backend=dlparse_v2' \ | |
| -F 'from_formats=pdf' \ | |
| -F 'from_formats=docx' \ | |
| -F 'force_ocr=false' \ | |
| -F 'image_export_mode=embedded' \ | |
| -F 'ocr_lang=en' \ | |
| -F 'ocr_lang=pl' \ | |
| -F 'table_mode=fast' \ | |
| -F '[email protected];type=application/pdf' \ | |
| -F 'abort_on_error=false' \ | |
| -F 'to_formats=md' \ | |
| -F 'to_formats=text' \ | |
| -F 'return_as_file=false' \ | |
| -F 'do_ocr=true' | |
| ``` | |
| </details> | |
| <details> | |
| <summary>Python example:</summary> | |
| ```python | |
| import httpx | |
| async_client = httpx.AsyncClient(timeout=60.0) | |
| url = "http://localhost:5001/v1alpha/convert/file" | |
| parameters = { | |
| "from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"], | |
| "to_formats": ["md", "json", "html", "text", "doctags"], | |
| "image_export_mode": "placeholder", | |
| "do_ocr": True, | |
| "force_ocr": False, | |
| "ocr_engine": "easyocr", | |
| "ocr_lang": ["en"], | |
| "pdf_backend": "dlparse_v2", | |
| "table_mode": "fast", | |
| "abort_on_error": False, | |
| "return_as_file": False | |
| } | |
| current_dir = os.path.dirname(__file__) | |
| file_path = os.path.join(current_dir, '2206.01062v1.pdf') | |
| files = { | |
| 'files': ('2206.01062v1.pdf', open(file_path, 'rb'), 'application/pdf'), | |
| } | |
| response = await async_client.post(url, files=files, data={"parameters": json.dumps(parameters)}) | |
| assert response.status_code == 200, "Response should be 200 OK" | |
| data = response.json() | |
| ``` | |
| </details> | |
| ### Response format | |
| The response can be a JSON Document or a File. | |
| - If you process only one file, the response will be a JSON document with the following format: | |
| ```jsonc | |
| { | |
| "document": { | |
| "md_content": "", | |
| "json_content": {}, | |
| "html_content": "", | |
| "text_content": "", | |
| "doctags_content": "" | |
| }, | |
| "status": "<success|partial_success|skipped|failure>", | |
| "processing_time": 0.0, | |
| "timings": {}, | |
| "errors": [] | |
| } | |
| ``` | |
| Depending on the value you set in `output_formats`, the different items will be populated with their respective results or empty. | |
| `processing_time` is the Docling processing time in seconds, and `timings` (when enabled in the backend) provides the detailed | |
| timing of all the internal Docling components. | |
| - If you set the parameter `return_as_file` to True, the response will be a zip file. | |
| - If multiple files are generated (multiple inputs, or one input but multiple outputs with `return_as_file` True), the response will be a zip file. | |
| ## Helpers | |
| - A full Swagger UI is available at the `/docs` endpoint. | |
|  | |
| - An easy to use UI is available at the `/ui` endpoint. | |
|  | |
|  | |
| ## Development | |
| ### CPU only | |
| ```sh | |
| # Install uv if not already available | |
| curl -LsSf https://astral.sh/uv/install.sh | sh | |
| # Install dependencies | |
| uv sync --extra cpu | |
| ``` | |
| ### Cuda GPU | |
| For GPU support use the following command: | |
| ```sh | |
| # Install dependencies | |
| uv sync | |
| ``` | |
| ### Gradio UI and different OCR backends | |
| `/ui` endpoint using `gradio` and different OCR backends can be enabled via package extras: | |
| ```sh | |
| # Enable ui and rapidocr | |
| uv sync --extra ui --extra rapidocr | |
| ``` | |
| ```sh | |
| # Enable tesserocr | |
| uv sync --extra tesserocr | |
| ``` | |
| See `[project.optional-dependencies]` section in `pyproject.toml` for full list of options. | |
| ### Run the server | |
| The `docling-serve` executable is a convenient script for launching the webserver both in | |
| development and production mode. | |
| ```sh | |
| # Run the server in development mode | |
| # - reload is enabled by default | |
| # - listening on the 127.0.0.1 address | |
| # - ui is enabled by default | |
| docling-serve dev | |
| # Run the server in production mode | |
| # - reload is disabled by default | |
| # - listening on the 0.0.0.0 address | |
| # - ui is disabled by default | |
| docling-serve run | |
| ``` | |
| ### Options | |
| The `docling-serve` executable allows is controlled with both command line | |
| options and environment variables. | |
| <details> | |
| <summary>`docling-serve` help message</summary> | |
| ```sh | |
| $ docling-serve dev --help | |
| Usage: docling-serve dev [OPTIONS] | |
| Run a Docling Serve app in development mode. ๐งช | |
| This is equivalent to docling-serve run but with reload | |
| enabled and listening on the 127.0.0.1 address. | |
| Options can be set also with the corresponding ENV variable, with the exception | |
| of --enable-ui, --host and --reload. | |
| โญโ Options โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ | |
| โ --host TEXT The host to serve on. For local development in localhost โ | |
| โ use 127.0.0.1. To enable public access, e.g. in a โ | |
| โ container, use all the IP addresses available with โ | |
| โ 0.0.0.0. โ | |
| โ [default: 127.0.0.1] โ | |
| โ --port INTEGER The port to serve on. [default: 5001] โ | |
| โ --reload --no-reload Enable auto-reload of the server when (code) files โ | |
| โ change. This is resource intensive, use it only during โ | |
| โ development. โ | |
| โ [default: reload] โ | |
| โ --root-path TEXT The root path is used to tell your app that it is being โ | |
| โ served to the outside world with some path prefix set up โ | |
| โ in some termination proxy or similar. โ | |
| โ --proxy-headers --no-proxy-headers Enable/Disable X-Forwarded-Proto, X-Forwarded-For, โ | |
| โ X-Forwarded-Port to populate remote address info. โ | |
| โ [default: proxy-headers] โ | |
| โ --artifacts-path PATH If set to a valid directory, the model weights will be โ | |
| โ loaded from this path. โ | |
| โ [default: None] โ | |
| โ --enable-ui --no-enable-ui Enable the development UI. [default: enable-ui] โ | |
| โ --help Show this message and exit. โ | |
| โฐโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฏ | |
| ``` | |
| </details> | |
| #### Environment variables | |
| The environment variables controlling the `uvicorn` execution can be specified with the `UVICORN_` prefix: | |
| - `UVICORN_WORKERS`: Number of workers to use. | |
| - `UVICORN_RELOAD`: If `True`, this will enable auto-reload when you modify files, useful for development. | |
| The environment variables controlling specifics of the Docling Serve app can be specified with the | |
| `DOCLING_SERVE_` prefix: | |
| - `DOCLING_SERVE_ARTIFACTS_PATH`: if set Docling will use only the local weights of models, for example `/opt/app-root/src/.cache/docling/models`. | |
| - `DOCLING_SERVE_ENABLE_UI`: If `True`, The Gradio UI will be available at `/ui`. | |
| Others: | |
| - `TESSDATA_PREFIX`: Tesseract data location, example `/usr/share/tesseract/tessdata/`. | |
| ## Get help and support | |
| Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions). | |
| ## Contributing | |
| Please read [Contributing to Docling Serve](https://github.com/DS4SD/docling-serve/blob/main/CONTRIBUTING.md) for details. | |
| ## References | |
| If you use Docling in your projects, please consider citing the following: | |
| ```bib | |
| @techreport{Docling, | |
| author = {Deep Search Team}, | |
| month = {8}, | |
| title = {Docling Technical Report}, | |
| url = {https://arxiv.org/abs/2408.09869}, | |
| eprint = {2408.09869}, | |
| doi = {10.48550/arXiv.2408.09869}, | |
| version = {1.0.0}, | |
| year = {2024} | |
| } | |
| ``` | |
| ## License | |
| The Docling Serve codebase is under MIT license. | |
| ## IBM โค๏ธ Open Source AI | |
| Docling has been brought to you by IBM. | |