---
title: Image2drawio
emoji: 🐒
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
short_description: A multi-agent application that converts images into drawio
tags:
- agent-demo-track
- mcp-server-track
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Image to Draw.io Converter
A multi-agent application that converts images into editable [Draw.io](https://app.diagrams.net/) diagrams using advanced LLM-based object detection and diagram generation. The system is built with [LangGraph](https://github.com/langchain-ai/langgraph) and features a modern [Gradio](https://gradio.app/) web interface, which is also exposed as an MCP server for integration with external clients.
An example outcome is provided in `example-outcome.png`.
---
## Features
- **Multi-Agent Pipeline (LangGraph):**
  - **Supervisor Agent:** Orchestrates the workflow and coordinates the agents.
  - **Object Detection Agent (ReAct):** Uses an LLM to detect and extract objects from the uploaded image.
  - **Draw.io Generator Agent (ReAct):** Converts detected objects into Draw.io XML diagrams.
- **Automated Workflow:**
  1. Upload an image via the Gradio interface.
  2. The object detection agent identifies and extracts key objects.
  3. The diagram generator agent creates a Draw.io XML diagram.
  4. Download the `.drawio` file or preview the SVG directly in the browser.
- **Modern Gradio UI:**
  - Simple drag-and-drop image upload.
  - Real-time status updates and diagram preview.
  - Downloadable Draw.io file and copyable XML.
  - SVG preview of the generated diagram.
  - Exposed as an MCP server for programmatic access.
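
The supervisor and agent flow listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code and not the LangGraph API: `PipelineState`, `detect_objects`, `generate_drawio`, `supervisor`, and `run_pipeline` are hypothetical names, the detection step returns a hard-coded object instead of calling an LLM, and the cell style string is only one plausible choice for a rectangle.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PipelineState:
    """Shared state handed from agent to agent (hypothetical field names)."""
    image_path: str
    objects: Optional[List[dict]] = None   # None means "detection not run yet"
    drawio_xml: Optional[str] = None       # None means "diagram not generated yet"


def detect_objects(state: PipelineState) -> PipelineState:
    # Stand-in for the LLM-based object detection agent: returns a fixed object.
    state.objects = [{"id": "2", "label": "Start", "x": 40, "y": 40, "w": 120, "h": 60}]
    return state


def generate_drawio(state: PipelineState) -> PipelineState:
    # Stand-in for the Draw.io generator agent: one vertex cell per detected object.
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", id="0")               # root cell
    ET.SubElement(root, "mxCell", id="1", parent="0")   # default layer
    for obj in state.objects or []:
        cell = ET.SubElement(
            root, "mxCell", id=obj["id"], value=obj["label"],
            style="rounded=0;whiteSpace=wrap;html=1;", vertex="1", parent="1",
        )
        # "as" is a Python keyword, so the attributes are passed as a dict.
        ET.SubElement(cell, "mxGeometry", {
            "x": str(obj["x"]), "y": str(obj["y"]),
            "width": str(obj["w"]), "height": str(obj["h"]),
            "as": "geometry",
        })
    state.drawio_xml = ET.tostring(mxfile, encoding="unicode")
    return state


AGENTS: Dict[str, Callable[[PipelineState], PipelineState]] = {
    "object_detection": detect_objects,
    "drawio_generator": generate_drawio,
}


def supervisor(state: PipelineState) -> Optional[str]:
    """Pick the next agent to run, or None when the workflow is finished."""
    if state.objects is None:
        return "object_detection"
    if state.drawio_xml is None:
        return "drawio_generator"
    return None


def run_pipeline(image_path: str, max_steps: int = 5) -> PipelineState:
    state = PipelineState(image_path=image_path)
    for _ in range(max_steps):   # bounded loop guards against a stuck workflow
        next_agent = supervisor(state)
        if next_agent is None:
            break
        state = AGENTS[next_agent](state)
    return state
```

Calling `run_pipeline("sketch.png")` (a hypothetical file name) runs detection first, then generation, and leaves the XML in `state.drawio_xml`. Keeping the routing decision in one `supervisor` function is what makes it straightforward to register further agents later.
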
---
## Requirements
- Python 3.10+
- Install dependencies:
```sh
pip install -r requirements.txt
```
- API keys for your LLM provider (e.g., Google Gemini).
- Configure your `.env` file:
```
GEMINI_API_KEY=your_gemini_api_key
GEMINI_MODEL_NAME=gemini-2.5-pro-preview-06-05
GEMINI_THINKING_BUDGET=128
```
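
As a rough sketch of how these three variables might be read and validated at startup: the `GeminiSettings` class and `load_settings` function below are hypothetical names, not taken from the project. The sketch assumes the `.env` values have already been loaded into the process environment (commonly done with python-dotenv's `load_dotenv()`, which is an assumption about this app).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class GeminiSettings:
    """Validated copy of the three .env values (hypothetical container)."""
    api_key: str
    model_name: str
    thinking_budget: int


def load_settings(env: Mapping[str, str] = os.environ) -> GeminiSettings:
    # Fail early with a clear message if any variable is missing or empty.
    values = {}
    for name in ("GEMINI_API_KEY", "GEMINI_MODEL_NAME", "GEMINI_THINKING_BUDGET"):
        value = env.get(name, "").strip()
        if not value:
            raise RuntimeError(f"{name} is not set; add it to your .env file")
        values[name] = value
    # int() raises ValueError if the budget is not a whole number.
    return GeminiSettings(
        api_key=values["GEMINI_API_KEY"],
        model_name=values["GEMINI_MODEL_NAME"],
        thinking_budget=int(values["GEMINI_THINKING_BUDGET"]),
    )
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dictionary instead of real credentials.
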
---
## Usage
1. **Install dependencies:**
```sh
pip install -r requirements.txt
```
2. **Configure environment:**
- Create a `.env` file in the project root with your API keys and model settings.
3. **Start the application:**
```sh
python app.py
```
4. **Access the web interface:**
- Open the provided local URL in your browser.
- Upload an image (diagram, sketch, chart, etc.).
- Click "Generate Diagram".
- Download the `.drawio` file or copy the XML.
- Open the file in [diagrams.net](https://app.diagrams.net/) for further editing.
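
Before opening a downloaded file in the editor, it can be handy to check what it contains. The sketch below counts shapes and connectors in Draw.io XML using only the standard library. `summarize_drawio` and `SAMPLE` are illustrative names, and the sketch assumes the diagram is stored as uncompressed XML (draw.io can also store compressed diagram data, which this does not handle).

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Small hand-written example: two boxes joined by one connector.
SAMPLE = """<mxfile>
  <diagram name="Page-1">
    <mxGraphModel>
      <root>
        <mxCell id="0"/>
        <mxCell id="1" parent="0"/>
        <mxCell id="2" value="A" vertex="1" parent="1">
          <mxGeometry x="40" y="40" width="120" height="60" as="geometry"/>
        </mxCell>
        <mxCell id="3" value="B" vertex="1" parent="1">
          <mxGeometry x="240" y="40" width="120" height="60" as="geometry"/>
        </mxCell>
        <mxCell id="4" edge="1" source="2" target="3" parent="1">
          <mxGeometry relative="1" as="geometry"/>
        </mxCell>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>"""


def summarize_drawio(xml_text: str) -> Dict[str, int]:
    """Count vertex cells (shapes) and edge cells (connectors)."""
    root = ET.fromstring(xml_text)
    cells = list(root.iter("mxCell"))
    return {
        "vertices": sum(1 for c in cells if c.get("vertex") == "1"),
        "edges": sum(1 for c in cells if c.get("edge") == "1"),
    }
```

For the sample above, `summarize_drawio(SAMPLE)` reports two vertices and one edge. For a real file, read its text first and pass that string to the function.
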
---
## MCP Server Integration
The Gradio app is also exposed as an MCP server (`mcp_server=True`), allowing integration with any MCP-compatible client or workflow. This enables automated or remote usage in larger pipelines.
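
A minimal sketch of how a Gradio function can be exposed this way, assuming Gradio is installed with its MCP extra (`pip install "gradio[mcp]"`). The `image_to_drawio` function and `EMPTY_DIAGRAM` constant are placeholders invented for this example; the real app would run the agent pipeline instead of returning a fixed document.

```python
# Placeholder result: an empty one-page Draw.io document.
EMPTY_DIAGRAM = (
    '<mxfile><diagram name="Page-1"><mxGraphModel><root>'
    '<mxCell id="0"/><mxCell id="1" parent="0"/>'
    '</root></mxGraphModel></diagram></mxfile>'
)


def image_to_drawio(image_path: str) -> str:
    """Convert an image into Draw.io XML (placeholder implementation).

    Args:
        image_path: Path or URL of the image to convert.

    Returns:
        The diagram as a Draw.io XML string.
    """
    # Hypothetical stand-in: ignores the image and returns an empty diagram.
    return EMPTY_DIAGRAM


def main() -> None:
    # Imported here so the helper above stays usable without Gradio installed.
    import gradio as gr

    demo = gr.Interface(
        fn=image_to_drawio,
        inputs=gr.Textbox(label="Image path or URL"),
        outputs=gr.Textbox(label="Draw.io XML"),
    )
    # mcp_server=True serves the function as an MCP tool next to the web UI.
    demo.launch(mcp_server=True)
```

Calling `main()` starts the server. Gradio uses the function's type hints and docstring to describe the tool to MCP clients and prints the MCP endpoint URL at startup, so that URL is what a client would be pointed at.
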
---
## Project Structure
- `app.py`: Main entry point, Gradio UI, and workflow orchestration.
- `nodes/`: Agent and LangGraph node definitions.
- `tools/`: LLM tools for object detection and Draw.io generation.
- `output_llm/`: Generated Draw.io files, SVG previews, and logs.
- `files/`: Uploaded user images.
---
## Notes
- The LLM model and thinking budget are configurable via `.env` (`GEMINI_MODEL_NAME` and `GEMINI_THINKING_BUDGET`).
- The application is designed for easy extension with new agents or tools.
- Performance varies considerably with the LLM: in my experience, Gemini 2.5 Pro performed much better than Gemini 2.5 Flash.
---
## License
MIT License
---
**Author:** Giustino Esposito