Commit b038f16 · 1 Parent(s): e4017a7
updated README
README.md CHANGED
@@ -9,6 +9,108 @@ app_file: app.py
pinned: false
license: mit
short_description: A multi-agent application that converts images into drawio
tag: agent-demo-track,mcp-server-track
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Image to Draw.io Converter

A multi-agent application that converts images into editable [Draw.io](https://app.diagrams.net/) diagrams using LLM-based object detection and diagram generation. The system is built with [LangGraph](https://github.com/langchain-ai/langgraph) and has a [Gradio](https://gradio.app/) web interface, which is also exposed as an MCP server for integration with external clients.

---

## Features

- **Multi-Agent Pipeline (LangGraph):**
  - **Supervisor Agent:** Orchestrates the workflow and coordinates the agents.
  - **Object Detection Agent (ReAct):** Uses an LLM to detect and extract objects from the uploaded image.
  - **Draw.io Generator Agent (ReAct):** Converts detected objects into Draw.io XML diagrams.

- **Automated Workflow:**
  1. Upload an image via the Gradio interface.
  2. The object detection agent identifies and extracts key objects.
  3. The diagram generator agent creates a Draw.io XML diagram.
  4. Download the `.drawio` file or preview the SVG directly in the browser.

- **Modern Gradio UI:**
  - Simple drag-and-drop image upload.
  - Real-time status updates and diagram preview.
  - Downloadable Draw.io file and copyable XML.
  - SVG preview of the generated diagram.
  - Exposed as an MCP server for programmatic access.
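
The detect-then-generate flow above can be sketched with plain functions. This is an illustration only, not the app's code: it uses the standard library instead of LangGraph, the `DetectedObject` schema and all function names are hypothetical, and the detection step returns fixed data where the real agent would call an LLM. The XML layout follows the uncompressed Draw.io format (an `mxfile` wrapping an `mxGraphModel`).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """One shape found in the image (hypothetical schema, not the app's)."""
    label: str
    x: int
    y: int
    width: int
    height: int


def detect_objects(image_path: str) -> list[DetectedObject]:
    """Stand-in for the object detection agent; a real one would call an LLM."""
    return [
        DetectedObject("Start", 40, 40, 120, 60),
        DetectedObject("End", 240, 40, 120, 60),
    ]


def generate_drawio(objects: list[DetectedObject]) -> str:
    """Stand-in for the generator agent: emit one vertex cell per object."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # Draw.io expects two structural cells: the root (0) and the default layer (1).
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for index, obj in enumerate(objects, start=2):
        cell = ET.SubElement(
            root, "mxCell", id=str(index), value=obj.label,
            style="rounded=0;whiteSpace=wrap;html=1;", vertex="1", parent="1",
        )
        geometry = ET.SubElement(
            cell, "mxGeometry", x=str(obj.x), y=str(obj.y),
            width=str(obj.width), height=str(obj.height),
        )
        # "as" is a Python keyword, so it cannot be passed as a keyword argument.
        geometry.set("as", "geometry")
    return ET.tostring(mxfile, encoding="unicode")


def run_pipeline(image_path: str) -> str:
    """Stand-in for the supervisor: run detection, then generation."""
    return generate_drawio(detect_objects(image_path))
```

Calling `run_pipeline("sketch.png")` returns an XML string that can be saved with a `.drawio` extension. In the real application each stand-in would presumably be a LangGraph node, with the supervisor deciding the routing.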

---

## Requirements

- Python 3.10+
- Install dependencies:
  ```sh
  pip install -r requirements.txt
  ```
- API keys for your LLM provider (e.g., Google Gemini).
- Configure your `.env` file:
  ```
  GEMINI_API_KEY=your_gemini_api_key
  GEMINI_MODEL_NAME=gemini-2.5-pro-preview-06-05
  GEMINI_THINKING_BUDGET=128
  ```
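
As a rough sketch of how these three settings might be read at startup: the snippet below assumes the variables are already in the process environment (for example after a `.env` loader has run). The `GeminiSettings` class, the `load_settings` function, and the fallback defaults are assumptions for illustration, not taken from the app.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GeminiSettings:
    """Hypothetical container for the three variables in the sample .env."""
    api_key: str
    model_name: str
    thinking_budget: int


def load_settings(env=None) -> GeminiSettings:
    """Build settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        # Fail early rather than at the first LLM call.
        raise RuntimeError("GEMINI_API_KEY is not set")
    return GeminiSettings(
        api_key=api_key,
        # Defaults mirror the sample values above (an assumption).
        model_name=env.get("GEMINI_MODEL_NAME", "gemini-2.5-pro-preview-06-05"),
        thinking_budget=int(env.get("GEMINI_THINKING_BUDGET", "128")),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to test.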

---

## Usage

1. **Install dependencies:**
   ```sh
   pip install -r requirements.txt
   ```

2. **Configure environment:**
   - Create a `.env` file in the project root with your API keys and model settings.

3. **Start the application:**
   ```sh
   python app.py
   ```

4. **Access the web interface:**
   - Open the provided local URL in your browser.
   - Upload an image (diagram, sketch, chart, etc.).
   - Click "Generate Diagram".
   - Download the `.drawio` file or copy the XML.
   - Open the file in [diagrams.net](https://app.diagrams.net/) for further editing.
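
Before opening a downloaded file in diagrams.net, a quick well-formedness check can catch truncated LLM output. The helper below is a hypothetical sketch and not part of the app. It assumes the file holds uncompressed Draw.io XML, so a compressed diagram would report zero cells.

```python
import xml.etree.ElementTree as ET


def summarize_drawio(xml_text: str) -> dict[str, int]:
    """Parse Draw.io XML and count its shapes and connectors."""
    # ET.fromstring raises ET.ParseError if the XML is malformed.
    root = ET.fromstring(xml_text)
    cells = list(root.iter("mxCell"))
    return {
        "vertices": sum(1 for c in cells if c.get("vertex") == "1"),
        "edges": sum(1 for c in cells if c.get("edge") == "1"),
    }
```

A result with zero vertices for a non-trivial input image is a hint that generation went wrong, even when the XML itself parses.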

---

## MCP Server Integration

The Gradio app is also exposed as an MCP server (`mcp_server=True`), allowing integration with any MCP-compatible client or workflow. This enables automated or remote usage in larger pipelines.
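
A minimal sketch of what this looks like in Gradio, assuming a recent release installed with the MCP extra (`pip install "gradio[mcp]"`). The function `image_to_drawio` and its placeholder body are hypothetical, and the real `app.py` may build its interface differently. Gradio derives the MCP tool description from the function's type hints and docstring, so both are worth writing carefully.

```python
def image_to_drawio(image_path: str) -> str:
    """Convert an image into Draw.io XML.

    Args:
        image_path: Path to the uploaded image file.

    Returns:
        The generated Draw.io XML as a string.
    """
    # Placeholder body; the real app would run its agent pipeline here.
    return "<mxfile></mxfile>"


def main() -> None:
    # Imported here so the function above stays importable without Gradio.
    import gradio as gr

    demo = gr.Interface(
        fn=image_to_drawio,
        inputs=gr.Image(type="filepath"),
        outputs=gr.Textbox(label="Draw.io XML"),
    )
    # mcp_server=True serves the function as an MCP tool alongside the web UI.
    demo.launch(mcp_server=True)
```

Calling `main()` starts both the web interface and the MCP endpoint from the same process.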

---

## Project Structure

- `app.py`: Main entry point, Gradio UI, and workflow orchestration.
- `nodes/`: Agent and LangGraph node definitions.
- `tools/`: LLM tools for object detection and Draw.io generation.
- `output_llm/`: Generated Draw.io files, SVG previews, and logs.
- `files/`: Uploaded user images.

---

## Notes

- The LLM model and thinking budget are configurable via `.env`.
- The application is designed for easy extension with new agents or tools.
- Performance varies considerably with the LLM: in my experience, Gemini 2.5 Pro performed much better than Gemini 2.5 Flash.

---

## License

MIT License

---

**Author:** Giustino Esposito