Giustino98 committed on
Commit b038f16 · 1 Parent(s): e4017a7

updated README

Files changed (1)
  1. README.md +102 -0
README.md CHANGED
@@ -9,6 +9,108 @@ app_file: app.py
  pinned: false
  license: mit
  short_description: A multi-agent application that converts images into drawio
+ tag: agent-demo-track,mcp-server-track
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # Image to Draw.io Converter
+
+ A multi-agent application that converts images into editable [Draw.io](https://app.diagrams.net/) diagrams using LLM-based object detection and diagram generation. The system is built with [LangGraph](https://github.com/langchain-ai/langgraph) and has a [Gradio](https://gradio.app/) web interface, which is also exposed as an MCP server for integration with external clients.
+
+ ---
+
+ ## Features
+
+ - **Multi-Agent Pipeline (LangGraph):**
+   - **Supervisor Agent:** Orchestrates the workflow and coordinates the agents.
+   - **Object Detection Agent (ReAct):** Uses an LLM to detect and extract objects from the uploaded image.
+   - **Draw.io Generator Agent (ReAct):** Converts detected objects into Draw.io XML diagrams.
+
+ - **Automated Workflow:**
+   1. Upload an image via the Gradio interface.
+   2. The object detection agent identifies and extracts key objects.
+   3. The diagram generator agent creates a Draw.io XML diagram.
+   4. Download the `.drawio` file or preview the SVG directly in the browser.
+
+ - **Gradio UI:**
+   - Simple drag-and-drop image upload.
+   - Real-time status updates and diagram preview.
+   - Downloadable Draw.io file and copyable XML.
+   - SVG preview of the generated diagram.
+   - Exposed as an MCP server for programmatic access.
+
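The workflow listed under Features can be sketched in plain Python, without LangGraph or an LLM, to show the shape of the data passed between the steps. This is only an illustration under stated assumptions: `DetectedObject`, `detect_objects`, `generate_drawio_xml`, and `run_pipeline` are hypothetical names that do not come from the repository, and the detection step is stubbed with fixed data instead of a model call. The XML layout follows the uncompressed Draw.io file structure (`mxfile` > `diagram` > `mxGraphModel` > `root` > `mxCell`).

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class DetectedObject:
    """Hypothetical record for one shape found in the image."""
    label: str
    x: int
    y: int
    width: int
    height: int


def detect_objects(image_path: str) -> list[DetectedObject]:
    """Stand-in for the object detection agent; returns fixed data."""
    return [
        DetectedObject("Start", 40, 40, 120, 60),
        DetectedObject("End", 240, 40, 120, 60),
    ]


def generate_drawio_xml(objects: list[DetectedObject]) -> str:
    """Stand-in for the generator agent: emits one vertex cell per object."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # Draw.io expects two structural cells: the root (id 0) and the default layer (id 1).
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for index, obj in enumerate(objects, start=2):
        cell = ET.SubElement(
            root, "mxCell", id=str(index), value=obj.label,
            style="rounded=0;whiteSpace=wrap;html=1;", vertex="1", parent="1",
        )
        geometry = ET.SubElement(
            cell, "mxGeometry", x=str(obj.x), y=str(obj.y),
            width=str(obj.width), height=str(obj.height),
        )
        # "as" is a Python keyword, so it cannot be passed as a keyword argument.
        geometry.set("as", "geometry")
    return ET.tostring(mxfile, encoding="unicode")


def run_pipeline(image_path: str) -> str:
    """Stand-in for the supervisor: detection first, then generation."""
    return generate_drawio_xml(detect_objects(image_path))
```

Calling `run_pipeline("sketch.png")` returns an XML string that can be saved with a `.drawio` extension. In the actual application each stand-in would be replaced by an LLM-backed agent.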
+ ---
+
+ ## Requirements
+
+ - Python 3.10+
+ - Install dependencies:
+   ```sh
+   pip install -r requirements.txt
+   ```
+ - API keys for your LLM provider (e.g., Google Gemini).
+ - Configure your `.env` file:
+   ```
+   GEMINI_API_KEY=your_gemini_api_key
+   GEMINI_MODEL_NAME=gemini-2.5-pro-preview-06-05
+   GEMINI_THINKING_BUDGET=128
+   ```
+
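As a rough sketch of how the three `.env` variables above might be read and validated at startup, the example below uses only the standard library. The diff does not show how the repository loads its configuration (it may use a package such as python-dotenv), so `GeminiSettings`, `parse_env_text`, `read_settings`, and `load_settings` are hypothetical names, and the fallback value for the thinking budget is an assumption taken from the example file.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GeminiSettings:
    api_key: str
    model_name: str
    thinking_budget: int


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def read_settings(env: dict[str, str]) -> GeminiSettings:
    """Validate the variables named in the README and convert their types."""
    required = ("GEMINI_API_KEY", "GEMINI_MODEL_NAME")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(f"Missing settings: {', '.join(missing)}")
    return GeminiSettings(
        api_key=env["GEMINI_API_KEY"],
        model_name=env["GEMINI_MODEL_NAME"],
        # Assumed default: the example value from the README, used if the variable is absent.
        thinking_budget=int(env.get("GEMINI_THINKING_BUDGET", "128")),
    )


def load_settings(path: str = ".env") -> GeminiSettings:
    """Read the file if present; real environment variables take precedence."""
    file_values = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            file_values = parse_env_text(handle.read())
    return read_settings({**file_values, **os.environ})
```

Failing early with a `ValueError` when the key or model name is missing gives a clearer message than an authentication error from the provider later in the run.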
+ ---
+
+ ## Usage
+
+ 1. **Install dependencies:**
+    ```sh
+    pip install -r requirements.txt
+    ```
+
+ 2. **Configure environment:**
+    - Create a `.env` file in the project root with your API keys and model settings.
+
+ 3. **Start the application:**
+    ```sh
+    python app.py
+    ```
+
+ 4. **Access the web interface:**
+    - Open the provided local URL in your browser.
+    - Upload an image (diagram, sketch, chart, etc.).
+    - Click "Generate Diagram".
+    - Download the `.drawio` file or copy the XML.
+    - Open the file in [diagrams.net](https://app.diagrams.net/) for further editing.
+
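Before opening a downloaded file in diagrams.net, it can help to check that the XML parses and to see which shapes were produced. The snippet below is a small sketch that is not part of the repository: `list_shape_labels` is a hypothetical helper, and it assumes the file stores its diagram uncompressed, with labels held in the `value` attribute of `mxCell` elements.

```python
import xml.etree.ElementTree as ET


def list_shape_labels(drawio_xml: str) -> list[str]:
    """Return the labels of all vertex cells in uncompressed Draw.io XML."""
    root = ET.fromstring(drawio_xml)
    # iter() also covers files whose top-level element is mxGraphModel.
    return [
        cell.get("value", "")
        for cell in root.iter("mxCell")
        if cell.get("vertex") == "1"
    ]
```

For a file saved as `diagram.drawio` (an example name), the call would be `list_shape_labels(open("diagram.drawio", encoding="utf-8").read())`. A parse error at this point indicates malformed XML from the generator step.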
+ ---
+
+ ## MCP Server Integration
+
+ The Gradio app is also exposed as an MCP server (`mcp_server=True`), allowing integration with any MCP-compatible client or workflow. This enables automated or remote usage in larger pipelines.
+
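A minimal sketch of what the `mcp_server=True` setup might look like is shown below. It assumes Gradio is installed with its MCP extra (`pip install "gradio[mcp]"`). The function name `image_to_drawio`, its placeholder body, and the helpers `build_demo` and `main` are hypothetical and do not reflect the actual contents of `app.py`. Gradio uses the function's type hints and docstring to describe the tool to MCP clients.

```python
def image_to_drawio(image_path: str) -> str:
    """Convert an image into Draw.io XML.

    Args:
        image_path: Path to the uploaded image file.

    Returns:
        The generated diagram as a Draw.io XML string.
    """
    # Placeholder body: the real application would run its agent pipeline here.
    return '<mxfile><diagram name="Page-1"></diagram></mxfile>'


def build_demo():
    """Build the Gradio interface around the conversion function."""
    import gradio as gr  # imported lazily so the function above has no dependency

    return gr.Interface(
        fn=image_to_drawio,
        inputs=gr.Image(type="filepath", label="Image"),
        outputs=gr.Textbox(label="Draw.io XML"),
    )


def main() -> None:
    # mcp_server=True is the flag named in the README; Gradio then serves
    # the function as an MCP tool alongside the web UI.
    build_demo().launch(mcp_server=True)
```

Running `main()` starts the web UI and the MCP endpoint together, so the same function serves both browser users and MCP clients.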
+ ---
+
+ ## Project Structure
+
+ - `app.py`: Main entry point, Gradio UI, and workflow orchestration.
+ - `nodes/`: Agent and LangGraph node definitions.
+ - `tools/`: LLM tools for object detection and Draw.io generation.
+ - `output_llm/`: Generated Draw.io files, SVG previews, and logs.
+ - `files/`: Uploaded user images.
+
+ ---
+
+ ## Notes
+
+ - The LLM model and thinking budget are set in `.env` (`GEMINI_MODEL_NAME` and `GEMINI_THINKING_BUDGET`).
+ - The application is designed for easy extension with new agents or tools.
+ - Performance can vary considerably depending on the LLM: in my experience, Gemini 2.5 Pro performed much better than Gemini 2.5 Flash.
+
+ ---
+
+ ## License
+
+ MIT License
+
+ ---
+
+ **Author:** Giustino Esposito