metadata
title: Image2drawio
emoji: 🐒
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
short_description: A multi-agent application that converts images into drawio
tags:
  - agent-demo-track
  - mcp-server-track

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Image to Draw.io Converter

A multi-agent application that converts images into editable Draw.io diagrams using advanced LLM-based object detection and diagram generation. The system is built with LangGraph and features a modern Gradio web interface, which is also exposed as an MCP server for integration with external clients.

An example outcome is provided in "example-outcome.png".


Features

  • Multi-Agent Pipeline (LangGraph):

    • Supervisor Agent: Orchestrates the workflow and coordinates the agents.
    • Object Detection Agent (ReAct): Uses an LLM to detect and extract objects from the uploaded image.
    • Draw.io Generator Agent (ReAct): Converts detected objects into Draw.io XML diagrams.
  • Automated Workflow:

    1. Upload an image via the Gradio interface.
    2. The object detection agent identifies and extracts key objects.
    3. The diagram generator agent creates a Draw.io XML diagram.
    4. Download the .drawio file or preview the SVG directly in the browser.
  • Modern Gradio UI:

    • Simple drag-and-drop image upload.
    • Real-time status updates and diagram preview.
    • Downloadable Draw.io file and copyable XML.
    • SVG preview of the generated diagram.
    • Exposed as an MCP server for programmatic access.
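
To make step 3 of the workflow more concrete, here is a minimal sketch of how a list of detected objects could be turned into Draw.io XML. It is not the project's actual generator (which relies on an LLM agent): the function name `objects_to_drawio` and the object keys (`label`, `x`, `y`, `width`, `height`) are assumptions made for illustration. Only the standard library is used.

```python
import xml.etree.ElementTree as ET


def objects_to_drawio(objects):
    """Build a minimal Draw.io (mxGraph) document from detected objects.

    Each object is a dict with hypothetical keys:
    label, x, y, width, height (pixel coordinates).
    """
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")

    # Draw.io expects two structural cells: the root (id 0)
    # and the default layer (id 1) that shapes are attached to.
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")

    for index, obj in enumerate(objects, start=2):
        cell = ET.SubElement(
            root,
            "mxCell",
            id=str(index),
            value=obj["label"],
            style="rounded=0;whiteSpace=wrap;html=1;",
            vertex="1",
            parent="1",
        )
        geometry = ET.SubElement(
            cell,
            "mxGeometry",
            x=str(obj["x"]),
            y=str(obj["y"]),
            width=str(obj["width"]),
            height=str(obj["height"]),
        )
        # "as" is a Python keyword, so it cannot be passed as a keyword argument.
        geometry.set("as", "geometry")

    return ET.tostring(mxfile, encoding="unicode")


if __name__ == "__main__":
    detected = [
        {"label": "Start", "x": 40, "y": 40, "width": 120, "height": 60},
        {"label": "Process", "x": 40, "y": 160, "width": 120, "height": 60},
    ]
    print(objects_to_drawio(detected))
```

Saving the returned string to a file with the .drawio extension should give a file that diagrams.net can open, with one rectangle per detected object. Edges between shapes are left out of this sketch.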

Requirements

  • Python 3.10+
  • Install dependencies:
    pip install -r requirements.txt
    
  • API keys for your LLM provider (e.g., Google Gemini).
  • Configure your .env file:
    GEMINI_API_KEY=your_gemini_api_key
    GEMINI_MODEL_NAME=gemini-2.5-pro-preview-06-05
    GEMINI_THINKING_BUDGET=128
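
As an illustration of how these three variables might be read at startup, the sketch below parses them into a small settings object. It is an assumption about how the values could be consumed, not the code in app.py: the names `GeminiSettings` and `load_settings` are hypothetical, and the fallback values simply mirror the example above. It also assumes the .env file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GeminiSettings:
    """Hypothetical container for the values defined in .env."""
    api_key: str
    model_name: str
    thinking_budget: int


def load_settings(env=None):
    """Read the Gemini settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of at the first LLM call.
        raise RuntimeError("GEMINI_API_KEY is not set")

    return GeminiSettings(
        api_key=api_key,
        # Fallbacks are assumptions that mirror the example .env above.
        model_name=env.get("GEMINI_MODEL_NAME", "gemini-2.5-pro-preview-06-05"),
        thinking_budget=int(env.get("GEMINI_THINKING_BUDGET", "128")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.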
    

Usage

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Configure environment:

    • Create a .env file in the project root with your API keys and model settings.
  3. Start the application:

    python app.py
    
  4. Access the web interface:

    • Open the provided local URL in your browser.
    • Upload an image (diagram, sketch, chart, etc.).
    • Click "Generate Diagram".
    • Download the .drawio file or copy the XML.
    • Open the file in diagrams.net for further editing.

MCP Server Integration

The Gradio app is also exposed as an MCP server (mcp_server=True), allowing integration with any MCP-compatible client or workflow. This enables automated or remote usage in larger pipelines.
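
As a rough sketch of the client side, the snippet below builds the kind of JSON configuration that many MCP clients accept for a remote server. Several details are assumptions rather than facts from this project: the helper name `mcp_client_config`, the server name, the default Gradio address `http://127.0.0.1:7860`, and the `/gradio_api/mcp/sse` endpoint path used by Gradio 5.x MCP servers. Check the URL printed when the app starts and your client's documentation before relying on it.

```python
import json


def mcp_client_config(base_url="http://127.0.0.1:7860", server_name="image2drawio"):
    """Return a client configuration dict pointing at the Gradio MCP endpoint.

    The endpoint path is an assumption based on Gradio 5.x conventions.
    """
    endpoint = base_url.rstrip("/") + "/gradio_api/mcp/sse"
    return {"mcpServers": {server_name: {"url": endpoint}}}


if __name__ == "__main__":
    print(json.dumps(mcp_client_config(), indent=2))
```

The printed JSON can be pasted into the configuration file of an MCP-compatible client, adjusting the host and port if the app runs elsewhere (for example on a hosted Space).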


Project Structure

  • app.py: Main entry point, Gradio UI, and workflow orchestration.
  • nodes/: Agent and LangGraph node definitions.
  • tools/: LLM tools for object detection and Draw.io generation.
  • output_llm/: Generated Draw.io files, SVG previews, and logs.
  • files/: Uploaded user images.

Notes

  • The LLM model and thinking budget are configurable via .env (GEMINI_MODEL_NAME and GEMINI_THINKING_BUDGET).
  • The application is designed for easy extension with new agents or tools.
  • Performance varies considerably with the LLM: in my experience, Gemini 2.5 Pro performed much better than Gemini 2.5 Flash.

License

MIT License


Author: Giustino Esposito