metadata
title: SwarmChat
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.34.2
python_version: '3.10'
app_file: app.py
pinned: true
preload_from_hub:
  - facebook/seamless-m4t-v2-large
  - Inventors-Hub/SwarmChat-models EuroLLM-9B-Instruct-Q4_K_M.gguf
  - >-
    Inventors-Hub/SwarmChat-models
    Falcon3-10B-Instruct-BehaviorTree-3epochs.Q4_K_M.gguf
  - Inventors-Hub/SwarmChat-models llama-guard-3-8b-q4_k_m.gguf
thumbnail: >
  https://cdn-uploads.huggingface.co/production/uploads/66bf0ec6399b14b08f570b0b/oVrcxcvvyl769-b3TffNe.png
short_description: Natural Language Control for Swarm Robotics

SwarmChat: Enabling Intuitive Swarm Robotics with Natural Language

SwarmChat lets users control a robot swarm through natural language. The system combines audio transcription, text processing, and safety checks with a live simulation environment that visualizes a swarm of agents executing behavior trees.

🚀 This project is funded by the European Union’s UTTER programme, in collaboration with the UTTER consortium.

Features

  • Audio Input Processing:

    • Record commands via a microphone.
    • Translate speech into English using the facebook/seamless-m4t-v2-large model.
    • Perform a safety check on the translated text before execution.
  • Text Input Processing:

    • Enter text commands for swarm control.
    • Translate text using EuroLLM (EuroLLM-9B-Instruct).
    • Detect unsafe or inappropriate content with an integrated safety module.
  • Safety Module:

    • Uses the Llama Guard model (Llama-Guard-3-8B) for safety classification.
    • Identifies unsafe content across predefined categories (e.g., violent crimes, privacy violations, hate speech).
    • Ensures commands comply with safety standards.
  • Swarm Simulation:

    • Visualize a swarm of agents in a live simulation powered by the Violet simulator and Pygame.
    • Agents are controlled by behavior trees defined in an XML file (tree.xml), using the py_trees framework.
    • Real-time simulation updates streamed via a Gradio web interface.
  • Behavior Tree Generator:

    • Uses the Falcon3-10B-Instruct-BehaviorTree model to dynamically generate behavior trees in XML format.
    • Automatically extracts available behaviors from the SwarmAgent class and constructs a detailed prompt using a predefined XML template.
    • Generates and saves new behavior tree configurations (updating tree.xml) based on user-specified tasks.
  • Integrated Interface:

    • A unified Gradio web interface for both audio and text inputs.
    • Live streaming of the simulation environment.
    • Seamless switching between different input modalities.
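
To make the behavior tree idea from the Swarm Simulation feature more concrete, here is a minimal sketch of how an XML tree might be parsed and evaluated. It uses only the Python standard library rather than py_trees, and the tag names (`Sequence`, `Selector`, `Action`), the behavior names, and the XML layout are assumptions for illustration, not the actual schema of tree.xml. It also reduces each node to success or failure, while a full behavior tree framework typically tracks a "running" state as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real tree.xml schema may differ.
TREE_XML = """
<BehaviorTree>
  <Selector>
    <Sequence>
      <Action name="is_target_visible"/>
      <Action name="move_to_target"/>
    </Sequence>
    <Action name="wander"/>
  </Selector>
</BehaviorTree>
"""


def tick(node, behaviors, trace):
    """Evaluate one XML node; return True on success, False on failure."""
    if node.tag == "Action":
        name = node.get("name")
        trace.append(name)  # record which behaviors were actually called
        return behaviors[name]()
    if node.tag == "Sequence":
        # Succeeds only if every child succeeds; stops at the first failure.
        return all(tick(child, behaviors, trace) for child in node)
    if node.tag == "Selector":
        # Succeeds as soon as one child succeeds; tries children in order.
        return any(tick(child, behaviors, trace) for child in node)
    raise ValueError(f"unknown node type: {node.tag}")


def run_tree(xml_text, behaviors):
    """Parse the XML and tick the single node under the root once."""
    root = ET.fromstring(xml_text.strip())
    trace = []
    result = tick(root[0], behaviors, trace)
    return result, trace


# Hypothetical behaviors: the target is not visible, so the agent wanders.
behaviors = {
    "is_target_visible": lambda: False,
    "move_to_target": lambda: True,
    "wander": lambda: True,
}
result, trace = run_tree(TREE_XML, behaviors)
print(result, trace)
```

Because the sequence fails at its first child, `move_to_target` is never called and the selector falls back to `wander`.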

Technology Stack

  • Backend:

    • Python
    • Transformers (Hugging Face)
    • PyTorch
    • Pygame
    • Threading and Queue modules for simulation management
  • Frontend:

    • Gradio for an interactive web-based interface.
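  • AI Models:

    • facebook/seamless-m4t-v2-large for speech translation
    • EuroLLM-9B-Instruct for text translation
    • Llama-Guard-3-8B for safety classification
    • Falcon3-10B-Instruct-BehaviorTree for behavior tree generation
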
  • Behavior Trees:

    • Agents use behavior trees, parsed from XML and built with py_trees, to decide their actions within the simulation.
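
The threading and queue pattern listed under Backend can be sketched as a producer thread that pushes frames into a bounded queue while a generator drains it. The function names and the strings used as stand-in frames are assumptions; the actual simulation would produce rendered Pygame frames, and its loop may be organized differently.

```python
import queue
import threading


def _put(frame_queue, item, stop_event):
    """Put an item on the queue, giving up once a stop is requested."""
    while not stop_event.is_set():
        try:
            frame_queue.put(item, timeout=0.1)
            return True
        except queue.Full:
            continue  # consumer is slow; retry unless asked to stop
    return False


def simulation_loop(frame_queue, stop_event, max_frames):
    """Producer: runs in a background thread and emits frames."""
    for step in range(max_frames):
        frame = f"frame-{step}"  # stand-in for a rendered simulation frame
        if not _put(frame_queue, frame, stop_event):
            return
    _put(frame_queue, None, stop_event)  # sentinel: no more frames


def stream_frames(max_frames=5):
    """Consumer: generator that yields frames as they arrive."""
    frame_queue = queue.Queue(maxsize=2)  # bounded so the producer cannot run far ahead
    stop_event = threading.Event()
    worker = threading.Thread(
        target=simulation_loop,
        args=(frame_queue, stop_event, max_frames),
        daemon=True,
    )
    worker.start()
    try:
        while True:
            frame = frame_queue.get()
            if frame is None:
                break
            yield frame
    finally:
        # Runs on normal completion and when the generator is closed early.
        stop_event.set()
        worker.join(timeout=1.0)


frames = list(stream_frames(max_frames=3))
print(frames)
```

A generator like `stream_frames` is a natural fit for a web interface that streams output, since the caller can pull frames one at a time and stop the producer by closing the generator.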

Installation

  1. Clone the repository:

    git clone https://github.com/Inventors-Hub/SwarmChat.git
    cd SwarmChat
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Set up AI models:

    • Place the EuroLLM model file (EuroLLM-9B-Instruct-Q4_K_M.gguf) at the path specified in text_processing.py.
    • Place the Llama Guard model file (llama-guard-3-8b-q4_k_m.gguf) at the path specified in safety_module.py.
    • Place the Falcon3 behavior tree model file (Falcon3-10B-Instruct-BehaviorTree-3-epochs-GGUF) at the path specified in bt_generator.py.

  4. Run the Application:

    python app.py
    
  5. Access the Interface:

    Open your browser and navigate to http://127.0.0.1:7860 to start using SwarmChat.
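
Before launching the app, it can help to confirm that the three GGUF files are in place. The sketch below assumes a hypothetical `models/` directory; the real locations are whatever paths are set in text_processing.py, safety_module.py, and bt_generator.py. The file names are taken from the `preload_from_hub` entries in the Space metadata and may need adjusting for a local setup.

```python
from pathlib import Path

# File names as listed in the Space metadata; adjust if yours differ.
MODEL_FILES = [
    "EuroLLM-9B-Instruct-Q4_K_M.gguf",
    "llama-guard-3-8b-q4_k_m.gguf",
    "Falcon3-10B-Instruct-BehaviorTree-3epochs.Q4_K_M.gguf",
]


def missing_models(model_dir):
    """Return the model file names that are not present in model_dir."""
    base = Path(model_dir)
    return [name for name in MODEL_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_models("models")  # hypothetical directory
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```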

Overview of Modules

  • app.py
    The main application integrates audio/text processing, behavior tree generation, and the live simulation. It sets up the Gradio interface, handles simulation streaming, and routes user inputs to the appropriate processing modules.

  • speech_processing.py
    Implements audio transcription and translation using the facebook/seamless-m4t-v2-large model.

  • text_processing.py
    Translates text commands using EuroLLM (EuroLLM-9B-Instruct).

  • safety_module.py
    Uses Llama Guard to assess the safety of incoming commands, ensuring compliance with safety policies.

  • bt_generator.py
    Dynamically generates behavior trees in XML format by extracting behaviors from the SwarmAgent class, constructing a prompt, and querying the Falcon3-10B-Instruct-BehaviorTree model. The generated XML is saved to tree.xml for simulation use.

  • simulator_env.py
    Powers the simulation environment, manages agent behaviors using XML-defined behavior trees, and handles real-time simulation updates.
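
The behavior extraction step described for bt_generator.py could look roughly like the following. The `SwarmAgent` stub, its method names, the docstring convention, and the prompt wording are all hypothetical, and the plain-text template here stands in for the predefined XML template. Only the general approach (inspect the class, list its behaviors, fill a prompt template) comes from the module description.

```python
import inspect


class SwarmAgent:
    """Hypothetical stand-in for the real SwarmAgent class."""

    def wander(self):
        """Action: move in a random direction."""
        return True

    def form_line(self):
        """Action: align with neighbors to form a line."""
        return True

    def is_obstacle_detected(self):
        """Condition: check for an obstacle ahead."""
        return False

    def _update_sensors(self):
        """Internal helper, not exposed as a behavior."""


# Hypothetical prompt template; the real one is XML-based.
PROMPT_TEMPLATE = """Generate a behavior tree in XML for this task: {task}
Use only these behaviors:
{behaviors}
"""


def extract_behaviors(agent_cls):
    """Map public method names to the first line of their docstrings."""
    behaviors = {}
    # getmembers returns members sorted by name.
    for name, member in inspect.getmembers(agent_cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers and dunder methods
        doc = inspect.getdoc(member) or ""
        behaviors[name] = doc.splitlines()[0] if doc else ""
    return behaviors


def build_prompt(task, agent_cls=SwarmAgent):
    """Fill the template with the task and the discovered behaviors."""
    behaviors = extract_behaviors(agent_cls)
    lines = [f"- {name}: {doc}" for name, doc in behaviors.items()]
    return PROMPT_TEMPLATE.format(task=task, behaviors="\n".join(lines))


prompt = build_prompt("form a line and avoid obstacles")
print(prompt)
```

Deriving the behavior list from the class itself means the prompt stays in sync when behaviors are added or removed, which is presumably why the generator extracts them automatically.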

Acknowledgments

This work was funded by the European Union under the UTTER programme.
We gratefully acknowledge the support of the entire UTTER consortium.