metadata
sdk: gradio
python_version: '3.10'
app_file: app.py

Farsi Audio Chatbot

A Gradio application that lets users speak in Farsi, get a reply from a chatbot, and hear that reply as Farsi audio.

Prerequisites

  • Python 3.8 or higher

Installation

  1. Clone this repository.
  2. Create and activate a virtual environment.
  3. Install dependencies: pip install -r requirements.txt
  4. Run the application: python app.py

How It Works

  • Speech-to-Text (STT): Uses Whisper small for converting Farsi speech to text.
  • Natural Language Processing (NLP): Uses HooshvareLab/gpt2-fa to generate Farsi text responses.
  • Text-to-Speech (TTS): Uses edge-tts with the fa-IR-FaridNeural voice for Farsi audio output.
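
The three stages above can be sketched roughly as follows. This is a minimal illustration, not the contents of app.py. The checkpoint ID `openai/whisper-small` is an assumption (the README only says "Whisper small"), and the helper names `make_stt`, `make_generator`, `make_tts`, and `respond`, the output path `response.mp3`, and the `max_new_tokens` value are all hypothetical. Heavy libraries are imported lazily so that the composition logic can be read and tested on its own.

```python
import asyncio
from typing import Callable, Tuple

STT_MODEL = "openai/whisper-small"   # assumption: README only says "Whisper small"
NLP_MODEL = "HooshvareLab/gpt2-fa"   # named in the README
TTS_VOICE = "fa-IR-FaridNeural"      # named in the README


def make_stt() -> Callable[[str], str]:
    """Build a speech-to-text function backed by a Whisper checkpoint."""
    from transformers import pipeline  # lazy import: heavy dependency

    asr = pipeline("automatic-speech-recognition", model=STT_MODEL)

    def transcribe(audio_path: str) -> str:
        # Decoding an audio file from a path requires ffmpeg on the system.
        result = asr(
            audio_path,
            generate_kwargs={"language": "persian", "task": "transcribe"},
        )
        return result["text"].strip()

    return transcribe


def make_generator(max_new_tokens: int = 50) -> Callable[[str], str]:
    """Build a text-generation function backed by gpt2-fa."""
    from transformers import pipeline  # lazy import: heavy dependency

    generator = pipeline("text-generation", model=NLP_MODEL)

    def generate(prompt: str) -> str:
        outputs = generator(prompt, max_new_tokens=max_new_tokens)
        # By default "generated_text" contains the prompt plus its continuation.
        return outputs[0]["generated_text"]

    return generate


def make_tts(out_path: str = "response.mp3") -> Callable[[str], str]:
    """Build a text-to-speech function that writes audio to out_path."""
    import edge_tts  # lazy import: needs network access at synthesis time

    def speak(text: str) -> str:
        # asyncio.run starts its own event loop, so call this from
        # synchronous code, not from inside an already running loop.
        asyncio.run(edge_tts.Communicate(text, TTS_VOICE).save(out_path))
        return out_path

    return speak


def respond(
    audio_path: str,
    stt: Callable[[str], str],
    generate: Callable[[str], str],
    tts: Callable[[str], str],
) -> Tuple[str, str, str]:
    """Run one turn: audio in, then transcript, reply text, reply audio path."""
    user_text = stt(audio_path)
    reply_text = generate(user_text)
    reply_audio = tts(reply_text)
    return user_text, reply_text, reply_audio
```

In an app, the three `make_*` helpers would typically be called once at startup so the models load a single time, and `respond` would then be invoked for each recorded clip.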

Deployment on Hugging Face Spaces

To deploy on Hugging Face Spaces:

  1. Create a new Space.
  2. Upload this repository, including requirements.txt, app.py, and README.md.
  3. Ensure the Space has sufficient resources (at least 2 GB of RAM; a GPU is optional).
  4. The Space then builds and starts the app automatically.

Note: The current version handles each audio input as a single discrete request, triggered by a button click. Continuous streaming would require further work, such as processing the audio in real-time chunks.
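
As a rough illustration of what chunk processing could involve, the sketch below splits an audio buffer into fixed-length chunks that a streaming loop might hand to the speech-to-text stage one at a time. It covers only the buffering step, not transcription or playback, and the function name `iter_chunks` and its `chunk_seconds` parameter are hypothetical, not part of this project.

```python
from typing import Iterator, Sequence


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: float = 1.0,
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each about chunk_seconds long.

    The final chunk may be shorter than the others.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    # Number of samples per chunk, never less than one.
    chunk_size = max(1, int(sample_rate * chunk_seconds))
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


if __name__ == "__main__":
    # 2.5 seconds of silence at a 1000 Hz sample rate, split into 1-second chunks.
    buffer = [0.0] * 2500
    for index, chunk in enumerate(iter_chunks(buffer, sample_rate=1000)):
        print(index, len(chunk))
```

A real streaming setup would also need to decide how to handle words that straddle chunk boundaries, for example by overlapping adjacent chunks.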

Limitations

  • Whisper small may transcribe less accurately in noisy environments.
  • gpt2-fa is suitable for short responses but may struggle with complex conversations.
  • Continuous audio streaming is not yet implemented.
