metadata

sdk: gradio
python_version: '3.10'
app_file: app.py

Farsi Audio Chatbot

A Gradio application that lets users speak in Farsi, get a reply from a chatbot, and hear that reply as Farsi audio.

Prerequisites

Python 3.8 or higher (the Space metadata above pins 3.10)

Installation

Clone this repository.
Create and activate a virtual environment.
Install dependencies: pip install -r requirements.txt
Run the application: python app.py

How It Works

Speech-to-Text (STT): Uses openai/whisper-small to transcribe Farsi speech to text.
Natural Language Processing (NLP): Uses HooshvareLab/gpt2-fa to generate Farsi text responses.
Text-to-Speech (TTS): Uses edge-tts with the fa-IR-FaridNeural voice for Farsi audio output.
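
The three stages above can be sketched as one function that chains them. This is a minimal illustration, not the actual contents of app.py: the function names (transcribe, generate_reply, synthesize, run_turn), the output file name reply.mp3, the max_new_tokens value, and the Whisper language hint are all assumptions. Heavy libraries are imported inside each function so the chaining logic can be exercised without downloading any model.

```python
import asyncio
from typing import Callable, Tuple


def transcribe(audio_path: str) -> str:
    """Farsi speech -> text with openai/whisper-small."""
    from transformers import pipeline  # imported lazily: heavy dependency

    # Note: a real app would build this pipeline once and reuse it.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    # Assumption: hint the language so Whisper does not have to auto-detect it.
    result = asr(audio_path, generate_kwargs={"language": "persian", "task": "transcribe"})
    return result["text"].strip()


def generate_reply(prompt: str, max_new_tokens: int = 50) -> str:
    """Farsi text -> Farsi continuation with HooshvareLab/gpt2-fa."""
    from transformers import pipeline

    generator = pipeline("text-generation", model="HooshvareLab/gpt2-fa")
    # return_full_text=False drops the prompt and keeps only the new text.
    outputs = generator(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
    return outputs[0]["generated_text"].strip()


def synthesize(text: str, out_path: str = "reply.mp3", voice: str = "fa-IR-FaridNeural") -> str:
    """Farsi text -> audio file with edge-tts; returns the file path."""
    import edge_tts

    # edge-tts is async; asyncio.run assumes no event loop is already
    # running in the calling thread.
    asyncio.run(edge_tts.Communicate(text, voice).save(out_path))
    return out_path


def run_turn(
    audio_path: str,
    stt: Callable[[str], str] = transcribe,
    nlp: Callable[[str], str] = generate_reply,
    tts: Callable[[str], str] = synthesize,
) -> Tuple[str, str, str]:
    """Run one conversation turn: STT -> NLP -> TTS."""
    user_text = stt(audio_path)
    reply_text = nlp(user_text)
    reply_audio = tts(reply_text)
    return user_text, reply_text, reply_audio
```

Passing the stages in as parameters keeps each model swappable and lets the flow be checked with stand-in functions, for example run_turn("in.wav", stt=lambda p: "سلام", nlp=lambda t: t + "!", tts=lambda t: "out.mp3").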

Deployment on Hugging Face Spaces

To deploy on Hugging Face Spaces:

Create a new Space.
Upload this repository, including requirements.txt, app.py, and README.md.
Ensure the Space has sufficient resources: at least 2 GB of RAM; a GPU is optional.
Spaces then builds and runs the app automatically.

Note: The current version processes audio one recording at a time, triggered by a button click. Continuous streaming would require additional work, such as processing audio in real-time chunks.
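
To make "chunk processing" concrete, the sketch below splits a buffer of audio samples into fixed-length, slightly overlapping windows that could each be transcribed in turn. It is not part of the current app: the helper name iter_chunks and the default values (16 kHz sample rate, 5-second chunks, 0.5-second overlap) are illustrative assumptions, and a real streaming version would also need to merge the overlapping transcripts.

```python
from typing import Iterator, Sequence


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int = 16000,      # assumed rate; Whisper models work on 16 kHz audio
    chunk_seconds: float = 5.0,    # assumed window length
    overlap_seconds: float = 0.5,  # assumed overlap so words are not cut at a boundary
) -> Iterator[Sequence[float]]:
    """Yield consecutive windows of `samples`, each overlapping the previous one."""
    chunk = int(sample_rate * chunk_seconds)
    step = chunk - int(sample_rate * overlap_seconds)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk_seconds must be positive and larger than overlap_seconds")

    for start in range(0, len(samples), step):
        yield samples[start:start + chunk]
        # Stop once a window has reached the end of the buffer.
        if start + chunk >= len(samples):
            break
```

Each yielded window could then be passed to the speech-to-text stage while the next one is still being recorded.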

Limitations

openai/whisper-small may transcribe less accurately in noisy environments.
HooshvareLab/gpt2-fa is suitable for short responses but may struggle with complex, multi-turn conversations.
Continuous audio streaming is not yet implemented.

Citations

Speech-to-Text (STT): openai/whisper-small
Chatbot (NLP): HooshvareLab/gpt2-fa
Text-to-Speech (TTS): edge-tts