metadata
sdk: gradio
python_version: '3.10'
app_file: app.py
Farsi Audio Chatbot
This is a Gradio-based application that allows users to speak in Farsi, receive a response from a chatbot, and hear the response in Farsi audio.
Prerequisites
- Python 3.8 or higher (the Space metadata pins Python 3.10)
Installation
- Clone this repository.
- Create and activate a virtual environment.
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python app.py
How It Works
- Speech-to-Text (STT): Uses Whisper small for converting Farsi speech to text.
- Natural Language Processing (NLP): Uses HooshvareLab/gpt2-fa to generate Farsi text responses.
- Text-to-Speech (TTS): Uses edge-tts with the fa-IR-FaridNeural voice for Farsi audio output.
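The three stages above can be chained into a single turn. The following is a minimal sketch, not the code in app.py: the names respond and load_stages, the 50-token reply limit, the forced "persian" language setting, and the reply.mp3 output path are all assumptions made for illustration. The stages are passed in as plain callables so the flow can be exercised without downloading any model.

```python
import asyncio
from typing import Callable, Tuple

# Hypothetical stage signatures (not taken from the repository).
SpeechToText = Callable[[str], str]        # audio file path -> Farsi text
ChatModel = Callable[[str], str]           # user text -> reply text
TextToSpeech = Callable[[str, str], str]   # (text, output path) -> output path


def respond(audio_path: str, stt: SpeechToText, chat: ChatModel,
            tts: TextToSpeech, out_path: str = "reply.mp3") -> Tuple[str, str]:
    """Run one discrete turn: transcribe, generate a reply, synthesize audio."""
    user_text = stt(audio_path).strip()
    if not user_text:
        raise ValueError("no speech recognized in the input audio")
    reply_text = chat(user_text).strip()
    return reply_text, tts(reply_text, out_path)


def load_stages(voice: str = "fa-IR-FaridNeural"):
    """Build the stages from the libraries named above (imported lazily)."""
    import edge_tts
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    gen = pipeline("text-generation", model="HooshvareLab/gpt2-fa")

    def stt(path: str) -> str:
        # Assumption: pinning the language skips Whisper's auto-detection.
        return asr(path, generate_kwargs={"language": "persian"})["text"]

    def chat(text: str) -> str:
        # Assumption: 50 new tokens keeps replies short, as GPT-2 works best.
        out = gen(text, max_new_tokens=50, return_full_text=False)
        return out[0]["generated_text"]

    def tts(text: str, out_path: str) -> str:
        # edge-tts is asynchronous; run it to completion for this one call.
        asyncio.run(edge_tts.Communicate(text, voice).save(out_path))
        return out_path

    return stt, chat, tts
```

With the dependencies installed, a turn would look like `respond("question.wav", *load_stages())`, where question.wav is a placeholder file name. Keeping the stages injectable also makes it straightforward to swap in a different model for any one step.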
Deployment on Hugging Face Spaces
To deploy on Hugging Face Spaces:
- Create a new Space.
- Upload this repository, including requirements.txt, app.py, and README.md.
- Ensure the Space has sufficient resources (at least 2GB RAM, GPU optional).
- The app will automatically build and run.
Note: The current version processes audio one recording at a time, triggered by a button click. Continuous streaming would require additional work, such as processing audio in real-time chunks.
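To illustrate what chunk processing could involve, here is a minimal sketch of splitting a stream of audio samples into fixed-length, overlapping windows that could each be sent to the speech-to-text stage. This is not part of the application: the function name iter_chunks and the 5-second window with 1-second overlap are assumptions chosen for illustration.

```python
from typing import Iterator, Sequence


def iter_chunks(samples: Sequence[float], sample_rate: int,
                chunk_seconds: float = 5.0,
                overlap_seconds: float = 1.0) -> Iterator[Sequence[float]]:
    """Yield overlapping windows of `samples`; the last one may be shorter."""
    size = int(chunk_seconds * sample_rate)
    step = size - int(overlap_seconds * sample_rate)
    if size <= 0 or step <= 0:
        raise ValueError("chunk must be longer than the overlap")
    start = 0
    while start < len(samples):
        yield samples[start:start + size]
        if start + size >= len(samples):
            break  # this window already reached the end of the audio
        start += step
```

The overlap reduces the chance that a word is cut in half at a window boundary, at the cost of transcribing some audio twice; merging the overlapping transcripts is a separate problem this sketch does not address.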
Limitations
- Whisper-small may have reduced accuracy in noisy environments.
- GPT2-fa is suitable for short responses but may struggle with complex conversations.
- Continuous audio streaming is not yet implemented.
Citations
- Whisper (Speech-to-Text): openai/whisper-small
- Chatbot (NLP): HooshvareLab/gpt2-fa
- edge-tts (Text-to-Speech): edge-tts