Spaces: ash-171/accent-detection


ash-171 committed on May 30

Commit 16169a3 (verified), 1 parent: 5a8c370

Update README.md


Files changed (1)

README.md +141 -7

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
-title: Accent Detection
-emoji: 🚀
 colorFrom: red
 colorTo: red
 sdk: docker
@@ -8,13 +8,147 @@ app_port: 8501
 tags:
 - streamlit
 pinned: false
-short_description: 'accent detection '
 license: mit
 ---
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

 ---
+title: Accent Analyzer Agent
+emoji: 🏢
 colorFrom: red
 colorTo: red
 sdk: docker
 tags:
 - streamlit
 pinned: false
+short_description: Detects various English accents
 license: mit
 ---
+# Accent Analyzer
+This is a Streamlit-based web application that identifies the English accent of the speaker in a video. Users provide a public video URL (MP4), receive a transcript and an accent classification of the speech, and can ask follow-up questions about the transcript using Gemma3.
+## What It Does
+- Accepts a public **MP4 video URL**
+- Extracts audio and transcribes it using **OpenAI Whisper Medium**
+- Detects the accent using the **Jzuluaga/accent-id-commonaccent_xlsr-en-english** model
+- Lets users ask **follow-up questions** about the transcript using **Gemma3**
+- Deploys on **Hugging Face Spaces** using CPU hardware
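The audio-extraction step in the list above can be sketched as follows. This is a minimal illustration, not the Space's actual code: the helper names `build_ffmpeg_command` and `extract_audio` are hypothetical, and the sketch assumes the `ffmpeg` binary is available on the PATH. It writes 16 kHz mono audio, the sample rate Whisper works with internally.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Return an ffmpeg command that extracts mono audio from a video file.

    Hypothetical helper for illustration; not taken from the repository.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", video_path,          # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the requested rate
        wav_path,
    ]


def extract_audio(video_path: str, sample_rate: int = 16000) -> str:
    """Run ffmpeg and return the path of the extracted WAV file."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(video_path, wav_path, sample_rate), check=True)
    return wav_path
```

Keeping the command construction separate from the `subprocess.run` call makes the command easy to inspect or log without running ffmpeg.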
+---
+## Tech Stack
+- **Streamlit**: UI.
+- **OpenAI Whisper (medium)**: Speech-to-text transcription.
+- **Jzuluaga/accent-id-commonaccent_xlsr-en-english**: English accent classification.
+- **Gemma3 via Ollama**: Answers follow-up questions using context from the transcript.
+- **Docker**: Containerized deployment.
+- **Hugging Face Spaces**: Hosting on CPU.
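As a rough sketch of how the accent model in this stack might be loaded, the snippet below follows the usage pattern on the model's Hugging Face page, which goes through SpeechBrain's `foreign_class` helper. The names `ACCENT_MODEL`, `load_accent_classifier`, and `classify_accent` are hypothetical, and the unpacking of `classify_file`'s return value is an assumption based on that model card, not on this repository's code.

```python
# Arguments for SpeechBrain's foreign_class, as listed on the model card.
ACCENT_MODEL = {
    "source": "Jzuluaga/accent-id-commonaccent_xlsr-en-english",
    "pymodule_file": "custom_interface.py",
    "classname": "CustomEncoderWav2vec2Classifier",
}


def load_accent_classifier():
    """Download and load the accent classifier (hypothetical wrapper)."""
    # Imported lazily so this module can be imported without SpeechBrain installed.
    from speechbrain.pretrained.interfaces import foreign_class

    return foreign_class(**ACCENT_MODEL)


def classify_accent(classifier, wav_path):
    """Return (label, score) for one audio file.

    Assumes classify_file returns (out_prob, score, index, text_lab),
    where text_lab is a list of predicted labels.
    """
    out_prob, score, index, text_lab = classifier.classify_file(wav_path)
    return text_lab[0], float(score)
```

A caller would load the classifier once with `load_accent_classifier()` and reuse it across requests, since loading downloads the model weights.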
+---
+## Project Structure
+```
+accent-analyzer/
+├── Dockerfile                  # Container setup
+├── requirements.txt            # Python dependencies
+├── streamlit_app.py            # Main UI app
+└── src/
+    ├── custome_interface.py    # SpeechBrain custom interface
+    ├── tools/
+    │   └── accent_tool.py      # Audio analysis tool
+    └── app/
+        └── main_agent.py       # Analysis + Gemma3 agents
+```
+---
+## Running Locally (GPU Required)
+1. Clone the repo:
+```bash
+git clone https://github.com/your-username/accent-analyzer
+cd accent-analyzer
+```
+2. Build the Docker image:
+```bash
+docker build -t accent-analyzer .
+```
+3. Run the container:
+```bash
+docker run --gpus all -p 7860:7860 accent-analyzer
+```
+4. Visit: [http://localhost:7860](http://localhost:7860)
+---
+## Requirements
+`requirements.txt` should include at least:
+```
+streamlit>=1.25.0
+requests==2.31.0
+pydub==0.25.1
+torch==1.11.0
+torchaudio==0.11.0
+speechbrain==0.5.12
+transformers==4.29.2
+asyncio==3.4.3
+ffmpeg-python==0.2.0
+openai-whisper==20230314
+numpy==1.22.4
+langchain>=0.1.0
+langchain-community>=0.0.30
+torchvision==0.12.0
+langgraph>=0.0.20
+```
+---
+## Notes
+- Gemma3 is accessed via **Ollama** inside Docker; make sure the model is pulled during the image build.
+- `custome_interface.py` is required by the accent model; the Dockerfile downloads it automatically.
+- Video URLs must be **direct links** to `.mp4` files.
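The direct-link requirement above could be checked before any download is attempted. The function below is a minimal sketch using only the standard library. `is_direct_mp4_url` is a hypothetical name, and the check looks only at the URL's shape; it does not verify that the server actually returns a video.

```python
from urllib.parse import urlparse


def is_direct_mp4_url(url: str) -> bool:
    """Return True if the URL is http(s) and its path ends in .mp4."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        # Compare the path only, so query strings such as ?token=... are ignored.
        and parsed.path.lower().endswith(".mp4")
    )
```

With this check, a page URL such as a YouTube watch link is rejected, while a file URL with a query string still passes.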
+---
+## Example Prompt
+```
+Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
+```
+Then follow up with:
+```
+Where is the speaker probably from?
+What is the tone or emotion?
+Summarize the video.
+```
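One way a follow-up question like those above could be sent to Gemma3 is through Ollama's local REST endpoint. The sketch below is an assumption-laden illustration: the repository lists LangChain and LangGraph in its requirements and may route requests differently, the prompt wording is invented, and `build_payload` and `ask_followup` are hypothetical names. It assumes an Ollama server on its default port with the `gemma3` model already pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(transcript: str, question: str, model: str = "gemma3") -> dict:
    """Build the JSON body for a non-streaming generate request."""
    prompt = (
        "Answer the question using only the transcript below.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}"
    )
    # stream=False asks Ollama for a single JSON object instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def ask_followup(transcript: str, question: str) -> str:
    """Send the question to the local Ollama server and return its answer."""
    data = json.dumps(build_payload(transcript, question)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Putting the whole transcript in the prompt keeps the example simple; for long videos the transcript would need to be truncated or summarized to fit the model's context window.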
+---
+## Acknowledgments
+This project uses the following models, frameworks, and tools:
+- [OpenAI Whisper](https://github.com/openai/whisper): Automatic speech recognition model.
+- [SpeechBrain](https://speechbrain.readthedocs.io/): Toolkit used for building and fine-tuning speech processing models.
+- [Accent-ID CommonAccent](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english): Fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
+- [CustomEncoderWav2vec2Classifier](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english/blob/main/custom_interface.py): Custom interface used to load and run the accent model.
+- [Gemma3](https://ollama.com/library/gemma3) via [Ollama](https://ollama.com): Large language model used for natural language follow-up based on transcripts.
+- [Streamlit](https://streamlit.io): Python framework for building web applications.
+- [Hugging Face Spaces](https://huggingface.co/spaces): Platform used for deploying this application on CPU hardware.
+---
+## Author
+- Developed by [Aswathi T S](https://github.com/ash-171)
+---
+## License
+This project is licensed under the `MIT License`.