---
title: Accent Analyzer Agent
emoji: 🏒
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Detects various English accents
license: mit
---

# Accent Analyzer

This is a Streamlit-based web application that analyzes the English accent in spoken videos. Users can provide a public video URL (MP4), receive a transcription of the speech, and ask follow-up questions based on the transcript using Gemma3.

## What It Does

- Accepts a public **MP4 video URL**
- Extracts audio and transcribes it using **OpenAI Whisper Medium**
- Detects the accent with the **Jzuluaga/accent-id-commonaccent_xlsr-en-english** model
- Lets users ask **follow-up questions** about the transcript using **Gemma3**
- Deploys easily to **Hugging Face Spaces** on CPU
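
Since the app only accepts direct MP4 links, some input validation is implied before any download happens. A minimal sketch (the `is_direct_mp4_url` helper is illustrative, not part of the repo):

```python
from urllib.parse import urlparse

def is_direct_mp4_url(url: str) -> bool:
    """Return True if url looks like a direct, public link to an .mp4 file."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return parsed.path.lower().endswith(".mp4")
```

A check like this rejects streaming-page URLs (e.g. a YouTube watch link) early, before the app tries to download and decode audio.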

---

## Tech Stack

- **Streamlit**: UI
- **OpenAI Whisper (medium)**: speech-to-text transcription
- **Jzuluaga/accent-id-commonaccent_xlsr-en-english**: English accent classification
- **Gemma3 via Ollama**: answers to follow-up questions, using the transcript as context
- **Docker**: containerized for deployment
- **Hugging Face Spaces**: CPU hosting
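
As a rough sketch of how these pieces could fit together (the function names are illustrative, not the repo's actual API; the Whisper and SpeechBrain calls follow their documented interfaces but are hedged, untested assumptions here):

```python
def transcribe(audio_path: str) -> str:
    """Speech-to-text with the Whisper 'medium' checkpoint."""
    import whisper  # heavyweight import, deferred until actually needed
    model = whisper.load_model("medium")
    return model.transcribe(audio_path)["text"]

def classify_accent(audio_path: str):
    """English accent classification with the CommonAccent model."""
    from speechbrain.pretrained.interfaces import foreign_class
    classifier = foreign_class(
        source="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
        pymodule_file="custom_interface.py",
        classname="CustomEncoderWav2vec2Classifier",
    )
    out_prob, score, index, text_lab = classifier.classify_file(audio_path)
    return text_lab, float(score)

def format_result(label: str, score: float) -> str:
    """Small pure helper: render the classifier output for the UI."""
    return f"Detected accent: {label} ({score:.0%} confidence)"
```

The deferred imports keep module import cheap; the models are only loaded when a video is actually analyzed.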

---

## Project Structure

```
accent-analyzer/
├── Dockerfile                  # Container setup
├── requirements.txt            # Python dependencies
├── streamlit_app.py            # Main UI app
└── src/
    ├── custome_interface.py    # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py      # Audio analysis tool
    └── app/
        └── main_agent.py       # Analysis + Gemma3 agents
```

---

## Running Locally (GPU Required)

1. Clone the repo:

```bash
git clone https://github.com/your-username/accent-analyzer
cd accent-analyzer
```

2. Build the Docker image:

```bash
docker build -t accent-analyzer .
```

3. Run the container:

```bash
docker run --gpus all -p 8501:8501 accent-analyzer
```

4. Visit: [http://localhost:8501](http://localhost:8501)

---


## Requirements

`requirements.txt` should include at least:

```
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
speechbrain==0.5.12
transformers==4.29.2
asyncio==3.4.3
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
torchvision==0.12.0
langgraph>=0.0.20
```

---

## Notes

- Gemma3 is accessed via **Ollama** inside Docker; make sure the Dockerfile pulls the model at build time.
- `custome_interface.py` is required by the accent model; it is downloaded automatically in the Dockerfile.
- Video URLs must be **direct links** to `.mp4` files.
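
To illustrate the follow-up flow, here is a hedged sketch of querying Gemma3 through Ollama's HTTP API (the endpoint and payload follow Ollama's documented `/api/generate` interface; `build_prompt` and `ask_gemma` are illustrative names, not the repo's actual functions):

```python
import json
import urllib.request

def build_prompt(transcript: str, question: str) -> str:
    """Pure helper: ground the user's question in the transcript."""
    return (
        "You are given a transcript of a video.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}\n"
        "Answer using only the transcript."
    )

def ask_gemma(transcript: str, question: str) -> str:
    """Send a non-streaming request to a local Ollama server running gemma3."""
    payload = json.dumps({
        "model": "gemma3",
        "prompt": build_prompt(transcript, question),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream: false`, Ollama returns a single JSON object whose `response` field holds the full answer, which keeps the client code simple.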

---

## Example Prompt

```
Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
```

Then follow up with:

```
Where is the speaker probably from?
What is the tone or emotion?
Summarize the video.
```

---
## Acknowledgments

This project uses the following models, frameworks, and tools:

- [OpenAI Whisper](https://github.com/openai/whisper): Automatic speech recognition model.
- [SpeechBrain](https://speechbrain.readthedocs.io/): Toolkit used for building and fine-tuning speech processing models.
- [Accent-ID CommonAccent](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english): Fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
- [CustomEncoderWav2vec2Classifier](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english/blob/main/custom_interface.py): Custom interface used to load and run the accent model.
- [Gemma3](https://ollama.com/library/gemma3) via [Ollama](https://ollama.com): Large language model used for natural language follow-up based on transcripts.
- [Streamlit](https://streamlit.io): Python framework for building web applications.
- [Hugging Face Spaces](https://huggingface.co/spaces): Platform used for deploying this application.


---

## Author

- Developed by [Aswathi T S](https://github.com/ash-171)

---

## License

This project is licensed under the MIT License.