---
title: Accent Analyzer Agent
emoji: π’
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Various English accent detection
license: mit
---
# Accent Analyzer
This is a Streamlit-based web application that identifies the English accent of the speaker in a video. Users provide a public video URL (MP4), receive a transcription of the speech along with the detected accent, and can then ask follow-up questions about the transcript, which are answered by Gemma3.
## What It Does
- Accepts a public **MP4 video URL**
- Extracts audio and transcribes it using **OpenAI Whisper Medium**
- Detects the accent using the **Jzuluaga/accent-id-commonaccent_xlsr-en-english** model
- Lets users ask **follow-up questions** about the transcript using **Gemma3**
- Deploys on **Hugging Face Spaces** using CPU hardware
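
The first two steps above (audio extraction and transcription) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names `build_ffmpeg_command` and `transcribe_video` are hypothetical, the 16 kHz mono WAV output is an assumed choice, and the sketch presumes the `ffmpeg` binary and the `openai-whisper` package are installed. Whisper is imported lazily so the command builder can be used on its own.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_source: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that extracts mono 16 kHz audio (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it already exists
        "-i", video_source,  # input: local path or direct video URL
        "-vn",               # drop the video stream, keep audio only
        "-ac", "1",          # downmix to mono
        "-ar", "16000",      # resample to 16 kHz (assumed target rate)
        wav_path,
    ]


def transcribe_video(video_source: str, workdir: str = ".") -> str:
    """Extract the audio track with ffmpeg, then transcribe it with Whisper medium."""
    import whisper  # lazy import: heavy dependency, only needed here

    wav_path = str(Path(workdir) / "audio.wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video_source, wav_path), check=True)

    model = whisper.load_model("medium")
    result = model.transcribe(wav_path)
    return result["text"].strip()
```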
---
## Tech Stack
- **Streamlit**: For the web UI.
- **OpenAI Whisper (medium)**: For speech-to-text transcription.
- **Jzuluaga/accent-id-commonaccent_xlsr-en-english**: For English accent classification.
- **Gemma3 via Ollama**: For generating answers to follow-up questions using context from the transcript.
- **Docker**: For containerized deployment.
- **Hugging Face Spaces**: For hosting on CPU.
---
## Project Structure
```
accent-analyzer/
├── Dockerfile                  # Container setup
├── requirements.txt            # Python dependencies
├── streamlit_app.py            # Main UI app
└── src/
    ├── custome_interface.py    # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py      # Audio analysis tool
    └── app/
        └── main_agent.py       # Analysis + LLaMA agents
```
---
## Running Locally (GPU Required)
1. Clone the repo:
```bash
git clone https://github.com/your-username/accent-analyzer
cd accent-analyzer
```
2. Build the Docker image:
```bash
docker build -t accent-analyzer .
```
3. Run the container:
```bash
docker run --gpus all -p 8501:8501 accent-analyzer
```
4. Visit: [http://localhost:8501](http://localhost:8501)
---
## Requirements
`requirements.txt` should include at least:
```
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
speechbrain==0.5.12
transformers==4.29.2
asyncio==3.4.3
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
torchvision==0.12.0
langgraph>=0.0.20
```
---
## Notes
- Gemma3 is accessed via **Ollama** inside Docker; make sure the model is pulled during the image build.
- `custome_interface.py` is required by the accent model; the Dockerfile downloads it automatically.
- Video URLs must be **direct links** to `.mp4` files.
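
The direct-link requirement can be checked before any download is attempted. The following is a minimal sketch using only the standard library; `is_direct_mp4_url` is a hypothetical helper, not a function from this project. It is a purely syntactic check (scheme, host, and a path ending in `.mp4`) and does not confirm that the server actually returns a video file.

```python
from urllib.parse import urlparse


def is_direct_mp4_url(url: str) -> bool:
    """Return True if url is an http(s) link whose path ends in .mp4."""
    parsed = urlparse(url.strip())
    # Require a web scheme and a host; rejects bare strings and ftp:// links
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Only the path is inspected, so query strings such as ?token=... are allowed
    return parsed.path.lower().endswith(".mp4")
```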
---
## Example Prompt
```
Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
```
Then follow up with:
```
Where is the speaker probably from?
What is the tone or emotion?
Summarize the video.
```
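
Follow-up questions like these are answered using the transcript as context. One way to assemble that context into a single prompt is sketched below. The helper name `build_followup_prompt`, the prompt wording, and the `max_chars` truncation limit are all assumptions made for illustration; the project's actual prompt may differ. The returned string would then be passed to Gemma3 through Ollama.

```python
def build_followup_prompt(transcript: str, question: str, max_chars: int = 4000) -> str:
    """Combine the transcript and a user question into one prompt (hypothetical helper)."""
    context = transcript.strip()
    if len(context) > max_chars:
        # Keep the prompt bounded; the limit is an arbitrary, assumed value
        context = context[:max_chars] + " [truncated]"
    return (
        "Answer the question using only the transcript below.\n\n"
        f"Transcript:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```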
---
## Acknowledgments
This project uses the following models, frameworks, and tools:
- [OpenAI Whisper](https://github.com/openai/whisper): Automatic speech recognition model.
- [SpeechBrain](https://speechbrain.readthedocs.io/): Toolkit used for building and fine-tuning speech processing models.
- [Accent-ID CommonAccent](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english): Fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
- [CustomEncoderWav2vec2Classifier](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english/blob/main/custom_interface.py): Custom interface used to load and run the accent model.
- [Gemma3](https://ollama.com/library/gemma3) via [Ollama](https://ollama.com): Large language model used for natural language follow-up based on transcripts.
- [Streamlit](https://streamlit.io): Python framework for building web applications.
- [Hugging Face Spaces](https://huggingface.co/spaces): Platform used for deploying this application on CPU infrastructure.
---
## Author
- Developed by [Aswathi T S](https://github.com/ash-171)
---
## License
This project is licensed under the `MIT License`.