---
title: Accent Analyzer Agent
emoji: π’
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: English accent detection
license: mit
---
# Accent Analyzer

This is a Streamlit-based web application that analyzes the English accent in spoken videos. Users provide a public video URL (MP4), receive a transcription of the speech generated with Whisper Base, and can ask follow-up questions about the transcript, answered by Gemma3:1b.
## What It Does

- Accepts a public MP4 video URL
- Extracts the audio and transcribes it with OpenAI Whisper Base
- Detects the accent with the Jzuluaga/accent-id-commonaccent_xlsr-en-english model
- Lets users ask follow-up questions about the transcript, answered by Gemma3
- Deploys easily on Hugging Face Spaces on CPU
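The first step above relies on the URL pointing directly at an MP4 file rather than at a video page. A minimal sketch of that check (the `is_direct_mp4_url` helper is illustrative, not code from the app itself):

```python
from urllib.parse import urlparse

def is_direct_mp4_url(url: str) -> bool:
    """Return True when the URL looks like a direct link to an .mp4 file.

    Hypothetical validation helper, not part of the app's source: it checks
    only the scheme and file extension, not whether the file actually exists.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".mp4")
```

For example, `is_direct_mp4_url("https://example.com/clip.mp4")` returns `True`, while a link to a streaming page such as `https://example.com/watch?v=abc` is rejected.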
## Tech Stack

- Streamlit: UI framework
- OpenAI Whisper (base): speech-to-text transcription
- Jzuluaga/accent-id-commonaccent_xlsr-en-english: English accent classification
- Gemma3:1b via Ollama: generates answers to follow-up questions using the transcript as context
- Docker: containerized for deployment
- Hugging Face Spaces: hosting on CPU
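The accent classifier produces raw scores over a set of accent labels. A minimal sketch of turning those scores into a readable prediction (the label list here is an assumed placeholder subset; the real model defines its own labels):

```python
import math

# Assumed subset of the CommonAccent label set, for illustration only;
# the actual model ships its own label ordering.
ACCENT_LABELS = ["us", "england", "australia", "indian", "canada"]

def top_accent(logits):
    """Softmax over raw classifier scores; return (label, confidence)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACCENT_LABELS[best], probs[best]
```

This is only the post-processing step; the scores themselves would come from running the SpeechBrain classifier on the extracted audio.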
## Project Structure

```
accent-analyzer/
├── Dockerfile            # Container setup
├── start.sh              # Serves Ollama and starts the app
├── README.md             # Instructions for the app
├── requirements.txt      # Python dependencies
├── streamlit_app.py      # Main UI app
└── src/
    ├── custome_interface.py  # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py    # Audio analysis tool
    └── app/
        └── main_agent.py     # Analysis + LLaMA agents
```
## Running Locally (GPU Required)

1. Clone the repo:

```
git clone https://huggingface.co/spaces/ash-171/accent-detection
cd accent-analyzer
```

2. Build the Docker image:

```
docker build -t accent-analyzer .
```

3. Run the container:

```
docker run --gpus all -p 8501:8501 accent-analyzer
```

Alternatively, you can run the app directly:

```
streamlit run streamlit_app.py
```

Then visit: http://localhost:8501
## Requirements

`requirements.txt` should include at least:

```
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
speechbrain==0.5.12
transformers==4.29.2
asyncio==3.4.3
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
torchvision==0.12.0
langgraph>=0.0.20
```
## Notes

- Gemma3:1b is accessed via Ollama inside Docker; make sure the model is pulled at build time.
- `custome_interface.py` is required by the accent model; it is downloaded automatically in the Dockerfile.
- Video URLs must be direct links to `.mp4` files.
## Example Prompt

Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4

Then follow up with:

- Where is the speaker probably from?
- What is the tone or emotion?
- Summarize the video.
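Follow-up questions like these are answered by Gemma3:1b through Ollama. A hedged sketch of how such a request could be assembled for Ollama's `/api/generate` endpoint (the prompt template and function name are illustrative, not taken from the app's agent code):

```python
def build_followup_request(transcript: str, question: str):
    """Assemble a request for Ollama's /api/generate endpoint.

    Illustrative only: the app's actual agent may phrase the transcript
    context and instructions differently.
    """
    prompt = (
        "You are given a transcript of a spoken video.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}\nAnswer concisely."
    )
    url = "http://localhost:11434/api/generate"
    payload = {"model": "gemma3:1b", "prompt": prompt, "stream": False}
    return url, payload
```

The resulting payload could then be sent with `requests.post(url, json=payload)` while the Ollama server started by `start.sh` is running.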
## Acknowledgments

This project uses the following models, frameworks, and tools:

- OpenAI Whisper: automatic speech recognition model.
- SpeechBrain: toolkit used for building and fine-tuning speech processing models.
- Accent-ID CommonAccent: fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
- CustomEncoderWav2vec2Classifier: custom interface used to load and run the accent model.
- Gemma3:1b via Ollama: large language model used for natural-language follow-up based on transcripts.
- Streamlit: Python framework for building web applications.
- Hugging Face Spaces: platform used for deploying this application on CPU infrastructure.
## Note

Due to the unavailability of a GPU, the app will be extremely slow. The output has been tested and verified on a local system.
## Author

- Developed by Aswathi T S
## License

This project is licensed under the MIT License.