metadata
title: Accent Analyzer Agent
emoji: π’
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Various english accent detection
license: mit
Accent Analyzer
This is a Streamlit-based web application that analyzes the English accent in spoken videos. Users can provide a public video URL (MP4), receive a transcription of the speech, and ask follow-up questions based on the transcript using Gemma3.
What It Does
- Accepts a public MP4 video URL
- Extracts audio and transcribes it using OpenAI Whisper Medium
- Detects the accent using the Jzuluaga/accent-id-commonaccent_xlsr-en-english model
- Lets users ask follow-up questions about the transcript using Gemma3
- Deploys easily on Hugging Face Spaces with CPU
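The audio-extraction step listed above can be sketched as follows. This is a minimal illustration that assumes the `ffmpeg` binary is on the PATH; the function names `build_ffmpeg_command` and `extract_audio` are hypothetical, and the app's own code (which lists pydub and ffmpeg-python as dependencies) may do this differently.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ffmpeg_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Return the ffmpeg argument list that extracts a mono WAV track."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video file
        "-vn",                    # drop the video stream, keep audio only
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample; 16 kHz suits Whisper and wav2vec2 models
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: Optional[str] = None) -> str:
    """Run ffmpeg and return the path of the extracted WAV file."""
    if wav_path is None:
        # Hypothetical default: write the WAV next to the video.
        wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)
    return wav_path
```

Separating command construction from execution keeps the flags easy to inspect and test without running ffmpeg.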
Tech Stack
- Streamlit: For the web UI.
- OpenAI Whisper (medium): For speech-to-text transcription.
- Jzuluaga/accent-id-commonaccent_xlsr-en-english: For English accent classification.
- Gemma3 via Ollama: For generating answers to follow-up questions using context from the transcript.
- Docker: For containerized deployment.
- Hugging Face Spaces: For hosting on CPU.
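To illustrate how a follow-up question might reach Gemma3 through Ollama, here is a minimal sketch that talks to Ollama's local REST endpoint directly. It assumes Ollama is listening on its default port 11434 and that a model tagged `gemma3` has been pulled; `build_prompt` and `ask_gemma` are hypothetical helpers, and the prompt wording is illustrative. The requirements list langchain and langgraph, so the app itself likely goes through those libraries rather than raw HTTP.

```python
import json
import urllib.request

# Assumption: Ollama runs locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(transcript: str, question: str) -> str:
    """Combine the transcript and the user's question into one prompt."""
    return (
        "Answer the question using only the transcript below.\n\n"
        f"Transcript:\n{transcript.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


def ask_gemma(transcript: str, question: str, model: str = "gemma3") -> str:
    """Send a non-streaming generate request to Ollama and return the answer text."""
    payload = {
        "model": model,
        "prompt": build_prompt(transcript, question),
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

Passing the transcript inside the prompt keeps each question self-contained, at the cost of resending the transcript on every call.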
Project Structure
accent-analyzer/
├── Dockerfile                  # Container setup
├── requirements.txt            # Python dependencies
├── streamlit_app.py            # Main UI app
└── src/
    ├── custome_interface.py    # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py      # Audio analysis tool
    └── app/
        └── main_agent.py       # Analysis + LLaMA agents
Running Locally (GPU Required)
- Clone the repo:
git clone https://github.com/your-username/accent-analyzer
cd accent-analyzer
- Build the Docker image:
docker build -t accent-analyzer .
- Run the container:
docker run --gpus all -p 8501:8501 accent-analyzer
- Visit: http://localhost:8501
Requirements
requirements.txt should include at least:
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
speechbrain==0.5.12
transformers==4.29.2
asyncio==3.4.3
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
torchvision==0.12.0
langgraph>=0.0.20
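Because most of these packages are pinned to exact versions, it can help to compare the pins against what is actually installed in the environment. The sketch below is one hedged way to do that with the standard library; `parse_requirement` and `report_installed` are hypothetical helper names, and the parser only handles the simple `name==version` and `name>=version` forms used above.

```python
import re
from importlib import metadata
from typing import List, Optional, Tuple

# Matches lines such as "torch==1.11.0" or "streamlit>=1.25.0".
PIN_PATTERN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=)\s*([^\s#]+)")


def parse_requirement(line: str) -> Optional[Tuple[str, str, str]]:
    """Split a requirement line into (name, operator, version), or return None."""
    match = PIN_PATTERN.match(line)
    return match.groups() if match else None


def report_installed(lines: List[str]) -> List[str]:
    """Describe, for each parsable line, the wanted and the installed version."""
    report = []
    for line in lines:
        parsed = parse_requirement(line)
        if parsed is None:
            continue  # skip comments, blank lines, and unsupported syntax
        name, operator, wanted = parsed
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        report.append(f"{name}: wants {operator}{wanted}, found {installed}")
    return report
```

The report only lists versions side by side; it does not decide whether a `>=` constraint is satisfied.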
Notes
- Gemma3 is accessed via Ollama inside Docker; ensure the model is pulled at build time.
- custome_interface.py is required by the accent model; it is downloaded automatically in the Dockerfile.
- Video URLs must be direct links to .mp4 files.
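The direct-link requirement can be checked before any download is attempted. The following is a minimal sketch using only the standard library; `is_direct_mp4_url` is a hypothetical helper, not a function from this project, and it inspects only the shape of the URL, not what the server actually returns.

```python
from urllib.parse import urlparse


def is_direct_mp4_url(url: str) -> bool:
    """Return True if the URL is http(s) and its path ends in .mp4."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Check the path only, so query strings such as "?token=..." are ignored.
    return parsed.path.lower().endswith(".mp4")
```

A page URL from a video-sharing site fails this check because its path does not end in `.mp4`, which matches the note above.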
Example Prompt
Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
Then follow up with:
Where is the speaker probably from?
What is the tone or emotion?
Summarize the video.
Acknowledgments
This project uses the following models, frameworks, and tools:
- OpenAI Whisper: Automatic speech recognition model.
- SpeechBrain: Toolkit used for building and fine-tuning speech processing models.
- Accent-ID CommonAccent: Fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
- CustomEncoderWav2vec2Classifier: Custom interface used to load and run the accent model.
- Gemma3 via Ollama: Large language model used for natural language follow-up based on transcripts.
- Streamlit: Python framework for building web applications.
- Hugging Face Spaces: Platform used for deploying this application on CPU infrastructure.
Author
- Developed by Aswathi T S
License
This project is licensed under the MIT License.