---
title: Accent Analyzer Agent
emoji: 🏒
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: English accent detection
license: mit
---

# Accent Analyzer

A Streamlit-based web application that analyzes the English accent in spoken video. Users provide a public video URL (a direct MP4 link), receive a transcription of the speech, and can ask follow-up questions about the transcript, answered by Gemma3.

## What It Does

- Accepts a public MP4 video URL
- Extracts the audio track and transcribes it with OpenAI Whisper (medium)
- Detects the accent with the Jzuluaga/accent-id-commonaccent_xlsr-en-english model
- Lets users ask follow-up questions about the transcript, answered by Gemma3
- Deploys on Hugging Face Spaces with CPU
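The steps above can be sketched as a small pipeline. This is an illustrative sketch, not the repo's actual code in `src/tools/accent_tool.py`: the helper names (`is_direct_mp4`, `extract_audio`, etc.) are invented here, and the Whisper/SpeechBrain calls follow those libraries' published interfaces.

```python
import subprocess
import tempfile
from urllib.parse import urlparse


def is_direct_mp4(url: str) -> bool:
    """The app expects a direct link to an .mp4 file."""
    return urlparse(url).path.lower().endswith(".mp4")


def extract_audio(video_path: str) -> str:
    """Strip the audio track to 16 kHz mono WAV, the format Whisper expects."""
    wav = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    wav.close()
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-ac", "1", "-ar", "16000", wav.name],
        check=True,
    )
    return wav.name


def transcribe(wav_path: str) -> str:
    import whisper  # heavy import, deferred until actually needed

    model = whisper.load_model("medium")
    return model.transcribe(wav_path)["text"]


def detect_accent(wav_path: str) -> str:
    # The accent model ships a custom SpeechBrain interface; the file and
    # class names below follow the model card and may differ in this repo.
    from speechbrain.pretrained.interfaces import foreign_class

    classifier = foreign_class(
        source="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
        pymodule_file="custom_interface.py",
        classname="CustomEncoderWav2vec2Classifier",
    )
    _, _, _, label = classifier.classify_file(wav_path)
    return label[0]
```

A URL is first validated with `is_direct_mp4`, then downloaded, converted, transcribed, and classified in turn.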

## Tech Stack

- Streamlit: web UI
- OpenAI Whisper (medium): speech-to-text transcription
- Jzuluaga/accent-id-commonaccent_xlsr-en-english: English accent classification
- Gemma3 via Ollama: answers follow-up questions using the transcript as context
- Docker: containerized for deployment
- Hugging Face Spaces: hosting with CPU
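The follow-up Q&A step can be sketched against Ollama's local REST API. This is a minimal sketch, assuming Ollama's default `/api/generate` endpoint and a `gemma3` model tag; the actual agent in this repo is built with LangChain/LangGraph and may look quite different.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the container maps it elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(transcript: str, question: str) -> str:
    """Ground the follow-up question in the transcript."""
    return (
        "You are given the transcript of a video.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}\n"
        "Answer using only information from the transcript."
    )


def ask_gemma(transcript: str, question: str) -> str:
    payload = json.dumps(
        {
            "model": "gemma3",
            "prompt": build_prompt(transcript, question),
            "stream": False,
        }
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```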

## Project Structure

```text
accent-analyzer/
├── Dockerfile                  # Container setup
├── requirements.txt            # Python dependencies
├── streamlit_app.py            # Main UI app
└── src/
    ├── custome_interface.py    # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py      # Audio analysis tool
    └── app/
        └── main_agent.py       # Analysis + LLM agents
```

## Running Locally (GPU Required)

1. Clone the repo:

   ```bash
   git clone https://github.com/your-username/accent-analyzer
   cd accent-analyzer
   ```

2. Build the Docker image:

   ```bash
   docker build -t accent-analyzer .
   ```

3. Run the container:

   ```bash
   docker run --gpus all -p 8501:8501 accent-analyzer
   ```

4. Visit http://localhost:8501

## Requirements

`requirements.txt` should include at least:

```text
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
speechbrain==0.5.12
transformers==4.29.2
asyncio==3.4.3
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
torchvision==0.12.0
langgraph>=0.0.20
```

## Notes

- Gemma3 is accessed via Ollama inside Docker; ensure the model is pulled at build time.
- `custome_interface.py` is required by the accent model; it is downloaded automatically in the Dockerfile.
- Video URLs must be direct links to `.mp4` files.
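The build-time setup the notes describe might look like the following Dockerfile excerpt. This is a hypothetical sketch, not the repo's actual Dockerfile: the Ollama install script URL is the official one, but the base image, layer order, and the background-serve trick for pulling the model are assumptions.

```dockerfile
# Hypothetical excerpt; the repo's actual Dockerfile may differ.
FROM python:3.10-slim
RUN apt-get update && apt-get install -y ffmpeg curl

# Install Ollama and pre-pull the Gemma3 weights at build time,
# so the container does not download them on first request.
RUN curl -fsSL https://ollama.com/install.sh | sh
RUN ollama serve & sleep 5 && ollama pull gemma3

COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "streamlit_app.py", "--server.port=8501"]
```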

## Example Prompt

```text
Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
```

Then follow up with:

```text
Where is the speaker probably from?
What is the tone or emotion?
Summarize the video.
```

## Acknowledgments

This project uses the following models, frameworks, and tools:

- OpenAI Whisper (medium)
- Jzuluaga/accent-id-commonaccent_xlsr-en-english (SpeechBrain)
- Gemma3 (via Ollama)
- Streamlit, Docker, and Hugging Face Spaces
## Author

ash-171
## License

This project is licensed under the MIT License.