ash-171 committed on
Commit 16169a3 · verified · 1 Parent(s): 5a8c370

Update README.md

Files changed (1):
  1. README.md +141 -7
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
- title: Accent Detection
- emoji: 🚀
+ title: Accent Analyzer Agent
+ emoji: 🏢
  colorFrom: red
  colorTo: red
  sdk: docker
@@ -8,13 +8,147 @@ app_port: 8501
  tags:
  - streamlit
  pinned: false
- short_description: 'accent detection '
+ short_description: Various English accent detection
  license: mit
  ---

- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community forums](https://discuss.streamlit.io).

The Streamlit template body removed above is replaced by the new README content:
# Accent Analyzer

This is a Streamlit-based web application that analyzes the English accent of speech in videos. Users provide a public video URL (MP4), receive a transcription of the speech, and can ask follow-up questions about the transcript, answered by Gemma3.

## What It Does

- Accepts a public **MP4 video URL**
- Extracts the audio and transcribes it using **OpenAI Whisper Medium**
- Detects the accent with the **Jzuluaga/accent-id-commonaccent_xlsr-en-english** model (see the sketch after this list)
- Lets users ask **follow-up questions** about the transcript using **Gemma3**
- Deploys easily on **Hugging Face Spaces** with CPU
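A minimal sketch of this pipeline, assuming the documented APIs of `requests`, `ffmpeg-python`, `openai-whisper`, and the `foreign_class` loader shown on the accent model's card; the function name `analyze_video` and the file paths are illustrative, not the app's actual code (which lives in `src/tools/accent_tool.py`):

```python
# Hypothetical end-to-end sketch of the analysis pipeline.
import requests
import ffmpeg
import whisper
from speechbrain.pretrained.interfaces import foreign_class

def analyze_video(url: str) -> dict:
    # 1. Download the MP4 (the URL must be a direct link).
    with open("video.mp4", "wb") as f:
        f.write(requests.get(url, timeout=60).content)

    # 2. Extract 16 kHz mono audio, the rate Whisper and wav2vec2 expect.
    ffmpeg.input("video.mp4").output("audio.wav", ac=1, ar=16000).run(overwrite_output=True)

    # 3. Transcribe with Whisper medium.
    transcript = whisper.load_model("medium").transcribe("audio.wav")["text"]

    # 4. Classify the accent with the CommonAccent model and its custom interface,
    #    following the usage documented on the model card.
    classifier = foreign_class(
        source="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
        pymodule_file="custom_interface.py",
        classname="CustomEncoderWav2vec2Classifier",
    )
    out_prob, score, index, text_lab = classifier.classify_file("audio.wav")
    return {"transcript": transcript, "accent": text_lab, "confidence": float(score)}
```

On CPU hardware the `medium` Whisper model is slow; this sketch also ignores chunking, temp-file cleanup, and error handling.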
---
## Tech Stack

- **Streamlit**: UI framework.
- **OpenAI Whisper (medium)**: speech-to-text transcription.
- **Jzuluaga/accent-id-commonaccent_xlsr-en-english**: English accent classification.
- **Gemma3 via Ollama**: generates answers to follow-up questions using the transcript as context (see the sketch after this list).
- **Docker**: containerized for deployment.
- **Hugging Face Spaces**: hosting with CPU.
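As a rough illustration of the follow-up step, the transcript could be passed to Gemma3 through the `Ollama` wrapper in `langchain-community`; the `gemma3` model tag, prompt wording, and `answer_followup` name are assumptions, not taken from the app's source:

```python
# Hypothetical follow-up answering sketch; the real agent lives in src/app/main_agent.py.
from langchain_community.llms import Ollama

llm = Ollama(model="gemma3")  # assumes `ollama pull gemma3` has already run

def answer_followup(transcript: str, question: str) -> str:
    # Ground the answer by inlining the transcript as context.
    prompt = (
        "You are given a transcript of a video.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}\n"
        "Answer using only the transcript."
    )
    return llm.invoke(prompt)
```

Inlining the transcript keeps answers grounded without retrieval machinery; the actual agent may instead use the LangChain/LangGraph tooling listed in the requirements.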
---
## Project Structure

```
accent-analyzer/
├── Dockerfile               # Container setup
├── requirements.txt         # Python dependencies
├── streamlit_app.py         # Main UI app
└── src/
    ├── custome_interface.py    # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py      # Audio analysis tool
    └── app/
        └── main_agent.py       # Analysis + Gemma3 agents
```

---
## Running Locally (GPU Required)

1. Clone the repo:

```bash
git clone https://github.com/your-username/accent-analyzer
cd accent-analyzer
```

2. Build the Docker image:

```bash
docker build -t accent-analyzer .
```

3. Run the container:

```bash
docker run --gpus all -p 7860:7860 accent-analyzer
```

4. Visit: [http://localhost:7860](http://localhost:7860)

---
## Requirements

`requirements.txt` should include at least the following (`asyncio` is omitted because it is part of the Python standard library and should not be pinned from PyPI):

```
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
torchvision==0.12.0
speechbrain==0.5.12
transformers==4.29.2
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
langgraph>=0.0.20
```

---
## Notes

- Gemma3 is accessed via **Ollama** inside Docker; make sure the model is pulled during the image build.
- `custome_interface.py` is required by the accent model; it is downloaded automatically in the Dockerfile (see the sketch after this list).
- Video URLs must be **direct links** to `.mp4` files.
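For reference, the download the Dockerfile performs could be reproduced with `huggingface_hub`; this is a sketch of one way to do it, not necessarily the Dockerfile's exact command. Note the upstream file is named `custom_interface.py`, while this project stores it as `custome_interface.py`:

```python
# Fetch the accent model's custom interface from the Hugging Face Hub.
import shutil
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
    filename="custom_interface.py",
)
# Copy it to the location this project expects (assumed: src/custome_interface.py).
shutil.copy(path, "src/custome_interface.py")
```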
---
## Example Prompt

```
Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
```

Then follow up with:

```
Where is the speaker probably from?
What is the tone or emotion?
Summarize the video.
```
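A bare-bones sketch of how such prompts could be wired into the UI, reusing the `analyze_video` and `answer_followup` sketches above; the widget layout is illustrative, and the actual UI lives in `streamlit_app.py`:

```python
# Minimal Streamlit front end for the two-step flow: analyze, then ask questions.
import streamlit as st

st.title("Accent Analyzer")

url = st.text_input("Public MP4 URL")
if st.button("Analyze") and url:
    result = analyze_video(url)  # illustrative; see the pipeline sketch above
    st.session_state["transcript"] = result["transcript"]
    st.write(f"Detected accent: {result['accent']} ({result['confidence']:.2f})")

if "transcript" in st.session_state:
    question = st.text_input("Ask a follow-up question")
    if question:
        st.write(answer_followup(st.session_state["transcript"], question))
```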
---
## Acknowledgments

This project uses the following models, frameworks, and tools:

- [OpenAI Whisper](https://github.com/openai/whisper): Automatic speech recognition model.
- [SpeechBrain](https://speechbrain.readthedocs.io/): Toolkit used for building and fine-tuning speech processing models.
- [Accent-ID CommonAccent](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english): Fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
- [CustomEncoderWav2vec2Classifier](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english/blob/main/custom_interface.py): Custom interface used to load and run the accent model.
- [Gemma3](https://ollama.com/library/gemma3) via [Ollama](https://ollama.com): Large language model used to answer follow-up questions about transcripts.
- [Streamlit](https://streamlit.io): Python framework for building web applications.
- [Hugging Face Spaces](https://huggingface.co/spaces): Platform used to deploy this application.

---
## Author

- Developed by [Aswathi T S](https://github.com/ash-171)

---
## License

This project is licensed under the MIT License.