blasisd committed on
Commit
93bb4f3
·
1 Parent(s): 6248a54

Initial commit

Files changed (4)
  1. README.md +186 -6
  2. configs/supported_languages.xlsx +0 -0
  3. requirements.txt +8 -0
  4. src/app.py +207 -0
README.md CHANGED
@@ -1,12 +1,192 @@
  ---
- title: Talk Globe
- emoji: 🐠
- colorFrom: green
- colorTo: purple
  sdk: gradio
- sdk_version: 5.27.0
- app_file: app.py
  pinned: false
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: TalkGlobe (Gradio UI)
+ emoji: 🗣️
+ colorFrom: purple
+ colorTo: red
  sdk: gradio
+ sdk_version: 5.26.0
+ app_file: src/app.py
  pinned: false
+ license: mit
+ short_description: Real-time translator with multilang support (Gradio UI)
+ tags: [webrtc, gradio]
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # TalkGlobe: Real-Time Speech Translation
+
+ TalkGlobe is an AI-powered application for real-time speech-to-speech translation. Built on Meta's SeamlessM4T v2 model (`facebook/seamless-m4t-v2-large`), it offers:
+
+ - **🎙️ 101 input languages** for speech recognition
+
+ - **🔊 35 output languages** for natural-sounding translated speech
+
+ - **⚡ Instant translation** with low latency
+
+ - **🖥️ Intuitive interface** for effortless language selection
+
+ Speak in your native language, choose a target language, and TalkGlobe plays back the translated audio when you pause. Ideal for travel, business, or multilingual conversations.
+
+ ## Supported Languages
+
+ The table below lists the languages TalkGlobe supports as source or target, according to the `facebook/seamless-m4t-v2-large` model card.
+
+ | code | language | Source | Target |
+ | -------- | ---------------------- | :----: | :----: |
+ | afr | Afrikaans | ✅ | ❌ |
+ | amh | Amharic | ✅ | ❌ |
+ | arb | Modern Standard Arabic | ✅ | ✅ |
+ | ary | Moroccan Arabic | ✅ | ❌ |
+ | arz | Egyptian Arabic | ✅ | ❌ |
+ | asm | Assamese | ✅ | ❌ |
+ | ast | Asturian | ✅ | ❌ |
+ | azj | North Azerbaijani | ✅ | ❌ |
+ | bel | Belarusian | ✅ | ❌ |
+ | ben | Bengali | ✅ | ✅ |
+ | bos | Bosnian | ✅ | ❌ |
+ | bul | Bulgarian | ✅ | ❌ |
+ | cat | Catalan | ✅ | ✅ |
+ | ceb | Cebuano | ✅ | ❌ |
+ | ces | Czech | ✅ | ✅ |
+ | ckb | Central Kurdish | ✅ | ❌ |
+ | cmn | Mandarin Chinese | ✅ | ✅ |
+ | cmn_Hant | Mandarin Chinese | ✅ | ✅ |
+ | cym | Welsh | ✅ | ✅ |
+ | dan | Danish | ✅ | ✅ |
+ | deu | German | ✅ | ✅ |
+ | ell | Greek | ✅ | ❌ |
+ | eng | English | ✅ | ✅ |
+ | est | Estonian | ✅ | ✅ |
+ | eus | Basque | ✅ | ❌ |
+ | fin | Finnish | ✅ | ✅ |
+ | fra | French | ✅ | ✅ |
+ | fuv | Nigerian Fulfulde | ✅ | ❌ |
+ | gaz | West Central Oromo | ✅ | ❌ |
+ | gle | Irish | ✅ | ❌ |
+ | glg | Galician | ✅ | ❌ |
+ | guj | Gujarati | ✅ | ❌ |
+ | heb | Hebrew | ✅ | ❌ |
+ | hin | Hindi | ✅ | ✅ |
+ | hrv | Croatian | ✅ | ❌ |
+ | hun | Hungarian | ✅ | ❌ |
+ | hye | Armenian | ✅ | ❌ |
+ | ibo | Igbo | ✅ | ❌ |
+ | ind | Indonesian | ✅ | ✅ |
+ | isl | Icelandic | ✅ | ❌ |
+ | ita | Italian | ✅ | ✅ |
+ | jav | Javanese | ✅ | ❌ |
+ | jpn | Japanese | ✅ | ✅ |
+ | kam | Kamba | ✅ | ❌ |
+ | kan | Kannada | ✅ | ❌ |
+ | kat | Georgian | ✅ | ❌ |
+ | kaz | Kazakh | ✅ | ❌ |
+ | kea | Kabuverdianu | ✅ | ❌ |
+ | khk | Halh Mongolian | ✅ | ❌ |
+ | khm | Khmer | ✅ | ❌ |
+ | kir | Kyrgyz | ✅ | ❌ |
+ | kor | Korean | ✅ | ✅ |
+ | lao | Lao | ✅ | ❌ |
+ | lit | Lithuanian | ✅ | ❌ |
+ | ltz | Luxembourgish | ✅ | ❌ |
+ | lug | Ganda | ✅ | ❌ |
+ | luo | Luo | ✅ | ❌ |
+ | lvs | Standard Latvian | ✅ | ❌ |
+ | mai | Maithili | ✅ | ❌ |
+ | mal | Malayalam | ✅ | ❌ |
+ | mar | Marathi | ✅ | ❌ |
+ | mkd | Macedonian | ✅ | ❌ |
+ | mlt | Maltese | ✅ | ✅ |
+ | mni | Meitei | ✅ | ❌ |
+ | mya | Burmese | ✅ | ❌ |
+ | nld | Dutch | ✅ | ✅ |
+ | nno | Norwegian Nynorsk | ✅ | ❌ |
+ | nob | Norwegian Bokmål | ✅ | ❌ |
+ | npi | Nepali | ✅ | ❌ |
+ | nya | Nyanja | ✅ | ❌ |
+ | oci | Occitan | ✅ | ❌ |
+ | ory | Odia | ✅ | ❌ |
+ | pan | Punjabi | ✅ | ❌ |
+ | pbt | Southern Pashto | ✅ | ❌ |
+ | pes | Western Persian | ✅ | ✅ |
+ | pol | Polish | ✅ | ✅ |
+ | por | Portuguese | ✅ | ✅ |
+ | ron | Romanian | ✅ | ✅ |
+ | rus | Russian | ✅ | ✅ |
+ | slk | Slovak | ✅ | ✅ |
+ | slv | Slovenian | ✅ | ❌ |
+ | sna | Shona | ✅ | ❌ |
+ | snd | Sindhi | ✅ | ❌ |
+ | som | Somali | ✅ | ❌ |
+ | spa | Spanish | ✅ | ✅ |
+ | srp | Serbian | ✅ | ❌ |
+ | swe | Swedish | ✅ | ✅ |
+ | swh | Swahili | ✅ | ✅ |
+ | tam | Tamil | ✅ | ❌ |
+ | tel | Telugu | ✅ | ✅ |
+ | tgk | Tajik | ✅ | ❌ |
+ | tgl | Tagalog | ✅ | ✅ |
+ | tha | Thai | ✅ | ✅ |
+ | tur | Turkish | ✅ | ✅ |
+ | ukr | Ukrainian | ✅ | ✅ |
+ | urd | Urdu | ✅ | ✅ |
+ | uzn | Northern Uzbek | ✅ | ✅ |
+ | vie | Vietnamese | ✅ | ✅ |
+ | xho | Xhosa | ✅ | ❌ |
+ | yor | Yoruba | ✅ | ❌ |
+ | yue | Cantonese | ✅ | ❌ |
+ | zlm | Colloquial Malay | ✅ | ❌ |
+ | zul | Zulu | ✅ | ❌ |
+
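The Source and Target columns above determine which languages can be offered as translation targets. The sketch below is a minimal, standalone illustration of that filtering step, using a handful of rows copied from the table. The names `ROWS` and `target_choices` are hypothetical and exist only for this example; the application itself reads the full list from `configs/supported_languages.xlsx`.

```python
# A few rows copied from the table above: (code, language, source, target).
# This is an illustrative subset, not the full list.
ROWS = [
    ("afr", "Afrikaans", True, False),
    ("deu", "German", True, True),
    ("eng", "English", True, True),
    ("ell", "Greek", True, False),
    ("cat", "Catalan", True, True),
]


def target_choices(rows):
    """Return (label, value) pairs for target-capable languages, sorted by label."""
    return sorted(
        (language, code)
        for code, language, _source, target in rows
        if target
    )


if __name__ == "__main__":
    for label, value in target_choices(ROWS):
        print(f"{label}: {value}")
```

Pairs are ordered as (label, value) so that a plain sort orders them by the human-readable language name, which is the order a dropdown would typically show.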
+ ## Getting Started
+
+ This section explains how to set up and run the project on your local machine for development and testing. For production deployment, see the Deployment section.
+
+ ### Prerequisites
+
+ To set up and run this project, ensure the following software and tools are installed on your system:
+
+ - **Python**: Version `3.10.12` or higher is required. Verify your Python version by running:
+
+ ```bash
+ python3 --version
+ ```
+
+ - **Dependencies**: Install the required Python packages listed in `requirements.txt` using pip. Run the following command in your terminal:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
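The two prerequisites above can also be checked from Python before launching the app. The following is a minimal sketch using only the standard library; `python_ok` and `missing_modules` are hypothetical helper names, and the module list is an assumption derived from the packages named in `requirements.txt` (import names can differ from package names, for example `protobuf` is imported as `google.protobuf`).

```python
import importlib.util
import sys

MINIMUM_PYTHON = (3, 10, 12)  # minimum version stated in the prerequisites

# Assumed import names for the packages in requirements.txt.
REQUIRED_MODULES = [
    "fastrtc",
    "openpyxl",
    "google.protobuf",
    "scipy",
    "sentencepiece",
    "torchaudio",
    "transformers",
]


def python_ok(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the interpreter version meets the stated minimum."""
    return tuple(version_info[:3]) >= tuple(minimum)


def missing_modules(names):
    """Return the subset of module names that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a spec is undefined.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

The check only locates modules and does not import them, so it stays fast even for heavy packages.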
+ ### Local Development and Testing
+
+ To run the application locally for development and testing, execute the following command in your terminal:
+
+ ```bash
+ python app.py
+ ```
+
+ > [!WARNING]
+ > Run this command from the project's **src** directory, or adjust the script path to match your working directory (for example, `python src/app.py` from the project root).
+
+ ## Deployment
+
+ ### Deployment on Hugging Face Spaces
+
+ To deploy the project on Hugging Face Spaces, follow these steps:
+
+ 1. Create an account on [Hugging Face](https://huggingface.co) if you don’t already have one.
+
+ 2. Refer to the official [Spaces Overview](https://huggingface.co/docs/hub/en/spaces-overview) documentation for detailed instructions on setting up and deploying your project.
+
+ ### Deployment on Other Cloud Platforms
+
+ For deployment on other cloud or live systems, consult the documentation provided by your chosen service provider. Each platform may have specific requirements and steps for deploying Python-based applications.
+
+ ## Built With
+
+ - [Python 3.10.12](http://www.python.org/) - Developing with the best programming language
+
+ ## Authors
+
+ **Vlasios Dimitriadis** - _Initial work:_ [TalkGlobe](https://huggingface.co/spaces/blasisd/talk-globe)
configs/supported_languages.xlsx ADDED
Binary file (12.6 kB)
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ fastrtc
+ fastrtc[vad]
+ openpyxl
+ protobuf
+ scipy
+ sentencepiece
+ torchaudio
+ transformers
src/app.py ADDED
@@ -0,0 +1,207 @@
+ from collections.abc import Iterator
+ from pathlib import Path
+
+ import pandas as pd
+ import torchaudio
+ import torch
+ import numpy as np
+
+ import gradio as gr
+
+ from fastrtc import WebRTC, ReplyOnPause
+ from transformers import AutoProcessor, SeamlessM4Tv2Model
+
+
+ parent_dir = Path(__file__).parents[1]
+ config_path = Path(parent_dir, "configs")
+
+ processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
+ model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
+ default_sampling_rate = 16_000
+
+
+ def translate_audio(
+     audio: tuple[int, np.ndarray], tgt_language: str
+ ) -> Iterator[tuple[int, np.ndarray]]:
+     """Translate the audio that is captured through the streaming component.
+     The source language of the audio must be one of the supported languages.
+
+     :param audio: the captured audio as (sampling rate, samples)
+     :type audio: tuple[int, np.ndarray]
+     :param tgt_language: the target language for translation
+     :type tgt_language: str
+     :yield: the tuple containing the sampling rate and the audio array
+     :rtype: Iterator[tuple[int, np.ndarray]]
+     """
+     orig_freq, np_array = audio
+     waveform = torch.from_numpy(np_array)
+     waveform = waveform.to(torch.float32)
+     waveform = waveform / 32768.0  # normalize int16 to [-1, 1]
+
+     resampled = torchaudio.functional.resample(
+         waveform, orig_freq=orig_freq, new_freq=default_sampling_rate
+     )  # must be a 16 kHz waveform array
+
+     audio_inputs = processor(
+         audios=resampled,
+         return_tensors="pt",
+         sampling_rate=default_sampling_rate,
+     )
+
+     audio_array_from_audio = (
+         model.generate(**audio_inputs, tgt_lang=tgt_language)[0].cpu().numpy().squeeze()
+     )
+
+     yield (default_sampling_rate, audio_array_from_audio)
+
+
+ # Supported target languages for speech (rows with an empty Target cell are skipped)
+ supported_langs_df = pd.read_excel(Path(config_path, "supported_languages.xlsx"))
+ supported_speech_langs_df = supported_langs_df[
+     supported_langs_df["Target"].str.contains("Sp", na=False)
+ ]
+
+ # Labels and values for supported speech languages dropdown
+ supported_speech_langs = list(
+     zip(supported_speech_langs_df["language"], supported_speech_langs_df["code"])
+ )
+
+ # Sort by the first element of the tuple (full language name)
+ supported_speech_langs.sort()
+
+ css = """
+ #componentsContainer {
+     width: 70%;
+     display: block;
+     margin-left: auto;
+     margin-right: auto;
+ }
+
+ #langDropdown .container .wrap {
+     width: 230px;
+ }
+
+ .audio-container {
+     padding-bottom: 2rem !important;
+     margin-bottom: 2rem !important;
+ }
+
+ .vspace-sm { margin-bottom: 20px !important; }
+ .vspace-md { margin-bottom: 40px !important; }
+ .vspace-lg { margin-bottom: 60px !important; }
+
+ .tagline {
+     color: #4a5568;
+ }
+ .tagline-emphasis {
+     font-family: 'Playfair Display', serif;
+     font-style: italic;
+     color: #718096;
+     position: relative;
+     display: inline-block;
+ }
+ .tagline-emphasis:after {
+     content: "";
+     position: absolute;
+     bottom: -5px;
+     left: 0;
+     width: 100%;
+     height: 2px;
+     background: linear-gradient(90deg, transparent, #6a11cb, transparent);
+ }
+
+ .gradio-footer {
+     position: fixed;
+     bottom: 0;
+     left: 0;
+     right: 0;
+     text-align: center;
+     padding: 12px;
+     background: var(--background-fill-secondary);
+     border-top: 1px solid var(--border-color-primary);
+     font-size: 0.9em;
+     z-index: 100;
+     display: flex;
+     justify-content: center;
+     align-items: center;
+     gap: 6px;
+ }
+ .gradio-footer a {
+     display: inline-flex;
+     align-items: center;
+     gap: 4px;
+     color: var(--link-text-color);
+     text-decoration: none;
+ }
+
+ .fastrtc-icon {
+     height: 24px;
+     width: 24px;
+ }
+ """
+
+ with gr.Blocks(
+     theme=gr.themes.Glass(),
+     css=css,
+ ) as demo:
+     gr.HTML(
+         """
+         <div style='display: flex; align-items: center; justify-content: center; gap: 20px'>
+             <div style="background-color: var(--block-background-fill); border-radius: 8px">
+                 <img src="https://images.icon-icons.com/3975/PNG/512/translation_language_translator_icon_251869.png" style="width: 100px; height: 100px;">
+             </div>
+             <div>
+                 <h1>TalkGlobe</h1>
+                 <p class="tagline">
+                     Break language barriers in real-time <span class="globe-icon">🌍</span><br>
+                     <span class="tagline-emphasis">no more lost in translation</span> <span class="globe-icon">✨</span>
+                 </p>
+             </div>
+         </div>
+         """,
+         elem_classes="vspace-sm",
+     )
+
+     # The main components (translation language dropdown and streaming capture component)
+     with gr.Group(elem_id="componentsContainer"):
+         with gr.Row(equal_height=True, min_height="11rem"):
+             with gr.Column(scale=5, elem_id="langCol"):
+                 target_lang = gr.Dropdown(
+                     choices=supported_speech_langs,
+                     value="eng",
+                     label="Supported Languages",
+                     info="Select one of the supported languages for translation",
+                     elem_id="langDropdown",
+                 )
+
+             with gr.Column(scale=5, elem_id="micCol"):
+                 audio = WebRTC(
+                     modality="audio",
+                     mode="send-receive",
+                     label="Audio Stream",
+                 )
+
+     # Trigger on pause
+     audio.stream(
+         ReplyOnPause(translate_audio),
+         inputs=[audio, target_lang],
+         outputs=[audio],
+     )
+
+     # Sticky footer (will stay at bottom on all screen sizes)
+     gr.HTML(
+         """
+         <div class="gradio-footer">
+             Powered by
+             <a href="https://gradio.app/" target="_blank">
+                 Gradio <img class="gradio-icon" src="https://www.gradio.app/_app/immutable/assets/gradio.CHB5adID.svg" alt="GradioIcon" style="height:24px; width:auto;">
+             </a>
+
+             <a href="https://freddyaboulton.github.io/gradio-webrtc/" target="_blank">
+                 FastRTC <img class="fastrtc-icon" src="https://fastrtc.org/fastrtc_logo.png" alt="FastRTCIcon">
+             </a>
+         </div>
+         """
+     )
+
+ demo.launch()
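
Review note on `translate_audio` in `src/app.py`: before the model is called, the captured int16 samples are scaled to `[-1, 1]` and resampled to 16 kHz. The sketch below illustrates those two steps in plain Python so the arithmetic is easy to follow. It is not the code used by the app: `int16_to_float` and `resample_linear` are hypothetical helpers, and the resampler uses simple linear interpolation as a stand-in for `torchaudio.functional.resample`.

```python
def int16_to_float(samples):
    """Scale int16 samples to floats in [-1.0, 1.0) by dividing by 32768."""
    return [s / 32768.0 for s in samples]


def resample_linear(samples, orig_rate, new_rate):
    """Resample by linear interpolation (illustrative stand-in only)."""
    if orig_rate == new_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * new_rate / orig_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * orig_rate / new_rate  # position in the input signal
        lo = min(int(pos), last)        # sample at or before pos
        hi = min(lo + 1, last)          # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    captured = [-32768, 0, 16384, 0, -16384, 0]  # pretend 48 kHz int16 input
    floats = int16_to_float(captured)
    print(resample_linear(floats, orig_rate=48_000, new_rate=16_000))
```

Going from 48 kHz to 16 kHz keeps roughly one output sample for every three input samples, which is why the model always receives audio at the rate stored in `default_sampling_rate`.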