freddyaboulton HF Staff committed on
Commit 5c844ed · verified · 1 Parent(s): 82494f8

Update app.py

Files changed (1)
  1. app.py +56 -9
app.py CHANGED
@@ -1,7 +1,8 @@
 
-import gradio as gr
 import os
 
+import gradio as gr
+
 _docs = {'WebRTC':
 {'description': 'Stream audio/video with WebRTC',
 'members': {'__init__':
@@ -40,7 +41,7 @@ with gr.Blocks(
 <h1 style='text-align: center; margin-bottom: 1rem'> Gradio WebRTC ⚡️ </h1>
 
 <div style="display: flex; flex-direction: row; justify-content: center">
-<img style="display: block; padding-right: 5px; height: 20px;" alt="Static Badge" src="https://img.shields.io/badge/version%20-%200.0.5%20-%20orange">
+<img style="display: block; padding-right: 5px; height: 20px;" alt="Static Badge" src="https://img.shields.io/badge/version%20-%200.0.6%20-%20orange">
 <a href="https://github.com/freddyaboulton/gradio-webrtc" target="_blank"><img alt="Static Badge" src="https://img.shields.io/badge/github-white?logo=github&logoColor=black"></a>
 </div>
 """, elem_classes=["md-custom"], header_links=True)
@@ -56,15 +57,15 @@ pip install gradio_webrtc
 1. [Object Detection from Webcam with YOLOv10](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n) 📷
 2. [Streaming Object Detection from Video with RT-DETR](https://huggingface.co/spaces/freddyaboulton/rt-detr-object-detection-webrtc) 🎥
 3. [Text-to-Speech](https://huggingface.co/spaces/freddyaboulton/parler-tts-streaming-webrtc) 🗣️
+4. [Conversational AI](https://huggingface.co/spaces/freddyaboulton/omni-mini-webrtc) 🤖🗣️
 
 ## Usage
 
 The WebRTC component supports the following three use cases:
-1. Streaming video from the user webcam to the server and back
-2. Streaming Video from the server to the client
-3. Streaming Audio from the server to the client
-
-Streaming Audio from client to the server and back (conversational AI) is not supported yet.
+1. [Streaming video from the user webcam to the server and back](#h-streaming-video-from-the-user-webcam-to-the-server-and-back)
+2. [Streaming Video from the server to the client](#h-streaming-video-from-the-server-to-the-client)
+3. [Streaming Audio from the server to the client](#h-streaming-audio-from-the-server-to-the-client)
+4. [Streaming Audio from the client to the server and back (conversational AI)](#h-conversational-ai)
 
 
 ## Streaming Video from the User Webcam to the Server and Back
@@ -104,7 +105,7 @@ as a **numpy array** and returns the processed frame also as a **numpy array**.
 * The `inputs` parameter should be a list where the first element is the WebRTC component. The only output allowed is the WebRTC component.
 * The `time_limit` parameter is the maximum time in seconds the video stream will run. If the time limit is reached, the video stream will stop.
 
-## Streaming Video from the User Webcam to the Server and Back
+## Streaming Video from the server to the client
 
 ```python
 import gradio as gr
@@ -169,6 +170,52 @@ with gr.Blocks() as demo:
 * The numpy array should be of shape (1, num_samples).
 * The `outputs` parameter should be a list with the WebRTC component as the only element.
 
+## Conversational AI
+
+```python
+import gradio as gr
+import numpy as np
+from gradio_webrtc import WebRTC, StreamHandler
+from queue import Queue
+import time
+
+
+class EchoHandler(StreamHandler):
+    def __init__(self) -> None:
+        super().__init__()
+        self.queue = Queue()
+
+    def receive(self, frame: tuple[int, np.ndarray] | np.ndarray) -> None:
+        self.queue.put(frame)
+
+    def emit(self) -> None:
+        return self.queue.get()
+
+
+with gr.Blocks() as demo:
+    with gr.Column():
+        with gr.Group():
+            audio = WebRTC(
+                label="Stream",
+                rtc_configuration=None,
+                mode="send-receive",
+                modality="audio",
+            )
+
+        audio.stream(fn=EchoHandler(), inputs=[audio], outputs=[audio], time_limit=15)
+
+
+if __name__ == "__main__":
+    demo.launch()
+```
+
+* Instead of passing a function to the `stream` event's `fn` parameter, pass a `StreamHandler` implementation. The `StreamHandler` above simply echoes the audio back to the client.
+* The `StreamHandler` class has two methods: `receive` and `emit`. The `receive` method is called when a new frame is received from the client, and the `emit` method returns the next frame to send to the client.
+* An audio frame is represented as a tuple of (frame_rate, audio_samples) where `audio_samples` is a numpy array of shape (num_channels, num_samples).
+* You can also specify the audio layout ("mono" or "stereo") in the emit method by retuning it as the third element of the tuple. If not specified, the default is "mono".
+* The `time_limit` parameter is the maximum time in seconds the conversation will run. If the time limit is reached, the audio stream will stop.
+* The `emit` method SHOULD NOT block. If a frame is not ready to be sent, the method should return None.
+
 ## Deployment
 
 When deploying in a cloud environment (like Hugging Face Spaces, EC2, etc), you need to set up a TURN server to relay the WebRTC traffic.
@@ -241,4 +288,4 @@ with gr.Blocks() as demo:
 
 """)
 
-demo.launch()
+demo.launch()
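In the Conversational AI docs added in this commit, `EchoHandler.emit` calls `self.queue.get()`, which waits until a frame is available, while the last bullet says `emit` should not block and should return `None` when no frame is ready (its `-> None` annotation also does not describe the frame it returns). Below is a minimal sketch of a non-blocking echo, assuming only the `receive`/`emit` contract stated in those bullets: `Queue.get_nowait()` raises `queue.Empty` instead of waiting, and the handler turns that into `None`. So that it runs without `gradio_webrtc` installed, the sketch does not subclass `StreamHandler`; the class name `NonBlockingEchoHandler` is hypothetical, and in an app you would subclass `StreamHandler` as the diff does.

```python
from queue import Empty, Queue
from typing import Optional, Tuple, Union

import numpy as np

# Audio frame as described in the docs: (frame_rate, audio_samples),
# with audio_samples shaped (num_channels, num_samples).
AudioFrame = Tuple[int, np.ndarray]
Frame = Union[AudioFrame, np.ndarray]


class NonBlockingEchoHandler:
    """Hypothetical echo handler following the receive/emit contract.

    In a real app this would subclass gradio_webrtc's StreamHandler, as
    EchoHandler does in the diff; it is a plain class here so the sketch
    runs on its own.
    """

    def __init__(self) -> None:
        self.queue: Queue = Queue()

    def receive(self, frame: Frame) -> None:
        # Called for each frame from the client: buffer it for echoing.
        self.queue.put(frame)

    def emit(self) -> Optional[Frame]:
        # Must not block: return the oldest buffered frame, or None
        # when nothing has been received yet.
        try:
            return self.queue.get_nowait()
        except Empty:
            return None


if __name__ == "__main__":
    handler = NonBlockingEchoHandler()
    print(handler.emit())  # prints None: nothing received yet
    frame = (24000, np.zeros((1, 480), dtype=np.int16))
    handler.receive(frame)
    print(handler.emit() is frame)  # prints True: the same frame is echoed back
```

Returning `None` on an empty queue matches what the last bullet asks of `emit`, whereas the blocking `get()` in the diff waits until the client sends another frame.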
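The Conversational AI bullets describe an audio frame as a `(frame_rate, audio_samples)` tuple with `audio_samples` shaped `(num_channels, num_samples)`, plus an optional layout string as a third element when returned from `emit`. The sketch below builds one mono frame in that shape from a generated sine tone. The helper `make_tone_frame`, the 24 kHz rate, the 20 ms duration and the int16 sample type are illustrative assumptions, not values taken from `gradio_webrtc`, and the sketch sticks to the default "mono" layout.

```python
from typing import Tuple, Union

import numpy as np


def make_tone_frame(
    frequency_hz: float = 440.0,
    frame_rate: int = 24000,
    duration_s: float = 0.02,
    with_layout: bool = False,
) -> Union[Tuple[int, np.ndarray], Tuple[int, np.ndarray, str]]:
    """Hypothetical helper: one mono audio frame as (frame_rate, samples)."""
    num_samples = round(frame_rate * duration_s)
    t = np.arange(num_samples) / frame_rate
    wave = 0.5 * np.sin(2 * np.pi * frequency_hz * t)
    # Scale float samples in [-0.5, 0.5] to int16 and add the leading
    # channel axis: shape (num_channels, num_samples) = (1, num_samples).
    samples = (wave * 32767).astype(np.int16).reshape(1, -1)
    if with_layout:
        # Optional third element naming the layout ("mono" is the documented default).
        return frame_rate, samples, "mono"
    return frame_rate, samples


if __name__ == "__main__":
    rate, samples = make_tone_frame()
    print(rate, samples.shape, samples.dtype)
```

A handler's `emit` could return such a tuple directly; the two-element form relies on the documented "mono" default, while `with_layout=True` spells the layout out.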