Hi, I'm Freddy and I want to give a tour of FastRTC - the real-time communication library for Python.
Why is this important? In the last few months, we've seen many advances in real-time speech and vision models, whether closed-source, open-source, or offered by API providers.
Despite these innovations, it's still difficult to build real-time AI applications that stream audio and video, especially in Python. This is because:
- ML engineers may not have experience with the technologies needed to build real-time applications, such as WebRTC or WebSockets.
- Implementing algorithms for voice detection and turn taking is tricky!
- Best practices are scattered across various sources, and even code assistant tools like Cursor and Copilot struggle to write Python code that supports real-time audio/video applications. I learned that the hard way!
All this means that if you want to take advantage of the latest advances in AI, you have to spend a lot of time figuring out how to do real-time streaming.
`FastRTC` solves this problem by automatically turning any Python function into a real-time audio and video stream over WebRTC or WebSockets with little additional code or overhead. Let's see how it works.
Let's start with the basics - echoing audio.
In FastRTC, you can wrap any generator function with `ReplyOnPause` and pass it to the `Stream` class.
This will create a WebRTC-powered web server that handles voice detection and turn taking - you just worry about the logic for generating the response.
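To make "voice detection and turn taking" concrete, here is a toy sketch of the idea: collect audio frames while someone is speaking and close the turn once enough quiet frames follow. This is not FastRTC's algorithm. It is a naive energy threshold, and `rms`, `split_turns`, `threshold`, and `pause_frames` are hypothetical names chosen for illustration.

```python
def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def split_turns(frames, threshold=0.1, pause_frames=3):
    """Group loud frames into turns (toy example, not FastRTC's method).

    A turn ends once `pause_frames` consecutive quiet frames follow speech.
    Quiet frames are dropped rather than stored.
    """
    turns, current, quiet = [], [], 0
    for frame in frames:
        if rms(frame) >= threshold:
            current.append(frame)  # speech: extend the current turn
            quiet = 0
        elif current:
            quiet += 1  # silence after speech: count towards a pause
            if quiet >= pause_frames:
                turns.append(current)  # pause is long enough, close the turn
                current, quiet = [], 0
    if current:
        turns.append(current)  # flush a turn that was still open at the end
    return turns


loud, silent = [0.5, -0.5], [0.0, 0.0]
frames = [silent, loud, loud, silent, silent, silent, loud, silent]
turns = split_turns(frames)
```

A fixed threshold like this breaks down quickly with background noise or soft speakers, which is part of why it helps to have the library handle this step.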
Each stream comes with a built-in WebRTC-powered Gradio UI that you can use for testing.
Simply call `ui.launch()` on the stream. Let's see it in action.
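The echo example might look roughly like the sketch below. `Stream`, `ReplyOnPause`, and `ui.launch()` come from the walkthrough. The `handler`, `modality`, and `mode` keyword arguments and the `(sample_rate, samples)` shape of the audio are assumptions based on the library's documented usage, so check fastrtc.org before relying on them. `echo` and `build_stream` are illustrative names.

```python
def echo(audio):
    """Yield back whatever the user just said.

    `audio` is assumed to be a (sample_rate, samples) tuple.
    """
    yield audio


def build_stream():
    # Imported lazily so the handler can be tried without FastRTC installed.
    # Requires `pip install fastrtc`.
    from fastrtc import ReplyOnPause, Stream

    # Keyword arguments are assumptions taken from the library's docs.
    return Stream(
        handler=ReplyOnPause(echo),
        modality="audio",
        mode="send-receive",
    )


# To open the built-in Gradio UI for testing:
#     build_stream().ui.launch()
```

Because the handler is an ordinary generator, you can exercise it on its own, for example `list(echo((24000, [0, 1, 2])))`, before wiring it into a stream.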
We can level up our application by having an LLM generate the response.
We'll import the SambaNova API as well as some FastRTC utils for doing speech-to-text and text-to-speech and then pipe them all together.
Importantly, you can use any LLM, speech-to-text, or text-to-speech model, even an audio-to-audio model.
Bring the tools you love and we'll just handle the real-time communication.
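One way to picture the "pipe them all together" step is a handler built from three interchangeable pieces: speech-to-text, an LLM call, and text-to-speech. The sketch below is an illustration, not the demo's actual code. `make_voice_handler`, `build_fastrtc_stream`, and `generate_reply` are hypothetical names, and `get_stt_model`, `get_tts_model`, `.stt`, and `.stream_tts_sync` are FastRTC utilities as I understand them from its docs, so treat those calls as assumptions. The SambaNova request itself is left to a function you supply.

```python
def make_voice_handler(transcribe, generate_reply, synthesize):
    """Build a generator handler from three pluggable callables.

    transcribe(audio) -> str              speech-to-text
    generate_reply(prompt) -> str         any LLM call (e.g. SambaNova)
    synthesize(text) -> iterable of audio text-to-speech, chunk by chunk
    """
    def handler(audio):
        prompt = transcribe(audio)
        reply = generate_reply(prompt)
        for chunk in synthesize(reply):
            yield chunk  # stream audio back as soon as each chunk is ready

    return handler


def build_fastrtc_stream(generate_reply):
    # Lazy import; requires `pip install fastrtc`.
    # The names below are assumptions based on the FastRTC docs.
    from fastrtc import ReplyOnPause, Stream, get_stt_model, get_tts_model

    stt_model = get_stt_model()
    tts_model = get_tts_model()
    handler = make_voice_handler(
        stt_model.stt, generate_reply, tts_model.stream_tts_sync
    )
    return Stream(
        handler=ReplyOnPause(handler), modality="audio", mode="send-receive"
    )


# Stand-in components show that the pipeline does not care which models you use.
fake_handler = make_voice_handler(
    transcribe=lambda audio: "hello",
    generate_reply=lambda prompt: prompt.upper(),
    synthesize=lambda text: [(24000, text)],
)
chunks = list(fake_handler((16000, [0.0])))
```

Keeping the three steps as plain callables is a design choice for this sketch: swapping in a different LLM or speech model only changes what you pass to `make_voice_handler`.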
You can also call into the stream by phone for free if you have a Hugging Face token.
Finally, deployment is really easy too. You can stick with Gradio or mount the stream in a FastAPI app and build any application you want. By the way, video is supported too!
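Mounting the stream in a FastAPI app could look something like this. The sketch assumes the stream exposes a `mount(app)` method, as the FastRTC docs describe. `create_app` and `app_factory` are hypothetical names, added only so the wiring can be tried without FastAPI installed.

```python
def create_app(stream, app_factory=None):
    """Create a web app and attach the stream's real-time endpoints to it."""
    if app_factory is None:
        # Lazy import; requires `pip install fastapi`.
        from fastapi import FastAPI

        app_factory = FastAPI
    app = app_factory()
    # Assumption: `mount` registers the stream's routes on the app.
    stream.mount(app)
    return app


# With a real stream you would then serve the app, for example:
#     app = create_app(stream)
#     uvicorn main:app   (assuming this file is saved as main.py)
```

From there you can add your own routes and front end to `app` like in any other FastAPI project.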
Thanks for watching! Please visit fastrtc.org to see the cookbook for all the demos shown here as well as the docs.