File size: 1,962 Bytes
8d5a127
 
 
 
 
 
 
 
f453b83
8d5a127
75120e8
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
---
license: apache-2.0
title: teste-teste-teste
sdk: gradio
emoji: πŸƒ
colorFrom: red
colorTo: red
short_description: yadayadayada
sdk_version: 5.33.0
---
# Real-Time Screen Assistant - Premium Edition

This is a premium real-time screen assistant that integrates Google's Gemini 2.0 Live API with advanced screen recording capabilities.

## Features

- πŸŽ™οΈ **Real-time Audio Streaming** - Voice activity detection with noise filtering
- πŸ–₯️ **Professional Screen Recording** - Native ScreenRecorder component with webcam overlay
- πŸ€– **AI Voice Responses** - Bidirectional audio communication with Gemini 2.0
- πŸ“ **Text Response Display** - Real-time text responses with conversation history
- πŸ”„ **Background Task Management** - Proper async handling and cleanup
- πŸ“Š **Performance Monitoring** - Real-time stats and adaptive quality

## Setup

1. Set your Google AI API key:
   ```bash
   export GEMINI_API_KEY="your-api-key-here"
   ```

2. Install dependencies (automatic on HuggingFace Spaces):
   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:
   ```bash
   python app.py
   ```

## Components

- **app.py** - Main application with premium real-time integration
- **gradio_screenrecorder/** - Custom Gradio component for screen recording
- **requirements.txt** - All necessary dependencies including custom components

## Environment Variables

- `GEMINI_API_KEY` - Required: Your Google AI API key for Gemini 2.0 Live API

## Real-time Integration

This application implements complete real-time frontend integration:

1. **Continuous Audio Flow** (User β†’ Model) - Voice activity detection
2. **Model Audio Output** (Model β†’ User) - AI voice responses 
3. **Screen Recording Integration** - Professional screen capture
4. **Text Response Delivery** (System β†’ User) - Real-time text display

All features are optimized for 300-second real-time sessions with adaptive quality and intelligent throttling.