# gemini_v2v_python
An extension for integrating Gemini's next-generation **multimodal** AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration.
## Features
- Gemini **Multimodal** Integration: Leverage Gemini **Multimodal** models for voice-to-voice as well as text processing.
- Configurable: Easily customize API keys, model settings, prompts, temperature, etc.
- Async Queue Processing: Supports real-time message processing with task cancellation and prioritization.
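The queue-with-cancellation pattern described above can be sketched with `asyncio`. This is a minimal illustration, not the extension's actual implementation: the `MessageQueue` class and its `put`/`run`/`flush` methods are hypothetical names, and prioritization is left out. Calling `flush()` discards pending messages and cancels the message currently being handled.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class MessageQueue:
    """Hypothetical sketch: handle messages one at a time; flush() drops
    queued messages and cancels the one currently in progress."""

    def __init__(self, handler: Callable[[Any], Awaitable[None]]) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._handler = handler
        self._current: asyncio.Task | None = None

    async def put(self, message: Any) -> None:
        await self._queue.put(message)

    async def run(self) -> None:
        while True:
            message = await self._queue.get()
            task = asyncio.create_task(self._handler(message))
            self._current = task
            try:
                # asyncio.wait() returns once the task finishes or is
                # cancelled by flush(); it does not raise the task's error.
                await asyncio.wait({task})
                if not task.cancelled() and task.exception() is not None:
                    print(f"handler failed: {task.exception()!r}")
            finally:
                if not task.done():
                    task.cancel()  # run() itself is being cancelled
                self._current = None
                self._queue.task_done()

    def flush(self) -> None:
        # Discard everything still waiting in the queue...
        while not self._queue.empty():
            self._queue.get_nowait()
            self._queue.task_done()
        # ...then cancel the message being handled right now, if any.
        if self._current is not None:
            self._current.cancel()


async def demo() -> list[str]:
    handled: list[str] = []
    slow_started = asyncio.Event()

    async def handler(msg: str) -> None:
        if msg == "slow":
            slow_started.set()
            await asyncio.sleep(10)  # long-running work, cut short by flush()
        handled.append(msg)

    queue = MessageQueue(handler)
    runner = asyncio.create_task(queue.run())
    for msg in ("a", "slow", "dropped"):
        await queue.put(msg)
    await slow_started.wait()
    queue.flush()  # cancels "slow" and discards "dropped"
    await queue.put("b")
    await asyncio.sleep(0.05)
    runner.cancel()
    try:
        await runner
    except asyncio.CancelledError:
        pass
    return handled


print(asyncio.run(demo()))  # ['a', 'b']
```

Waiting with `asyncio.wait()` instead of awaiting the task directly keeps a cancellation triggered by `flush()` from being confused with cancellation of `run()` itself.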
## API
Refer to the `api` definition in [manifest.json](manifest.json) and default values in [property.json](property.json).
| **Property** | **Type** | **Description** |
|----------------------------|------------|-------------------------------------------|
| `api_key` | `string` | API key for authenticating with Gemini |
| `temperature` | `float32` | Sampling temperature; higher values produce more random output |
| `model` | `string` | Identifier of the Gemini model to use |
| `max_tokens` | `int32` | Maximum number of tokens to generate |
| `system_message` | `string` | Default system message to send to the model |
| `voice` | `string` | Voice that the Gemini model uses, such as `Puck`, `Charon`, `Kore`, etc. |
| `server_vad` | `bool` | Flag to enable or disable server-side voice activity detection (VAD) |
| `language` | `string` | Language that Gemini model responds in, such as `en-US`, `zh-CN`, etc. |
| `dump` | `bool` | Flag to enable or disable audio dump for debugging purposes |
| `base_uri` | `string` | Base URI for connecting to the Gemini service |
| `audio_out` | `bool` | Flag to enable or disable audio output |
| `input_transcript` | `bool` | Flag to enable or disable input transcript processing |
| `sample_rate` | `int32` | Sample rate for audio processing, in Hz |
| `stream_id` | `int32` | Stream ID for identifying audio streams |
| `greeting` | `string` | Greeting message for initial interaction |
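As an illustration of the types in the table, the sketch below checks a parsed `property.json`-style dictionary against them. `PROPERTY_TYPES` and `check_properties` are hypothetical helpers, not part of the extension, and the sample values are illustrative rather than the shipped defaults. The `float32`/`int32` widths are not enforced; values are only checked against Python's `float`, `int`, `bool`, and `str`.

```python
from __future__ import annotations

import json

# Hypothetical mapping of each documented property to a Python type.
# float32/int32 are treated as plain float/int (widths not checked).
PROPERTY_TYPES: dict[str, type] = {
    "api_key": str,
    "temperature": float,
    "model": str,
    "max_tokens": int,
    "system_message": str,
    "voice": str,
    "server_vad": bool,
    "language": str,
    "dump": bool,
    "base_uri": str,
    "audio_out": bool,
    "input_transcript": bool,
    "sample_rate": int,
    "stream_id": int,
    "greeting": str,
}


def check_properties(props: dict) -> list[str]:
    """Return one error string per property whose value has the wrong type."""
    errors = []
    for name, value in props.items():
        expected = PROPERTY_TYPES.get(name)
        if expected is None:
            errors.append(f"{name}: unknown property")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif expected is int and isinstance(value, bool):
            errors.append(f"{name}: expected int, got bool")
        # A JSON integer such as 1 is acceptable for a float property.
        elif expected is float and isinstance(value, int) and not isinstance(value, bool):
            continue
        elif not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


props = json.loads('{"temperature": 0.9, "max_tokens": "1024", "server_vad": true}')
print(check_properties(props))  # ['max_tokens: expected int, got str']
```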
### Data Out
| **Name** | **Property** | **Type** | **Description** |
|----------------|--------------|------------|-------------------------------|
| `text_data` | `text` | `string` | Outgoing text data |
| `append` | `text` | `string` | Additional text appended to the output |
### Command Out
| **Name** | **Description** |
|----------------|---------------------------------------------|
| `flush` | Instructs downstream extensions to flush their current state |
| `tool_call` | Invokes a tool with specific arguments |
### Audio Frame In
| **Name** | **Description** |
|------------------|-------------------------------------------|
| `pcm_frame` | Audio frame input for voice processing |
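Because `pcm_frame` carries raw PCM audio whose timing depends on `sample_rate`, frame sizes follow directly from the sample rate. The sketch below assumes 16-bit signed, little-endian, mono samples; that format, and the helper names `frame_bytes` and `peak_amplitude`, are assumptions for illustration rather than details confirmed by this extension.

```python
import struct

# Assumptions for illustration: 16-bit signed little-endian samples, mono.
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def frame_bytes(sample_rate: int, duration_ms: int) -> int:
    """Bytes needed for `duration_ms` of audio at `sample_rate` Hz."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def peak_amplitude(frame: bytes) -> int:
    """Largest absolute sample value in a 16-bit mono PCM frame."""
    count = len(frame) // BYTES_PER_SAMPLE
    samples = struct.unpack(f"<{count}h", frame[: count * BYTES_PER_SAMPLE])
    return max((abs(s) for s in samples), default=0)


print(frame_bytes(16000, 10))  # 320  (160 samples x 2 bytes)
print(frame_bytes(24000, 20))  # 960  (480 samples x 2 bytes)
print(peak_amplitude(struct.pack("<3h", 1, -300, 20)))  # 300
```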
### Video Frame In
| **Name** | **Description** |
|------------------|-------------------------------------------|
| `video_frame` | Video frame input for processing |
### Audio Frame Out
| **Name** | **Description** |
|------------------|-------------------------------------------|
| `pcm_frame` | Audio frame output after voice processing |