# gemini_v2v_python

An extension that integrates Gemini's next-generation **multimodal** AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration.

## Features

- Gemini **Multimodal** Integration: Leverage Gemini **Multimodal** models for voice-to-voice as well as text processing.
- Configurable: Easily customize API keys, model settings, prompts, temperature, etc.
- Async Queue Processing: Supports real-time message processing with task cancellation and prioritization.
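
The queue behaviour in the last bullet can be sketched with `asyncio`. This is a minimal illustration, not the extension's actual implementation: the class name `PriorityTaskQueue`, its methods, and the convention that a lower number means a higher priority are all assumptions made for the example.

```python
import asyncio
import contextlib
import itertools


class PriorityTaskQueue:
    """Hypothetical sketch of a priority queue whose in-flight work can be cancelled."""

    def __init__(self):
        self._queue = asyncio.PriorityQueue()
        self._order = itertools.count()  # tie-breaker: FIFO within one priority
        self._current = None  # task handling the message being processed

    async def put(self, priority, message):
        # Assumption: a lower number means a higher priority.
        await self._queue.put((priority, next(self._order), message))

    def flush(self):
        """Drop queued messages and cancel the one being handled, if any."""
        while not self._queue.empty():
            self._queue.get_nowait()
        if self._current is not None and not self._current.done():
            self._current.cancel()

    async def run(self, handler):
        """Handle messages one at a time until this coroutine is cancelled."""
        while True:
            _, _, message = await self._queue.get()
            self._current = asyncio.create_task(handler(message))
            try:
                # wait() returns normally even if flush() cancelled the handler.
                # Error handling for a failing handler is omitted in this sketch.
                await asyncio.wait([self._current])
            finally:
                if not self._current.done():
                    self._current.cancel()


async def demo():
    queue = PriorityTaskQueue()
    handled = []

    async def handler(message):
        await asyncio.sleep(0.01)  # stand-in for a model round trip
        handled.append(message)

    await queue.put(5, "background")
    await queue.put(1, "urgent")
    worker = asyncio.create_task(queue.run(handler))
    await asyncio.sleep(0.2)
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return handled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because both messages are queued before the worker starts, the demo handles `"urgent"` before `"background"`. Calling `flush()` while a handler is running cancels it and discards anything still waiting, which is one plausible way to model an interruption.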

## API

Refer to the `api` definition in [manifest.json](manifest.json) and the default values in [property.json](property.json).

| **Property**               | **Type**   | **Description**                           |
|----------------------------|------------|-------------------------------------------|
| `api_key`                   | `string`   | API key for authenticating with Gemini    |
| `temperature`               | `float32`  | Sampling temperature; higher values produce more random output |
| `model`                     | `string`   | Identifier of the Gemini model to use     |
| `max_tokens`                | `int32`    | Maximum number of tokens to generate      |
| `system_message`            | `string`   | Default system message to send to the model |
| `voice`                     | `string`   | Voice the Gemini model uses for audio output, such as `Puck`, `Charon`, `Kore`, etc. |
| `server_vad`                | `bool`     | Flag to enable or disable server-side voice activity detection (VAD) for Gemini |
| `language`                  | `string`   | Language the Gemini model responds in, such as `en-US`, `zh-CN`, etc. |
| `dump`                      | `bool`     | Flag to enable or disable audio dump for debugging purposes |
| `base_uri`                  | `string`   | Base URI for connecting to the Gemini service |
| `audio_out`                 | `bool`     | Flag to enable or disable audio output    |
| `input_transcript`          | `bool`     | Flag to enable input transcript processing |
| `sample_rate`               | `int32`    | Sample rate for audio processing          |
| `stream_id`                 | `int32`    | Stream ID for identifying audio streams   |
| `greeting`                  | `string`   | Greeting message for initial interaction  |
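
To make the property table easier to work with in code, the sketch below mirrors it as a Python dataclass. It is illustrative only: `GeminiV2VConfig` and `from_properties` are hypothetical names, `None` stands for "not set here" (the real defaults are in `property.json`), and treating an empty `api_key` as an error is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GeminiV2VConfig:
    """Hypothetical typed view of the properties listed in the table."""

    api_key: str = ""
    temperature: Optional[float] = None
    model: Optional[str] = None
    max_tokens: Optional[int] = None
    system_message: Optional[str] = None
    voice: Optional[str] = None
    server_vad: Optional[bool] = None
    language: Optional[str] = None
    dump: Optional[bool] = None
    base_uri: Optional[str] = None
    audio_out: Optional[bool] = None
    input_transcript: Optional[bool] = None
    sample_rate: Optional[int] = None
    stream_id: Optional[int] = None
    greeting: Optional[str] = None

    @classmethod
    def from_properties(cls, props: dict) -> "GeminiV2VConfig":
        # Reject keys that are not in the property table, to catch typos early.
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(props) - known)
        if unknown:
            raise ValueError(f"unknown properties: {unknown}")
        config = cls(**props)
        # Assumption: an empty api_key is treated as a configuration error.
        if not config.api_key:
            raise ValueError("api_key is required")
        return config


# Example: override a few properties and leave the rest unset.
config = GeminiV2VConfig.from_properties(
    {"api_key": "YOUR_API_KEY", "temperature": 0.7, "server_vad": True}
)
```

Rejecting unknown keys is a design choice for the sketch: it surfaces a misspelled property name immediately rather than letting it be ignored.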

### Data Out

| **Name**       | **Property** | **Type**   | **Description**               |
|----------------|--------------|------------|-------------------------------|
| `text_data`    | `text`       | `string`   | Outgoing text data             |
| `append`       | `text`       | `string`   | Additional text appended to the output |

### Command Out

| **Name**       | **Description**                             |
|----------------|---------------------------------------------|
| `flush`        | Response after flushing the current state    |
| `tool_call`    | Invokes a tool with specific arguments       |

### Audio Frame In

| **Name**         | **Description**                           |
|------------------|-------------------------------------------|
| `pcm_frame`      | Audio frame input for voice processing    |
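
When feeding `pcm_frame` input, it helps to know how many bytes one frame holds for a given `sample_rate`. The sketch below does that arithmetic and splits a raw buffer into frames. This document does not specify the PCM layout, so the 16-bit mono format and 10 ms frame length are assumptions, and both helper names are hypothetical.

```python
def pcm_frame_bytes(sample_rate, frame_ms=10, channels=1, bytes_per_sample=2):
    """Size in bytes of one PCM frame (assumed: 16-bit mono, 10 ms)."""
    samples_per_frame = sample_rate * frame_ms // 1000
    return samples_per_frame * channels * bytes_per_sample


def split_pcm(buffer, frame_size):
    """Split raw PCM bytes into full frames; return (frames, leftover bytes)."""
    usable = len(buffer) - len(buffer) % frame_size
    frames = [buffer[i:i + frame_size] for i in range(0, usable, frame_size)]
    return frames, buffer[usable:]


# Example: 16 kHz audio gives 160 samples, or 320 bytes, per 10 ms frame.
frame_size = pcm_frame_bytes(16000)
frames, leftover = split_pcm(bytes(800), frame_size)
```

Keeping the leftover bytes lets the caller prepend them to the next buffer, so no audio is dropped at chunk boundaries.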

### Video Frame In

| **Name**         | **Description**                           |
|------------------|-------------------------------------------|
| `video_frame`    | Video frame input for processing          |

### Audio Frame Out

| **Name**         | **Description**                           |
|------------------|-------------------------------------------|
| `pcm_frame`      | Audio frame output after voice processing |