# openai_v2v_python
An extension for integrating OpenAI's Next Generation of Multimodal AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration.
## Features
- OpenAI Multimodal Integration: Leverage GPT multimodal models for voice-to-voice as well as text processing.
- Configurable: Easily customize API keys, model settings, prompts, temperature, and more.
- Async Queue Processing: Supports real-time message processing with task cancellation and prioritization.
## API
Refer to the API definition in `manifest.json` and the default values in `property.json`.
| Property | Type | Description |
| --- | --- | --- |
| `api_key` | string | API key for authenticating with OpenAI |
| `temperature` | float64 | Sampling temperature; higher values mean more randomness |
| `model` | string | Model identifier (e.g., GPT-3.5, GPT-4) |
| `max_tokens` | int64 | Maximum number of tokens to generate |
| `system_message` | string | Default system message to send to the model |
| `voice` | string | Voice the OpenAI model speaks with, such as `alloy`, `echo`, `shimmer`, etc. |
| `server_vad` | bool | Flag to enable or disable OpenAI's server-side VAD (voice activity detection) |
| `language` | string | Language the OpenAI model responds in, such as `en-US`, `zh-CN`, etc. |
| `dump` | bool | Flag to enable or disable audio dump for debugging purposes |
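As a quick reference, a minimal sketch of how these properties might be set in `property.json` is shown below. All values are illustrative placeholders rather than required defaults, and the exact placement of these fields depends on how the extension node is declared in your graph:

```json
{
  "api_key": "your-openai-api-key",
  "model": "gpt-4o-realtime-preview",
  "temperature": 0.7,
  "max_tokens": 1024,
  "system_message": "You are a helpful voice assistant.",
  "voice": "alloy",
  "language": "en-US",
  "server_vad": true,
  "dump": false
}
```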
### Data Out:

| Name | Property | Type | Description |
| --- | --- | --- | --- |
| `text_data` | `text` | string | Outgoing text data |
### Command Out:

| Name | Description |
| --- | --- |
| `flush` | Response after flushing the current state |
### Audio Frame In:

| Name | Description |
| --- | --- |
| `pcm_frame` | Audio frame input for voice processing |
### Audio Frame Out:

| Name | Description |
| --- | --- |
| `pcm_frame` | Audio frame output after voice processing |
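The tables above describe the extension's interface; how frames, data, and commands are routed is defined by the graph in `property.json`. The snippet below is an illustrative sketch only, assuming the TEN graph `connections` schema; the downstream extension names `agora_rtc` and `message_collector` are placeholders and should be replaced with whatever extensions your graph actually uses:

```json
{
  "connections": [
    {
      "extension": "openai_v2v_python",
      "audio_frame": [
        { "name": "pcm_frame", "dest": [{ "extension": "agora_rtc" }] }
      ],
      "data": [
        { "name": "text_data", "dest": [{ "extension": "message_collector" }] }
      ]
    }
  ]
}
```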
## Azure Support
This extension also supports the Azure OpenAI Service; the property settings are as follows:
```json
{
    "base_uri": "wss://xxx.openai.azure.com",
    "path": "/openai/realtime?api-version=xxx&deployment=xxx",
    "api_key": "xxx",
    "model": "gpt-4o-realtime-preview",
    "vendor": "azure"
}
```
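With `vendor` set to `azure`, the extension connects to the Azure realtime endpoint given by `base_uri` and `path` (which carry the API version and deployment name) instead of the default OpenAI endpoint.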