File size: 3,687 Bytes
87337b1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
## Amazon Bedrock LLM Extension
### Configurations
You can config this extension by providing following environments:
| Env | Required | Default | Notes |
| -- | -- | -- | -- |
| AWS_REGION | No | us-east-1 | The Region of Amazon Bedrock service you want to use. |
| AWS_ACCESS_KEY_ID | No | - | Access Key of your IAM User, make sure you've set proper permissions to [invoke Bedrock models](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html) and gain [models access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) in Bedrock. Will use default credentials provider if not provided. Check [document](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html). |
| AWS_SECRET_ACCESS_KEY | No | - | Secret Key of your IAM User, make sure you've set proper permissions to [invoke Bedrock models](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html) and gain [models access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) in Bedrock. Will use default credentials provider if not provided. Check [document](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html). |
| AWS_BEDROCK_MODEL | No | Nova (https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html) | Bedrock model id, check [docuement](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html#model-ids-arns). |
## Features
- Real-time video and audio interaction similar to Gemini 2.0
- Audio recognition using TEN framework's STT plugin
- Text-to-speech conversion using TEN framework's TTS plugin
- Integration with AWS Bedrock's Nova model
- Smart input truncation logic
- Multi-language support
## Requirements
- Python 3.9+
- AWS account with Bedrock access
- TEN framework with STT and TTS plugins
- Dependencies listed in requirements.txt
## Installation
1. Install dependencies:
```bash
pip install -r requirements.txt
```
2. Configure AWS credentials:
- Set up AWS credentials with Bedrock access
- Update the api_key in configuration
## Configuration
The extension can be configured through manifest.json properties:
- `base_uri`: Bedrock API endpoint
- `region`: AWS region for Bedrock
- `aws_access_key_id`: AWS access key ID
- `aws_secret_access_key`: AWS secret access key
- `model_id`: Bedrock Nova model ID
- `language`: Language code for STT/TTS
- See manifest.json for full configuration options
## Input Truncation Logic
The extension implements smart input truncation:
1. Duration-based truncation:
- Automatically truncates input exceeding 30 seconds
2. Silence-based truncation:
- Triggers when silence exceeds 2 seconds
3. Manual truncation:
- Supports user-initiated truncation
## Architecture
1. Audio Processing:
- Uses TEN framework's STT plugin for audio recognition
- Buffers and processes audio in real-time
- Provides intermediate and final transcripts
2. Nova Model Integration:
- Combines transcribed text with video input
- Sends to Bedrock's Nova model for processing
- Handles responses and error conditions
3. Speech Synthesis:
- Converts Nova model responses to speech
- Uses TEN framework's TTS plugin
- Synchronizes with video output
## API Usage
### Commands
1. Flush Command:
```python
cmd = Cmd.create("flush")
await ten_env.send_cmd(cmd)
```
2. User Events:
```python
# User joined
cmd = Cmd.create("on_user_joined")
await ten_env.send_cmd(cmd)
# User left
cmd = Cmd.create("on_user_left")
await ten_env.send_cmd(cmd)
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Submit a pull request |