# 🔌 OmniAvatar API Documentation
## POST /generate - Avatar Generation
### Request Format
**URL:** `https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate`
**Method:** `POST`
**Content-Type:** `application/json`
### Request Body (JSON)
```json
{
"prompt": "string",
"text_to_speech": "string (optional)",
"elevenlabs_audio_url": "string (optional)",
"voice_id": "string (optional, default: '21m00Tcm4TlvDq8ikWAM')",
"image_url": "string (optional)",
"guidance_scale": "float (default: 5.0)",
"audio_scale": "float (default: 3.0)",
"num_steps": "int (default: 30)",
"sp_size": "int (default: 1)",
"tea_cache_l1_thresh": "float (optional)"
}
```
### Request Parameters
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | ✅ | Character behavior description |
| `text_to_speech` | string | ❌ | Text to convert to speech via ElevenLabs |
| `elevenlabs_audio_url` | string | ❌ | Direct URL to audio file |
| `voice_id` | string | ❌ | ElevenLabs voice ID (default: Rachel) |
| `image_url` | string | ❌ | Reference image URL |
| `guidance_scale` | float | ❌ | Prompt following strength (4-6 recommended) |
| `audio_scale` | float | ❌ | Lip-sync accuracy (3-5 recommended) |
| `num_steps` | int | ❌ | Generation steps (20-50 recommended) |
| `sp_size` | int | ❌ | Parallel processing size |
| `tea_cache_l1_thresh` | float | ❌ | Cache threshold optimization |
**Note:** Either `text_to_speech` OR `elevenlabs_audio_url` must be provided.
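The requirements above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: `build_generate_payload` and `out_of_range_fields` are hypothetical helper names, and the behavior when both audio fields are supplied is an assumption (both are passed through unchanged, since the documentation does not say which one wins).

```python
from typing import Any, Optional

# Recommended ranges copied from the parameter table above.
RECOMMENDED_RANGES = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "num_steps": (20, 50),
}


def build_generate_payload(
    prompt: str,
    text_to_speech: Optional[str] = None,
    elevenlabs_audio_url: Optional[str] = None,
    **options: Any,
) -> dict:
    """Build a /generate request body, enforcing the documented requirements."""
    if not prompt:
        raise ValueError("prompt is required")
    if not text_to_speech and not elevenlabs_audio_url:
        raise ValueError(
            "Either text_to_speech or elevenlabs_audio_url must be provided"
        )
    payload = {"prompt": prompt}
    if text_to_speech:
        payload["text_to_speech"] = text_to_speech
    if elevenlabs_audio_url:
        payload["elevenlabs_audio_url"] = elevenlabs_audio_url
    # Pass optional fields through, skipping unset ones so that the
    # server-side defaults apply.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


def out_of_range_fields(payload: dict) -> list:
    """Return the names of fields that fall outside the recommended ranges."""
    flagged = []
    for name, (low, high) in RECOMMENDED_RANGES.items():
        value = payload.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged
```

The ranges are recommendations rather than hard limits, so the sketch reports out-of-range fields instead of rejecting them.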
### Example Request
```json
{
"prompt": "A professional teacher explaining a mathematical concept with clear gestures",
"text_to_speech": "Hello students! Today we're going to learn about calculus and how derivatives work in real life.",
"voice_id": "21m00Tcm4TlvDq8ikWAM",
"image_url": "https://example.com/teacher.jpg",
"guidance_scale": 5.0,
"audio_scale": 3.5,
"num_steps": 30
}
```
### Response Format
**Success Response (200 OK):**
```json
{
"message": "string",
"output_path": "string",
"processing_time": "float",
"audio_generated": "boolean"
}
```
### Response Fields
| Field | Type | Description |
|-------|------|-------------|
| `message` | string | Success/status message |
| `output_path` | string | Path to generated video file |
| `processing_time` | float | Processing time in seconds |
| `audio_generated` | boolean | Whether audio was generated from text |
### Example Response
```json
{
"message": "Avatar generation completed successfully",
"output_path": "./outputs/avatar_20240807_130512.mp4",
"processing_time": 45.67,
"audio_generated": true
}
```
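A client may want to turn this JSON body into a typed object. The sketch below assumes the body has already been decoded into a `dict`; `GenerateResult` and `parse_generate_response` are illustrative names, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class GenerateResult:
    message: str
    output_path: str
    processing_time: float  # seconds
    audio_generated: bool


def parse_generate_response(body: dict) -> GenerateResult:
    """Convert a decoded 200 OK body into a typed result.

    Raises KeyError if one of the documented fields is missing.
    """
    return GenerateResult(
        message=str(body["message"]),
        output_path=str(body["output_path"]),
        processing_time=float(body["processing_time"]),
        audio_generated=bool(body["audio_generated"]),
    )
```

Failing loudly on a missing field is a design choice for this sketch; a more lenient client could supply defaults instead.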
### Error Responses
**400 Bad Request:**
```json
{
"detail": "Either text_to_speech or elevenlabs_audio_url must be provided"
}
```
**500 Internal Server Error:**
```json
{
"detail": "Model not loaded"
}
```
**503 Service Unavailable:**
```json
{
"detail": "Model not loaded"
}
```
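All three error bodies share a single `detail` field, so one helper can summarize them. The sketch below is an illustration under stated assumptions: `describe_error` is a hypothetical name, and treating only 503 as retryable is a client-side choice that this documentation does not specify.

```python
import json

# Assumption: only "Service Unavailable" is worth retrying automatically.
RETRYABLE_STATUSES = {503}


def describe_error(status: int, raw_body: str) -> dict:
    """Summarize an error response as status, detail, and a retry hint."""
    try:
        detail = json.loads(raw_body).get("detail", raw_body)
    except (ValueError, AttributeError):
        # The body was not a JSON object; fall back to the raw text.
        detail = raw_body
    return {
        "status": status,
        "detail": detail,
        "retryable": status in RETRYABLE_STATUSES,
    }
```

Because the 500 and 503 examples carry the same `detail` text, the status code rather than the message is used to decide whether to retry.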
### Available ElevenLabs Voices
| Voice ID | Name | Description |
|----------|------|-------------|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Default, clear female voice |
| `pNInz6obpgDQGcFmaJgB` | Adam | Professional male voice |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Expressive female voice |
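For readability, client code can refer to voices by name and look up the ID from the table above. This is a small sketch; `VOICES` and `voice_id_for` are hypothetical names, and only the three voices listed here are included.

```python
# Voice names and IDs copied from the table above.
VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "adam": "pNInz6obpgDQGcFmaJgB",
    "bella": "EXAVITQu4vr4xnSDxMaL",
}
DEFAULT_VOICE = "rachel"


def voice_id_for(name: str = DEFAULT_VOICE) -> str:
    """Return the voice ID for a listed voice name (case-insensitive)."""
    try:
        return VOICES[name.strip().lower()]
    except KeyError:
        raise ValueError(
            f"Unknown voice {name!r}; expected one of {sorted(VOICES)}"
        ) from None
```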
### Usage Examples
#### With Text-to-Speech
```bash
curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A friendly presenter speaking confidently",
"text_to_speech": "Welcome to our AI avatar demonstration!",
"voice_id": "21m00Tcm4TlvDq8ikWAM",
"guidance_scale": 5.5,
"audio_scale": 4.0
}'
```
#### With Audio URL
```bash
curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A news anchor delivering headlines",
"elevenlabs_audio_url": "https://example.com/audio.mp3",
"image_url": "https://example.com/anchor.jpg",
"num_steps": 40
}'
```
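#### From Python

The same calls can be made from Python using only the standard library. This is a sketch rather than an official client: `make_generate_request` and `generate` are hypothetical names, and the 180-second timeout is an assumption chosen to exceed the 30-120 second processing time listed under Rate Limits & Performance.

```python
import json
import urllib.request

# URL copied from the curl examples above.
GENERATE_URL = "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate"


def make_generate_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /generate request."""
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(payload: dict, timeout: float = 180.0) -> dict:
    """Send the request and return the decoded JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = make_generate_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `generate({"prompt": "A news anchor delivering headlines", "elevenlabs_audio_url": "https://example.com/audio.mp3"})` would then mirror the second curl example.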
### Other Endpoints
#### GET /health - Health Check
Returns the service status and feature flags. Example response:
```json
{
"status": "healthy",
"model_loaded": true,
"device": "cuda",
"supports_elevenlabs": true,
"supports_image_urls": true,
"supports_text_to_speech": true,
"elevenlabs_api_configured": true
}
```
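A client can inspect this response before submitting a generation job. The sketch below assumes the body has been decoded into a `dict`; the function names are hypothetical, and requiring both `supports_text_to_speech` and `elevenlabs_api_configured` for text-to-speech is an assumption about how the flags relate.

```python
def is_ready(health: dict) -> bool:
    """True when the service reports healthy and the model is loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded") is True


def can_use_text_to_speech(health: dict) -> bool:
    """True when text-to-speech appears to be both supported and configured.

    Assumption: both flags must be set for `text_to_speech` requests to work.
    """
    return bool(health.get("supports_text_to_speech")) and bool(
        health.get("elevenlabs_api_configured")
    )
```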
#### GET /docs - FastAPI Documentation
Interactive API documentation is available at the `/docs` endpoint.
### Rate Limits & Performance
- **Processing Time:** 30-120 seconds depending on complexity
- **Max Video Length:** Determined by audio length
- **Supported Formats:** MP4 output, MP3/WAV audio input
- **GPU Acceleration:** Enabled on T4+ hardware
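
Since only MP3 and WAV audio input is listed as supported, a client can screen audio URLs before sending them. This is a heuristic sketch: `looks_like_supported_audio` is a hypothetical helper, and it checks only the file extension in the URL path, which is an assumption about how the audio is identified (a URL without an extension may still serve valid audio).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Input formats listed above.
SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".wav"}


def looks_like_supported_audio(url: str) -> bool:
    """Return True if the URL path ends in a supported audio extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in SUPPORTED_AUDIO_EXTENSIONS
```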
---
**Live API Base URL:** `https://huggingface.co/spaces/bravedims/AI_Avatar_Chat`