	# 🔌 OmniAvatar API Documentation

	## POST /generate - Avatar Generation

	### Request Format

	URL: `https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate`
	Method: `POST`
	Content-Type: `application/json`

	### Request Body (JSON)

	```json
	{
	  "prompt": "string",
	  "text_to_speech": "string (optional)",
	  "elevenlabs_audio_url": "string (optional)",
	  "voice_id": "string (optional, default: '21m00Tcm4TlvDq8ikWAM')",
	  "image_url": "string (optional)",
	  "guidance_scale": "float (default: 5.0)",
	  "audio_scale": "float (default: 3.0)",
	  "num_steps": "int (default: 30)",
	  "sp_size": "int (default: 1)",
	  "tea_cache_l1_thresh": "float (optional)"
	}
	```

	### Request Parameters

	| Field | Type | Required | Description |
	|-------|------|----------|-------------|
	| `prompt` | string | ✅ | Character behavior description |
	| `text_to_speech` | string | ❌ | Text to convert to speech via ElevenLabs |
	| `elevenlabs_audio_url` | string | ❌ | Direct URL to audio file |
	| `voice_id` | string | ❌ | ElevenLabs voice ID (default: Rachel) |
	| `image_url` | string | ❌ | Reference image URL |
	| `guidance_scale` | float | ❌ | Prompt following strength (4-6 recommended) |
	| `audio_scale` | float | ❌ | Lip-sync accuracy (3-5 recommended) |
	| `num_steps` | int | ❌ | Generation steps (20-50 recommended) |
	| `sp_size` | int | ❌ | Parallel processing size |
	| `tea_cache_l1_thresh` | float | ❌ | Cache threshold optimization |

	Note: Either `text_to_speech` OR `elevenlabs_audio_url` must be provided.
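
The rule in the note above can also be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_generate_payload` and `out_of_range_fields` are hypothetical helper names, and the ranges are the recommendations from the parameter table, treated here as advice rather than hard limits.

```python
from typing import Any, Dict, List, Optional

# Recommended ranges copied from the parameter table; advisory, not enforced by the API.
RECOMMENDED_RANGES = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "num_steps": (20, 50),
}


def build_generate_payload(
    prompt: str,
    text_to_speech: Optional[str] = None,
    elevenlabs_audio_url: Optional[str] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Build a /generate request body (hypothetical client-side helper)."""
    if not prompt:
        raise ValueError("prompt is required")
    if not text_to_speech and not elevenlabs_audio_url:
        # Mirrors the documented 400 error message.
        raise ValueError(
            "Either text_to_speech or elevenlabs_audio_url must be provided"
        )
    payload: Dict[str, Any] = {"prompt": prompt}
    if text_to_speech:
        payload["text_to_speech"] = text_to_speech
    if elevenlabs_audio_url:
        payload["elevenlabs_audio_url"] = elevenlabs_audio_url
    # Omit options left as None so the server applies its documented defaults.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


def out_of_range_fields(payload: Dict[str, Any]) -> List[str]:
    """Return names of fields whose values fall outside the recommended ranges."""
    flagged = []
    for name, (low, high) in RECOMMENDED_RANGES.items():
        value = payload.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged
```

Calling `build_generate_payload("A friendly presenter", text_to_speech="Welcome!")` yields a body with only the two supplied fields, leaving every other parameter to its server-side default.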

	### Example Request

	```json
	{
	  "prompt": "A professional teacher explaining a mathematical concept with clear gestures",
	  "text_to_speech": "Hello students! Today we're going to learn about calculus and how derivatives work in real life.",
	  "voice_id": "21m00Tcm4TlvDq8ikWAM",
	  "image_url": "https://example.com/teacher.jpg",
	  "guidance_scale": 5.0,
	  "audio_scale": 3.5,
	  "num_steps": 30
	}
	```

	### Response Format

	Success Response (200 OK):

	```json
	{
	  "message": "string",
	  "output_path": "string",
	  "processing_time": "float",
	  "audio_generated": "boolean"
	}
	```

	### Response Fields

	| Field | Type | Description |
	|-------|------|-------------|
	| `message` | string | Success/status message |
	| `output_path` | string | Path to generated video file |
	| `processing_time` | float | Processing time in seconds |
	| `audio_generated` | boolean | Whether audio was generated from text |

	### Example Response

	```json
	{
	  "message": "Avatar generation completed successfully",
	  "output_path": "./outputs/avatar_20240807_130512.mp4",
	  "processing_time": 45.67,
	  "audio_generated": true
	}
	```
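
A client may prefer a typed object over a raw dictionary. The sketch below parses a success body into a dataclass whose fields match the response table; `GenerateResult` and `parse_generate_response` are illustrative names, not part of the service.

```python
import json
from dataclasses import dataclass


@dataclass
class GenerateResult:
    """Typed view of the documented 200 OK response fields."""
    message: str
    output_path: str
    processing_time: float  # seconds
    audio_generated: bool


def parse_generate_response(body: str) -> GenerateResult:
    """Parse a JSON success body; raises KeyError if a documented field is missing."""
    data = json.loads(body)
    return GenerateResult(
        message=str(data["message"]),
        output_path=str(data["output_path"]),
        processing_time=float(data["processing_time"]),
        audio_generated=bool(data["audio_generated"]),
    )
```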

	### Error Responses

	400 Bad Request:
	```json
	{
	  "detail": "Either text_to_speech or elevenlabs_audio_url must be provided"
	}
	```

	500 Internal Server Error:
	```json
	{
	  "detail": "Model not loaded"
	}
	```

	503 Service Unavailable:
	```json
	{
	  "detail": "Model not loaded"
	}
	```
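
The error bodies above share a single `detail` field, so a client can handle them uniformly. The following sketch extracts that field and marks 503 as retryable; treating 503 as a retry signal is a client-side assumption rather than documented behavior, and `describe_error` is a hypothetical helper name.

```python
import json


def describe_error(status_code: int, body: str) -> dict:
    """Summarize an error response using the documented `detail` field."""
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object: fall back to the raw text.
        detail = body
    return {
        "status": status_code,
        "detail": detail,
        # Assumption: 503 means the model may still be loading, so a later retry could succeed.
        "retryable": status_code == 503,
    }
```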

	### Available ElevenLabs Voices

	| Voice ID | Name | Description |
	|----------|------|-------------|
	| `21m00Tcm4TlvDq8ikWAM` | Rachel | Default, clear female voice |
	| `pNInz6obpgDQGcFmaJgB` | Adam | Professional male voice |
	| `EXAVITQu4vr4xnSDxMaL` | Bella | Expressive female voice |
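
Because the voice IDs are opaque strings, a small lookup by name can make client code easier to read. This is an illustrative sketch: the IDs and names come from the table above, while `voice_id_for` is a hypothetical helper.

```python
from typing import Optional

# Voice names and IDs as listed in the table above.
ELEVENLABS_VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "adam": "pNInz6obpgDQGcFmaJgB",
    "bella": "EXAVITQu4vr4xnSDxMaL",
}
DEFAULT_VOICE_ID = ELEVENLABS_VOICES["rachel"]


def voice_id_for(name: Optional[str] = None) -> str:
    """Return the voice ID for a listed name, or the default when no name is given."""
    if name is None:
        return DEFAULT_VOICE_ID
    key = name.strip().lower()
    if key not in ELEVENLABS_VOICES:
        raise ValueError(f"Unknown voice name: {name!r}")
    return ELEVENLABS_VOICES[key]
```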

	### Usage Examples

	#### With Text-to-Speech
	```bash
	curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "prompt": "A friendly presenter speaking confidently",
	    "text_to_speech": "Welcome to our AI avatar demonstration!",
	    "voice_id": "21m00Tcm4TlvDq8ikWAM",
	    "guidance_scale": 5.5,
	    "audio_scale": 4.0
	  }'
	```

	#### With Audio URL
	```bash
	curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "prompt": "A news anchor delivering headlines",
	    "elevenlabs_audio_url": "https://example.com/audio.mp3",
	    "image_url": "https://example.com/anchor.jpg",
	    "num_steps": 40
	  }'
	```
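
#### With Python (sketch)

The curl calls above translate directly to Python's standard library. This is a minimal sketch using the URL given in this document; `make_generate_request` and `send` are hypothetical helper names, and the 180-second timeout is an assumption chosen to exceed the 30-120 second processing time quoted under Rate Limits & Performance.

```python
import json
import urllib.request

# URL as documented in the Request Format section.
GENERATE_URL = "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate"


def make_generate_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /generate."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 180.0) -> dict:
    """Send the request and decode the JSON body; the timeout value is an assumption."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before a long-running generation call is made.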

	### Other Endpoints

	#### GET /health - Health Check
	```json
	{
	  "status": "healthy",
	  "model_loaded": true,
	  "device": "cuda",
	  "supports_elevenlabs": true,
	  "supports_image_urls": true,
	  "supports_text_to_speech": true,
	  "elevenlabs_api_configured": true
	}
	```
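
A client can inspect the health payload before submitting a long-running generation job. The sketch below assumes the flags mean what their names suggest (the document does not define them), and `is_ready` is a hypothetical helper name.

```python
def is_ready(health: dict, need_text_to_speech: bool = False) -> bool:
    """Decide whether to submit a job, based on a parsed /health response."""
    if health.get("status") != "healthy" or not health.get("model_loaded"):
        return False
    if need_text_to_speech:
        # Assumption: both flags must be true for text-to-speech requests to work.
        return bool(health.get("supports_text_to_speech")) and bool(
            health.get("elevenlabs_api_configured")
        )
    return True
```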

	#### GET /docs - FastAPI Documentation
	Interactive API documentation is available at the `/docs` endpoint.

	### Rate Limits & Performance

	- Processing Time: 30-120 seconds depending on complexity
	- Max Video Length: Determined by audio length
	- Supported Formats: MP4 output, MP3/WAV audio input
	- GPU Acceleration: Enabled on T4+ hardware

	---

	Live API Base URL: `https://huggingface.co/spaces/bravedims/AI_Avatar_Chat`