AI_Avatar_Chat / MODEL_DOWNLOAD_GUIDE.md

bravedims

📋 Add model download guides and helpers for TTS-only mode issue

c89ce9a 26 days ago


	# Alternative OmniAvatar Model Download Guide

	## 🎯 Why You're Getting Only Audio Output

	Your app is working correctly but running in TTS-only mode because the OmniAvatar-14B models are missing. The app gracefully falls back to audio-only generation when video models aren't available.
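The fallback described above can be pictured as a simple presence check. This is a minimal sketch, not the app's actual code: the function name `select_mode` and the rule that every model directory must exist and be non-empty are assumptions made for illustration. Only the directory names come from this guide.

```python
from pathlib import Path

# Directory names taken from this guide; the check itself is hypothetical.
REQUIRED_MODEL_DIRS = (
    "Wan2.1-T2V-14B",
    "OmniAvatar-14B",
    "wav2vec2-base-960h",
)


def select_mode(models_root: str = "pretrained_models") -> str:
    """Return "video" if every model directory exists and is non-empty, else "tts-only"."""
    root = Path(models_root)
    for name in REQUIRED_MODEL_DIRS:
        model_dir = root / name
        # A missing or empty directory means the video models cannot be loaded.
        if not model_dir.is_dir() or not any(model_dir.iterdir()):
            return "tts-only"
    return "video"


if __name__ == "__main__":
    print(f"Mode: {select_mode()}")
```

Run from the project root, this prints `tts-only` until all three directories contain files.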

	## 🚀 Solutions to Enable Video Generation

	### Option 1: Use Git to Download Models (If you have Git LFS)

	```bash
	# Create the parent directory; git clone creates each model subdirectory itself
	mkdir pretrained_models

	# Clone the models (requires Git LFS; run `git lfs install` once beforehand)
	# `git lfs clone` is deprecated, plain `git clone` fetches LFS files when Git LFS is installed
	git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-14B pretrained_models/Wan2.1-T2V-14B
	git clone https://huggingface.co/OmniAvatar/OmniAvatar-14B pretrained_models/OmniAvatar-14B
	git clone https://huggingface.co/facebook/wav2vec2-base-960h pretrained_models/wav2vec2-base-960h
	```
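Option 1 depends on Git LFS being installed. Without it, a clone contains small pointer files instead of the model weights. The sketch below is one way to check before cloning; the function name `git_lfs_available` is made up for this example and is not part of the project.

```python
import shutil
import subprocess


def git_lfs_available() -> bool:
    """Return True if both git and the Git LFS extension can be run."""
    if shutil.which("git") is None:
        return False
    # `git lfs version` exits with a non-zero status when Git LFS is missing.
    result = subprocess.run(
        ["git", "lfs", "version"], capture_output=True, text=True
    )
    return result.returncode == 0


if __name__ == "__main__":
    if git_lfs_available():
        print("Git LFS found: Option 1 should work.")
    else:
        print("Git LFS not found: install it or use another option.")
```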

	### Option 2: Install Python and Run Setup Script

	1. Install Python (if not already installed):
	   - Download it from https://python.org/downloads/
	   - Or install it from the Microsoft Store
	   - When using the python.org installer, make sure to check "Add to PATH" during installation

	2. Run the setup script from the project directory: `python setup_omniavatar.py`

	### Option 3: Manual Download from HuggingFace

	Visit these pages and download the files from each repository's "Files and versions" tab:
	- https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
	- https://huggingface.co/OmniAvatar/OmniAvatar-14B
	- https://huggingface.co/facebook/wav2vec2-base-960h

	Place the downloaded files in the matching directories:
	- pretrained_models/Wan2.1-T2V-14B/
	- pretrained_models/OmniAvatar-14B/
	- pretrained_models/wav2vec2-base-960h/
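If clicking through each repository is tedious, the same files can be fetched with the `huggingface_hub` package (`pip install huggingface_hub`). This is a sketch of an alternative, not what `setup_omniavatar.py` does: the helper names `target_dir` and `download_all` are invented here, while the repository IDs and target folders are the ones listed above.

```python
from pathlib import Path

# Repository IDs and target folder names taken from this guide.
MODEL_REPOS = {
    "Wan-AI/Wan2.1-T2V-14B": "Wan2.1-T2V-14B",
    "OmniAvatar/OmniAvatar-14B": "OmniAvatar-14B",
    "facebook/wav2vec2-base-960h": "wav2vec2-base-960h",
}


def target_dir(repo_id: str, models_root: str = "pretrained_models") -> Path:
    """Map a repository ID to the local directory this guide expects."""
    return Path(models_root) / MODEL_REPOS[repo_id]


def download_all(models_root: str = "pretrained_models") -> None:
    """Download every repository into its expected directory."""
    # Imported here so the mapping above is usable without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id in MODEL_REPOS:
        destination = target_dir(repo_id, models_root)
        destination.mkdir(parents=True, exist_ok=True)
        print(f"Downloading {repo_id} -> {destination}")
        snapshot_download(repo_id=repo_id, local_dir=str(destination))
```

Calling `download_all()` from the project root starts the download. Expect it to take a long time given the sizes involved.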

	### Option 4: Use Windows Subsystem for Linux (WSL)

	If you have WSL installed:
	```bash
	wsl
	cd /mnt/c/path/to/your/project
	python setup_omniavatar.py
	```

	## 📊 Model Requirements

	Total download size: ~30.36GB
	- Wan2.1-T2V-14B: ~28GB (base text-to-video model)
	- OmniAvatar-14B: ~2GB (avatar animation weights)
	- wav2vec2-base-960h: ~360MB (audio encoder)
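Before starting a download of this size, it helps to confirm that the drive has room. The sketch below adds up the approximate sizes listed above and compares the total with the free space reported by the operating system. The 5 GB safety margin and the use of decimal gigabytes (10^9 bytes) are assumptions, not project requirements.

```python
import shutil

# Approximate sizes in GB, copied from the list above.
MODEL_SIZES_GB = {
    "Wan2.1-T2V-14B": 28.0,
    "OmniAvatar-14B": 2.0,
    "wav2vec2-base-960h": 0.36,
}


def required_gb() -> float:
    """Total approximate download size in GB."""
    return sum(MODEL_SIZES_GB.values())


def has_enough_space(path: str = ".", margin_gb: float = 5.0) -> bool:
    """Check whether the drive holding `path` has room for all models plus a margin."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb() + margin_gb


if __name__ == "__main__":
    print(f"Required: ~{required_gb():.2f} GB")
    print(f"Enough free space here: {has_enough_space()}")
```

Note that a Git clone (Option 1) can need noticeably more space than the figures above, because Git LFS also keeps a copy of each large file inside the `.git` directory.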

	## 🔍 Verify Installation

	After downloading, restart your app and check the following:
	- The logs should show "full functionality enabled"
	- API responses should return video URLs instead of audio only
	- The Gradio interface should show a video output component
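If the app still runs in TTS-only mode after a restart, an incomplete download is a likely cause (for example, Git LFS pointer files instead of real weights). The sketch below flags model directories that are missing or suspiciously small. The thresholds are an assumption, set to roughly half of the approximate sizes given in this guide, and the function names are invented for this example.

```python
from pathlib import Path


def dir_size_bytes(directory: Path) -> int:
    """Sum the sizes of all regular files below `directory`."""
    return sum(f.stat().st_size for f in directory.rglob("*") if f.is_file())


def find_incomplete(models_root: str, min_bytes: dict) -> list:
    """Return the names of model directories that are missing or too small."""
    incomplete = []
    for name, minimum in min_bytes.items():
        model_dir = Path(models_root) / name
        if not model_dir.is_dir() or dir_size_bytes(model_dir) < minimum:
            incomplete.append(name)
    return incomplete


# Assumed thresholds: about half of the approximate sizes in this guide.
MIN_BYTES = {
    "Wan2.1-T2V-14B": 14_000_000_000,
    "OmniAvatar-14B": 1_000_000_000,
    "wav2vec2-base-960h": 180_000_000,
}

if __name__ == "__main__":
    problems = find_incomplete("pretrained_models", MIN_BYTES)
    if problems:
        print(f"Incomplete or missing: {problems}")
    else:
        print("All model directories look complete.")
```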

	## 💡 Current Status

	Your setup is working perfectly for TTS! Once the OmniAvatar models are downloaded, you'll get:
	- ✅ Audio-driven avatar videos
	- ✅ Adaptive body animation
	- ✅ Lip-sync accuracy
	- ✅ 480p video output