Spaces:

thanhkt
/

t2m

Running

App Files Files Community

t2m / README.md

thanhkt

Update README.md

32b5842 verified 8 days ago

preview code

raw

history blame contribute delete

8.12 kB

	---
	title: AI Animation & Voice Studio
	emoji: 🎬
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_port: 7860
	suggested_hardware: cpu-upgrade
	suggested_storage: large
	pinned: true
	license: apache-2.0
	short_description: "Create mathematical animations with AI-powered using Manim"
	tags:
	- text-to-speech
	- animation
	- mathematics
	- manim
	- ai-voice
	- educational
	- visualization
	models:
	- kokoro-onnx/kokoro-v0_19
	datasets: []
	startup_duration_timeout: 30m
	fullWidth: true
	header: default
	disable_embedding: false
	preload_from_hub: []
	---

	# AI Animation & Voice Studio 🎬

	A powerful application that combines AI-powered text-to-speech with mathematical animation generation using Manim and Kokoro TTS. Create stunning educational content with synchronized voice narration and mathematical visualizations.

	## 🚀 Features

	- Text-to-Speech: High-quality voice synthesis using Kokoro ONNX models
	- Mathematical Animations: Create stunning mathematical visualizations with Manim
	- LaTeX Support: Full LaTeX rendering capabilities with TinyTeX
	- Interactive Interface: User-friendly Gradio web interface
	- Audio Processing: Advanced audio manipulation with FFmpeg and SoX

	## 🛠️ Technology Stack

	- Frontend: Gradio for interactive web interface
	- Backend: Python with FastAPI/Flask
	- Animation: Manim (Mathematical Animation Engine)
	- TTS: Kokoro ONNX for text-to-speech synthesis
	- LaTeX: TinyTeX for mathematical typesetting
	- Audio: FFmpeg, SoX, PortAudio for audio processing
	- Deployment: Docker container optimized for Hugging Face Spaces

	## 📦 Models

	This application uses the following pre-trained models:

	- Kokoro TTS: `kokoro-v0_19.onnx` - High-quality neural text-to-speech model
	- Voice Models: `voices.bin` - Voice embedding models for different speaker characteristics

	Models are automatically downloaded during the Docker build process from the official releases.

	## 🏃‍♂️ Quick Start

	### Using Hugging Face Spaces

	1. Visit the [Space](https://huggingface.co/spaces/your-username/ai-animation-voice-studio)
	2. Wait for the container to load (initial startup may take 3-5 minutes due to model loading)
	3. Upload your script or enter text directly
	4. Choose animation settings and voice parameters
	5. Generate your animated video with AI narration!

	### Local Development

	```bash
	# Clone the repository
	git clone https://huggingface.co/spaces/your-username/ai-animation-voice-studio
	cd ai-animation-voice-studio

	# Build the Docker image
	docker build -t ai-animation-studio .

	# Run the container
	docker run -p 7860:7860 ai-animation-studio
	```

	Access the application at `http://localhost:7860`

	### Environment Setup

	Create a `.env` file with your configuration:

	```env
	# Application settings
	DEBUG=false
	MAX_WORKERS=4

	# Model settings
	MODEL_PATH=/app/models
	CACHE_DIR=/tmp/cache

	# Optional: API keys if needed
	# OPENAI_API_KEY=your_key_here
	```

	## 🎯 Usage Examples

	### Basic Text-to-Speech

	```python
	# Example usage in your code
	from src.tts import generate_speech

	audio = generate_speech(
	text="Hello, this is a test of the text-to-speech system",
	voice="default",
	speed=1.0
	)
	```

	### Mathematical Animation

	```python
	# Example Manim scene
	from manim import *

	class Example(Scene):
	def construct(self):
	# Your animation code here
	pass
	```

	## 📁 Project Structure

	```
	├── src/ # Source code
	│ ├── tts/ # Text-to-speech modules
	│ ├── manim_scenes/ # Manim animation scenes
	│ └── utils/ # Utility functions
	├── models/ # Pre-trained models (auto-downloaded)
	├── output/ # Generated content output
	├── requirements.txt # Python dependencies
	├── Dockerfile # Container configuration
	├── gradio_app.py # Main application entry point
	└── README.md # This file
	```

	## ⚙️ Configuration

	### Docker Environment Variables

	- `GRADIO_SERVER_NAME`: Server host (default: 0.0.0.0)
	- `GRADIO_SERVER_PORT`: Server port (default: 7860)
	- `PYTHONPATH`: Python path configuration
	- `HF_HOME`: Hugging Face cache directory

	### Application Settings

	Modify settings in your `.env` file or through environment variables:

	- Model parameters
	- Audio quality settings
	- Animation render settings
	- Cache configurations

	## 🔧 Development

	### Prerequisites

	- Docker and Docker Compose
	- Python 3.12+
	- Git

	### Setting Up Development Environment

	```bash
	# Install dependencies locally for development
	pip install -r requirements.txt

	# Run tests (if available)
	python -m pytest tests/

	# Format code
	black .
	isort .

	# Lint code
	flake8 .
	```

	### Building and Testing

	```bash
	# Build the Docker image
	docker build -t your-app-name:dev .

	# Test the container locally
	docker run --rm -p 7860:7860 your-app-name:dev

	# Check container health
	docker run --rm your-app-name:dev python -c "import src; print('Import successful')"
	```

	## 📊 Performance & Hardware

	### Recommended Specs for Hugging Face Spaces

	- Hardware: `cpu-upgrade` (recommended for faster rendering)
	- Storage: `small` (sufficient for models and temporary files)
	- Startup Time: ~3-5 minutes (due to model loading and TinyTeX setup)
	- Memory Usage: ~2-3GB during operation

	### System Requirements

	- Memory: Minimum 2GB RAM, Recommended 4GB+
	- CPU: Multi-core processor recommended for faster animation rendering
	- Storage: ~1.5GB for models and dependencies
	- Network: Stable connection for initial model downloads

	### Optimization Tips

	- Models are cached after first download
	- Gradio interface uses efficient streaming for large outputs
	- Docker multi-stage builds minimize final image size
	- TinyTeX installation is optimized for essential packages only

	## 🐛 Troubleshooting

	### Common Issues

	Build Failures:
	```bash
	# Clear Docker cache if build fails
	docker system prune -a
	docker build --no-cache -t your-app-name .
	```

	Model Download Issues:
	- Check internet connection
	- Verify model URLs are accessible
	- Models will be re-downloaded if corrupted

	Memory Issues:
	- Reduce batch sizes in configuration
	- Monitor memory usage with `docker stats`

	Audio Issues:
	- Ensure audio drivers are properly installed
	- Check PortAudio configuration

	### Getting Help

	1. Check the [Discussions](https://huggingface.co/spaces/your-username/ai-animation-voice-studio/discussions) tab
	2. Review container logs in the Space settings
	3. Enable debug mode in configuration
	4. Report issues in the Community tab

	### Common Configuration Issues

	Space Configuration:
	- Ensure `app_port: 7860` is set in README.md front matter
	- Check that `sdk: docker` is properly configured
	- Verify hardware suggestions match your needs

	Model Loading:
	- Models download automatically on first run
	- Check Space logs for download progress
	- Restart Space if models fail to load

	## 🤝 Contributing

	We welcome contributions! Please see our contributing guidelines:

	1. Fork the repository
	2. Create a feature branch
	3. Make your changes
	4. Add tests if applicable
	5. Submit a pull request

	### Code Style

	- Follow PEP 8 for Python code
	- Use Black for code formatting
	- Add docstrings for functions and classes
	- Include type hints where appropriate

	## 📄 License

	This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

	## 🙏 Acknowledgments

	- [Manim Community](https://www.manim.community/) for the animation engine
	- [Kokoro TTS](https://github.com/thewh1teagle/kokoro-onnx) for text-to-speech models
	- [Gradio](https://gradio.app/) for the web interface framework
	- [Hugging Face](https://huggingface.co/) for hosting and infrastructure

	## 📞 Contact

	- Author: Your Name
	- Email: [email protected]
	- GitHub: [@your-username](https://github.com/your-username)
	- Hugging Face: [@your-username](https://huggingface.co/your-username)

	---

	Built with ❤️ for the open-source community