t2m / README.md
thanhkt's picture
Update README.md
32b5842 verified
---
title: AI Animation & Voice Studio
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-upgrade
suggested_storage: large
pinned: true
license: apache-2.0
short_description: "Create mathematical animations with AI-powered using Manim"
tags:
- text-to-speech
- animation
- mathematics
- manim
- ai-voice
- educational
- visualization
models:
- kokoro-onnx/kokoro-v0_19
datasets: []
startup_duration_timeout: 30m
fullWidth: true
header: default
disable_embedding: false
preload_from_hub: []
---
# AI Animation & Voice Studio 🎬
A powerful application that combines AI-powered text-to-speech with mathematical animation generation using Manim and Kokoro TTS. Create stunning educational content with synchronized voice narration and mathematical visualizations.
## πŸš€ Features
- **Text-to-Speech**: High-quality voice synthesis using Kokoro ONNX models
- **Mathematical Animations**: Create stunning mathematical visualizations with Manim
- **LaTeX Support**: Full LaTeX rendering capabilities with TinyTeX
- **Interactive Interface**: User-friendly Gradio web interface
- **Audio Processing**: Advanced audio manipulation with FFmpeg and SoX
## πŸ› οΈ Technology Stack
- **Frontend**: Gradio for interactive web interface
- **Backend**: Python with FastAPI/Flask
- **Animation**: Manim (Mathematical Animation Engine)
- **TTS**: Kokoro ONNX for text-to-speech synthesis
- **LaTeX**: TinyTeX for mathematical typesetting
- **Audio**: FFmpeg, SoX, PortAudio for audio processing
- **Deployment**: Docker container optimized for Hugging Face Spaces
## πŸ“¦ Models
This application uses the following pre-trained models:
- **Kokoro TTS**: `kokoro-v0_19.onnx` - High-quality neural text-to-speech model
- **Voice Models**: `voices.bin` - Voice embedding models for different speaker characteristics
Models are automatically downloaded during the Docker build process from the official releases.
## πŸƒβ€β™‚οΈ Quick Start
### Using Hugging Face Spaces
1. Visit the [Space](https://huggingface.co/spaces/your-username/ai-animation-voice-studio)
2. Wait for the container to load (initial startup may take 3-5 minutes due to model loading)
3. Upload your script or enter text directly
4. Choose animation settings and voice parameters
5. Generate your animated video with AI narration!
### Local Development
```bash
# Clone the repository
git clone https://huggingface.co/spaces/your-username/ai-animation-voice-studio
cd ai-animation-voice-studio
# Build the Docker image
docker build -t ai-animation-studio .
# Run the container
docker run -p 7860:7860 ai-animation-studio
```
Access the application at `http://localhost:7860`
### Environment Setup
Create a `.env` file with your configuration:
```env
# Application settings
DEBUG=false
MAX_WORKERS=4
# Model settings
MODEL_PATH=/app/models
CACHE_DIR=/tmp/cache
# Optional: API keys if needed
# OPENAI_API_KEY=your_key_here
```
## 🎯 Usage Examples
### Basic Text-to-Speech
```python
# Example usage in your code
from src.tts import generate_speech
audio = generate_speech(
text="Hello, this is a test of the text-to-speech system",
voice="default",
speed=1.0
)
```
### Mathematical Animation
```python
# Example Manim scene
from manim import *
class Example(Scene):
def construct(self):
# Your animation code here
pass
```
## πŸ“ Project Structure
```
β”œβ”€β”€ src/ # Source code
β”‚ β”œβ”€β”€ tts/ # Text-to-speech modules
β”‚ β”œβ”€β”€ manim_scenes/ # Manim animation scenes
β”‚ └── utils/ # Utility functions
β”œβ”€β”€ models/ # Pre-trained models (auto-downloaded)
β”œβ”€β”€ output/ # Generated content output
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ Dockerfile # Container configuration
β”œβ”€β”€ gradio_app.py # Main application entry point
└── README.md # This file
```
## βš™οΈ Configuration
### Docker Environment Variables
- `GRADIO_SERVER_NAME`: Server host (default: 0.0.0.0)
- `GRADIO_SERVER_PORT`: Server port (default: 7860)
- `PYTHONPATH`: Python path configuration
- `HF_HOME`: Hugging Face cache directory
### Application Settings
Modify settings in your `.env` file or through environment variables:
- Model parameters
- Audio quality settings
- Animation render settings
- Cache configurations
## πŸ”§ Development
### Prerequisites
- Docker and Docker Compose
- Python 3.12+
- Git
### Setting Up Development Environment
```bash
# Install dependencies locally for development
pip install -r requirements.txt
# Run tests (if available)
python -m pytest tests/
# Format code
black .
isort .
# Lint code
flake8 .
```
### Building and Testing
```bash
# Build the Docker image
docker build -t your-app-name:dev .
# Test the container locally
docker run --rm -p 7860:7860 your-app-name:dev
# Check container health
docker run --rm your-app-name:dev python -c "import src; print('Import successful')"
```
## πŸ“Š Performance & Hardware
### Recommended Specs for Hugging Face Spaces
- **Hardware**: `cpu-upgrade` (recommended for faster rendering)
- **Storage**: `small` (sufficient for models and temporary files)
- **Startup Time**: ~3-5 minutes (due to model loading and TinyTeX setup)
- **Memory Usage**: ~2-3GB during operation
### System Requirements
- **Memory**: Minimum 2GB RAM, Recommended 4GB+
- **CPU**: Multi-core processor recommended for faster animation rendering
- **Storage**: ~1.5GB for models and dependencies
- **Network**: Stable connection for initial model downloads
### Optimization Tips
- Models are cached after first download
- Gradio interface uses efficient streaming for large outputs
- Docker multi-stage builds minimize final image size
- TinyTeX installation is optimized for essential packages only
## πŸ› Troubleshooting
### Common Issues
**Build Failures**:
```bash
# Clear Docker cache if build fails
docker system prune -a
docker build --no-cache -t your-app-name .
```
**Model Download Issues**:
- Check internet connection
- Verify model URLs are accessible
- Models will be re-downloaded if corrupted
**Memory Issues**:
- Reduce batch sizes in configuration
- Monitor memory usage with `docker stats`
**Audio Issues**:
- Ensure audio drivers are properly installed
- Check PortAudio configuration
### Getting Help
1. Check the [Discussions](https://huggingface.co/spaces/your-username/ai-animation-voice-studio/discussions) tab
2. Review container logs in the Space settings
3. Enable debug mode in configuration
4. Report issues in the Community tab
### Common Configuration Issues
**Space Configuration**:
- Ensure `app_port: 7860` is set in README.md front matter
- Check that `sdk: docker` is properly configured
- Verify hardware suggestions match your needs
**Model Loading**:
- Models download automatically on first run
- Check Space logs for download progress
- Restart Space if models fail to load
## 🀝 Contributing
We welcome contributions! Please see our contributing guidelines:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
### Code Style
- Follow PEP 8 for Python code
- Use Black for code formatting
- Add docstrings for functions and classes
- Include type hints where appropriate
## πŸ“„ License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## πŸ™ Acknowledgments
- [Manim Community](https://www.manim.community/) for the animation engine
- [Kokoro TTS](https://github.com/thewh1teagle/kokoro-onnx) for text-to-speech models
- [Gradio](https://gradio.app/) for the web interface framework
- [Hugging Face](https://huggingface.co/) for hosting and infrastructure
## πŸ“ž Contact
- **Author**: Your Name
- **Email**: [email protected]
- **GitHub**: [@your-username](https://github.com/your-username)
- **Hugging Face**: [@your-username](https://huggingface.co/your-username)
---
*Built with ❀️ for the open-source community*