|
--- |
|
title: AI Animation & Voice Studio |
|
emoji: π¬ |
|
colorFrom: blue |
|
colorTo: purple |
|
sdk: docker |
|
app_port: 7860 |
|
suggested_hardware: cpu-upgrade |
|
suggested_storage: large |
|
pinned: true |
|
license: apache-2.0 |
|
short_description: "Create mathematical animations with AI-powered using Manim" |
|
tags: |
|
- text-to-speech |
|
- animation |
|
- mathematics |
|
- manim |
|
- ai-voice |
|
- educational |
|
- visualization |
|
models: |
|
- kokoro-onnx/kokoro-v0_19 |
|
datasets: [] |
|
startup_duration_timeout: 30m |
|
fullWidth: true |
|
header: default |
|
disable_embedding: false |
|
preload_from_hub: [] |
|
--- |
|
|
|
# AI Animation & Voice Studio π¬ |
|
|
|
A powerful application that combines AI-powered text-to-speech with mathematical animation generation using Manim and Kokoro TTS. Create stunning educational content with synchronized voice narration and mathematical visualizations. |
|
|
|
## π Features |
|
|
|
- **Text-to-Speech**: High-quality voice synthesis using Kokoro ONNX models |
|
- **Mathematical Animations**: Create stunning mathematical visualizations with Manim |
|
- **LaTeX Support**: Full LaTeX rendering capabilities with TinyTeX |
|
- **Interactive Interface**: User-friendly Gradio web interface |
|
- **Audio Processing**: Advanced audio manipulation with FFmpeg and SoX |
|
|
|
## π οΈ Technology Stack |
|
|
|
- **Frontend**: Gradio for interactive web interface |
|
- **Backend**: Python with FastAPI/Flask |
|
- **Animation**: Manim (Mathematical Animation Engine) |
|
- **TTS**: Kokoro ONNX for text-to-speech synthesis |
|
- **LaTeX**: TinyTeX for mathematical typesetting |
|
- **Audio**: FFmpeg, SoX, PortAudio for audio processing |
|
- **Deployment**: Docker container optimized for Hugging Face Spaces |
|
|
|
## π¦ Models |
|
|
|
This application uses the following pre-trained models: |
|
|
|
- **Kokoro TTS**: `kokoro-v0_19.onnx` - High-quality neural text-to-speech model |
|
- **Voice Models**: `voices.bin` - Voice embedding models for different speaker characteristics |
|
|
|
Models are automatically downloaded during the Docker build process from the official releases. |
|
|
|
## πββοΈ Quick Start |
|
|
|
### Using Hugging Face Spaces |
|
|
|
1. Visit the [Space](https://huggingface.co/spaces/your-username/ai-animation-voice-studio) |
|
2. Wait for the container to load (initial startup may take 3-5 minutes due to model loading) |
|
3. Upload your script or enter text directly |
|
4. Choose animation settings and voice parameters |
|
5. Generate your animated video with AI narration! |
|
|
|
### Local Development |
|
|
|
```bash |
|
# Clone the repository |
|
git clone https://huggingface.co/spaces/your-username/ai-animation-voice-studio |
|
cd ai-animation-voice-studio |
|
|
|
# Build the Docker image |
|
docker build -t ai-animation-studio . |
|
|
|
# Run the container |
|
docker run -p 7860:7860 ai-animation-studio |
|
``` |
|
|
|
Access the application at `http://localhost:7860` |
|
|
|
### Environment Setup |
|
|
|
Create a `.env` file with your configuration: |
|
|
|
```env |
|
# Application settings |
|
DEBUG=false |
|
MAX_WORKERS=4 |
|
|
|
# Model settings |
|
MODEL_PATH=/app/models |
|
CACHE_DIR=/tmp/cache |
|
|
|
# Optional: API keys if needed |
|
# OPENAI_API_KEY=your_key_here |
|
``` |
|
|
|
## π― Usage Examples |
|
|
|
### Basic Text-to-Speech |
|
|
|
```python |
|
# Example usage in your code |
|
from src.tts import generate_speech |
|
|
|
audio = generate_speech( |
|
text="Hello, this is a test of the text-to-speech system", |
|
voice="default", |
|
speed=1.0 |
|
) |
|
``` |
|
|
|
### Mathematical Animation |
|
|
|
```python |
|
# Example Manim scene |
|
from manim import * |
|
|
|
class Example(Scene): |
|
def construct(self): |
|
# Your animation code here |
|
pass |
|
``` |
|
|
|
## π Project Structure |
|
|
|
``` |
|
βββ src/ # Source code |
|
β βββ tts/ # Text-to-speech modules |
|
β βββ manim_scenes/ # Manim animation scenes |
|
β βββ utils/ # Utility functions |
|
βββ models/ # Pre-trained models (auto-downloaded) |
|
βββ output/ # Generated content output |
|
βββ requirements.txt # Python dependencies |
|
βββ Dockerfile # Container configuration |
|
βββ gradio_app.py # Main application entry point |
|
βββ README.md # This file |
|
``` |
|
|
|
## βοΈ Configuration |
|
|
|
### Docker Environment Variables |
|
|
|
- `GRADIO_SERVER_NAME`: Server host (default: 0.0.0.0) |
|
- `GRADIO_SERVER_PORT`: Server port (default: 7860) |
|
- `PYTHONPATH`: Python path configuration |
|
- `HF_HOME`: Hugging Face cache directory |
|
|
|
### Application Settings |
|
|
|
Modify settings in your `.env` file or through environment variables: |
|
|
|
- Model parameters |
|
- Audio quality settings |
|
- Animation render settings |
|
- Cache configurations |
|
|
|
## π§ Development |
|
|
|
### Prerequisites |
|
|
|
- Docker and Docker Compose |
|
- Python 3.12+ |
|
- Git |
|
|
|
### Setting Up Development Environment |
|
|
|
```bash |
|
# Install dependencies locally for development |
|
pip install -r requirements.txt |
|
|
|
# Run tests (if available) |
|
python -m pytest tests/ |
|
|
|
# Format code |
|
black . |
|
isort . |
|
|
|
# Lint code |
|
flake8 . |
|
``` |
|
|
|
### Building and Testing |
|
|
|
```bash |
|
# Build the Docker image |
|
docker build -t your-app-name:dev . |
|
|
|
# Test the container locally |
|
docker run --rm -p 7860:7860 your-app-name:dev |
|
|
|
# Check container health |
|
docker run --rm your-app-name:dev python -c "import src; print('Import successful')" |
|
``` |
|
|
|
## π Performance & Hardware |
|
|
|
### Recommended Specs for Hugging Face Spaces |
|
|
|
- **Hardware**: `cpu-upgrade` (recommended for faster rendering) |
|
- **Storage**: `small` (sufficient for models and temporary files) |
|
- **Startup Time**: ~3-5 minutes (due to model loading and TinyTeX setup) |
|
- **Memory Usage**: ~2-3GB during operation |
|
|
|
### System Requirements |
|
|
|
- **Memory**: Minimum 2GB RAM, Recommended 4GB+ |
|
- **CPU**: Multi-core processor recommended for faster animation rendering |
|
- **Storage**: ~1.5GB for models and dependencies |
|
- **Network**: Stable connection for initial model downloads |
|
|
|
### Optimization Tips |
|
|
|
- Models are cached after first download |
|
- Gradio interface uses efficient streaming for large outputs |
|
- Docker multi-stage builds minimize final image size |
|
- TinyTeX installation is optimized for essential packages only |
|
|
|
## π Troubleshooting |
|
|
|
### Common Issues |
|
|
|
**Build Failures**: |
|
```bash |
|
# Clear Docker cache if build fails |
|
docker system prune -a |
|
docker build --no-cache -t your-app-name . |
|
``` |
|
|
|
**Model Download Issues**: |
|
- Check internet connection |
|
- Verify model URLs are accessible |
|
- Models will be re-downloaded if corrupted |
|
|
|
**Memory Issues**: |
|
- Reduce batch sizes in configuration |
|
- Monitor memory usage with `docker stats` |
|
|
|
**Audio Issues**: |
|
- Ensure audio drivers are properly installed |
|
- Check PortAudio configuration |
|
|
|
### Getting Help |
|
|
|
1. Check the [Discussions](https://huggingface.co/spaces/your-username/ai-animation-voice-studio/discussions) tab |
|
2. Review container logs in the Space settings |
|
3. Enable debug mode in configuration |
|
4. Report issues in the Community tab |
|
|
|
### Common Configuration Issues |
|
|
|
**Space Configuration**: |
|
- Ensure `app_port: 7860` is set in README.md front matter |
|
- Check that `sdk: docker` is properly configured |
|
- Verify hardware suggestions match your needs |
|
|
|
**Model Loading**: |
|
- Models download automatically on first run |
|
- Check Space logs for download progress |
|
- Restart Space if models fail to load |
|
|
|
## π€ Contributing |
|
|
|
We welcome contributions! Please see our contributing guidelines: |
|
|
|
1. Fork the repository |
|
2. Create a feature branch |
|
3. Make your changes |
|
4. Add tests if applicable |
|
5. Submit a pull request |
|
|
|
### Code Style |
|
|
|
- Follow PEP 8 for Python code |
|
- Use Black for code formatting |
|
- Add docstrings for functions and classes |
|
- Include type hints where appropriate |
|
|
|
## π License |
|
|
|
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details. |
|
|
|
## π Acknowledgments |
|
|
|
- [Manim Community](https://www.manim.community/) for the animation engine |
|
- [Kokoro TTS](https://github.com/thewh1teagle/kokoro-onnx) for text-to-speech models |
|
- [Gradio](https://gradio.app/) for the web interface framework |
|
- [Hugging Face](https://huggingface.co/) for hosting and infrastructure |
|
|
|
## π Contact |
|
|
|
- **Author**: Your Name |
|
- **Email**: [email protected] |
|
- **GitHub**: [@your-username](https://github.com/your-username) |
|
- **Hugging Face**: [@your-username](https://huggingface.co/your-username) |
|
|
|
--- |
|
|
|
*Built with β€οΈ for the open-source community* |