thanhkt committed on
Commit 3aba555 · verified · 1 Parent(s): 6ec19f4

Update README.md

Files changed (1)
README.md +245 -287
README.md CHANGED
@@ -1,348 +1,306 @@
- # TheoremExplainAgent (TEA) 🍵
- [![arXiv](https://img.shields.io/badge/arXiv-2502.19400-b31b1b.svg)](https://arxiv.org/abs/2502.19400)
- <a href='https://huggingface.co/papers/2502.19400'><img src='https://img.shields.io/static/v1?label=Paper&message=Huggingface&color=orange'></a>
-
- [**🌐 Homepage**](https://tiger-ai-lab.github.io/TheoremExplainAgent/) | [**📖 arXiv**](https://arxiv.org/abs/2502.19400) | [**🤗 HuggingFace Dataset**](https://huggingface.co/datasets/TIGER-Lab/TheoremExplainBench)
-
- [![contributors](https://img.shields.io/github/contributors/TIGER-AI-Lab/TheoremExplainAgent)](https://github.com/TIGER-AI-Lab/TheoremExplainAgent/graphs/contributors)
- [![license](https://img.shields.io/github/license/TIGER-AI-Lab/TheoremExplainAgent.svg)](https://github.com/TIGER-AI-Lab/TheoremExplainAgent/blob/main/LICENSE)
- [![GitHub](https://img.shields.io/github/stars/TIGER-AI-Lab/TheoremExplainAgent?style=social)](https://github.com/TIGER-AI-Lab/TheoremExplainAgent)
- [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FTIGER-AI-Lab%2FTheoremExplainAgent&count_bg=%23C83DB9&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=visitors&edge_flat=false)](https://hits.seeyoufarm.com)
-
- This repo contains the codebase for our paper [TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding](https://arxiv.org/abs/2502.19400).
-
- ## Introduction
- TheoremExplainAgent is an AI system that generates long-form Manim videos to visually explain theorems, demonstrating deep understanding while uncovering reasoning flaws that text alone often hides.
-
- https://github.com/user-attachments/assets/17f2f4f2-8f2c-4abc-b377-ac92ebda69f3
-
- ## 📰 News
- * 2025 Mar 3: Generation and evaluation code released. Thanks for the wait!
- <!--* 2025 Mar 3: Reach 404 stars without code.-->
- * 2025 Feb 27: Paper available on [arXiv](https://arxiv.org/abs/2502.19400). Thanks AK for putting our paper on [HF Daily](https://huggingface.co/papers/2502.19400).
-
- ## Installation
-
- > **Check the [FAQ section in this README](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#-faq) if you encounter any errors. If that doesn't help, create an issue.**<br>
-
- 1. Set up the conda environment:
- ```shell
- conda create --name tea python=3.12.8
- conda activate tea
- pip install -r requirements.txt
  ```
-
- 2. You may also need to install LaTeX and other dependencies for Manim Community. See the [Manim installation docs](https://docs.manim.community/en/stable/installation.html) for more details.
- ```shell
- # You might need these dependencies if you are using Linux Ubuntu:
- sudo apt-get install portaudio19-dev
- sudo apt-get install libsdl-pango-dev
  ```
-
- 3. Download the Kokoro model and voices with the commands below to enable the TTS service.
-
- ```shell
- mkdir -p models && wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx && wget -P models https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.bin
- ```
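-
- To sanity-check the downloaded files, you can synthesize a short clip directly with the `kokoro-onnx` package (a minimal sketch based on the upstream kokoro-onnx examples, not part of this repo; it also assumes `soundfile` is installed):
-
- ```python
- import soundfile as sf
- from kokoro_onnx import Kokoro
-
- # Load the model and voice embeddings downloaded above
- kokoro = Kokoro("models/kokoro-v0_19.onnx", "models/voices.bin")
-
- # Synthesize a short clip with the "af" voice
- samples, sample_rate = kokoro.create(
-     "TheoremExplainAgent TTS check.", voice="af", speed=1.0, lang="en-us"
- )
- sf.write("tts_check.wav", samples, sample_rate)
- ```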
-
- 4. Create `.env` based on `.env.template`, filling in the environment variables for the models you choose to use.
- See [LiteLLM](https://docs.litellm.ai/docs/providers) for reference.
-
- ```shell
- touch .env
- ```
- Then open the `.env` file and edit it with whatever text editor you like.
-
- Your `.env` file should look like the following:
- ```shell
- # OpenAI
- OPENAI_API_KEY=""
-
- # Azure OpenAI
- AZURE_API_KEY=""
- AZURE_API_BASE=""
- AZURE_API_VERSION=""
-
- # Google Vertex AI
- VERTEXAI_PROJECT=""
- VERTEXAI_LOCATION=""
- GOOGLE_APPLICATION_CREDENTIALS=""
-
- # Google Gemini
- GEMINI_API_KEY=""
-
- ...
-
- # Kokoro TTS Settings
- KOKORO_MODEL_PATH="models/kokoro-v0_19.onnx"
- KOKORO_VOICES_PATH="models/voices.bin"
- KOKORO_DEFAULT_VOICE="af"
- KOKORO_DEFAULT_SPEED="1.0"
- KOKORO_DEFAULT_LANG="en-us"
- ```
- Fill in the API keys for the models you want to use.
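-
- To quickly verify that a configured key works, a minimal LiteLLM smoke test (a sketch; it assumes the `litellm` package is installed, and you should swap in whichever model you configured):
-
- ```python
- import litellm
-
- # Any LiteLLM-style model name works here, e.g. "openai/o3-mini" or "gemini/gemini-2.0-flash-001"
- response = litellm.completion(
-     model="openai/o3-mini",
-     messages=[{"role": "user", "content": "Reply with the word: ready"}],
- )
- print(response.choices[0].message.content)
- ```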
-
- 5. Configure the Python path; without it you may encounter import issues (such as being unable to import `src`):
- ```shell
- export PYTHONPATH=$(pwd):$PYTHONPATH
- ```
-
- 6. (Optional) To set up RAG, see [Generation with RAG](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#generation-with-rag).
-
- > **Check the [FAQ section in this README](https://github.com/TIGER-AI-Lab/TheoremExplainAgent?tab=readme-ov-file#-faq) if you encounter any errors. If that doesn't help, create an issue.**<br>
-
- ## Generation
-
- ### Supported Models
- <!--You can customize the allowed models by editing the `src/utils/allowed_models.json` file. This file specifies which `model` and `helper_model` the system is permitted to use.-->
- Model naming follows the LiteLLM convention. For details on how models should be named, please refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).
-
- ### Generation (single topic)
- ```shell
- python generate_video.py \
-     --model "openai/o3-mini" \
-     --helper_model "openai/o3-mini" \
-     --output_dir "output/your_exp_name" \
-     --topic "your_topic" \
-     --context "description of your topic, e.g. 'This is a topic about the properties of a triangle'"
- ```
-
- Example:
- ```shell
- python generate_video.py \
-     --model "openai/o3-mini" \
-     --helper_model "openai/o3-mini" \
-     --output_dir "output/my_exp_name" \
-     --topic "Big O notation" \
-     --context "most common type of asymptotic notation in computer science used to measure worst case complexity"
- ```
-
- ### Generation (in batch)
- ```shell
- python generate_video.py \
-     --model "openai/o3-mini" \
-     --helper_model "openai/o3-mini" \
-     --output_dir "output/my_exp_name" \
-     --theorems_path data/thb_easy/math.json \
-     --max_scene_concurrency 7 \
-     --max_topic_concurrency 20
- ```
-
- ### Generation with RAG
- Before using RAG, download the RAG documentation from this [Google Drive link](https://drive.google.com/file/d/1Tn6J_JKVefFZRgZbjns93KLBtI9ullRv/view?usp=sharing). After downloading, unzip the file. For example, if you unzip it to `data/rag/manim_docs`, then set `--manim_docs_path` to `data/rag/manim_docs`. The vector database is created the first time you run with RAG.
-
- ```shell
- python generate_video.py \
-     --model "openai/o3-mini" \
-     --helper_model "openai/o3-mini" \
-     --output_dir "output/with_rag/o3-mini/vtutorbench_easy/math" \
-     --topic "Big O notation" \
-     --context "most common type of asymptotic notation in computer science used to measure worst case complexity" \
-     --use_rag \
-     --chroma_db_path "data/rag/chroma_db" \
-     --manim_docs_path "data/rag/manim_docs" \
-     --embedding_model "vertex_ai/text-embedding-005"
- ```
-
- We support more options for generation; see below for more details:
- ```shell
- usage: generate_video.py [-h]
-                          [--model]
-                          [--topic TOPIC] [--context CONTEXT]
-                          [--helper_model]
-                          [--only_gen_vid] [--only_combine] [--peek_existing_videos] [--output_dir OUTPUT_DIR] [--theorems_path THEOREMS_PATH]
-                          [--sample_size SAMPLE_SIZE] [--verbose] [--max_retries MAX_RETRIES] [--use_rag] [--use_visual_fix_code]
-                          [--chroma_db_path CHROMA_DB_PATH] [--manim_docs_path MANIM_DOCS_PATH]
-                          [--embedding_model {azure/text-embedding-3-large,vertex_ai/text-embedding-005}] [--use_context_learning]
-                          [--context_learning_path CONTEXT_LEARNING_PATH] [--use_langfuse] [--max_scene_concurrency MAX_SCENE_CONCURRENCY]
-                          [--max_topic_concurrency MAX_TOPIC_CONCURRENCY] [--debug_combine_topic DEBUG_COMBINE_TOPIC] [--only_plan] [--check_status]
-                          [--only_render] [--scenes SCENES [SCENES ...]]
-
- Generate Manim videos using AI
-
- options:
-   -h, --help            show this help message and exit
-   --model               Select the AI model to use
-   --topic TOPIC         Topic to generate videos for
-   --context CONTEXT     Context of the topic
-   --helper_model        Select the helper model to use
-   --only_gen_vid        Only generate videos for existing plans
-   --only_combine        Only combine videos
-   --peek_existing_videos, --peek
-                         Peek at existing videos
-   --output_dir OUTPUT_DIR
-                         Output directory
-   --theorems_path THEOREMS_PATH
-                         Path to theorems JSON file
-   --sample_size SAMPLE_SIZE, --sample SAMPLE_SIZE
-                         Number of theorems to sample
-   --verbose             Print verbose output
-   --max_retries MAX_RETRIES
-                         Maximum number of retries for code generation
-   --use_rag, --rag      Use Retrieval Augmented Generation
-   --use_visual_fix_code, --visual_fix_code
-                         Use VLM to fix code with rendered visuals
-   --chroma_db_path CHROMA_DB_PATH
-                         Path to Chroma DB
-   --manim_docs_path MANIM_DOCS_PATH
-                         Path to Manim docs
-   --embedding_model {azure/text-embedding-3-large,vertex_ai/text-embedding-005}
-                         Select the embedding model to use
-   --use_context_learning
-                         Use context learning with example Manim code
-   --context_learning_path CONTEXT_LEARNING_PATH
-                         Path to context learning examples
-   --use_langfuse        Enable Langfuse logging
-   --max_scene_concurrency MAX_SCENE_CONCURRENCY
-                         Maximum number of scenes to process concurrently
-   --max_topic_concurrency MAX_TOPIC_CONCURRENCY
-                         Maximum number of topics to process concurrently
-   --debug_combine_topic DEBUG_COMBINE_TOPIC
-                         Debug combine videos
-   --only_plan           Only generate scene outline and implementation plans
-   --check_status        Check planning and code status for all theorems
-   --only_render         Only render scenes without combining videos
-   --scenes SCENES [SCENES ...]
-                         Specific scenes to process (if theorems_path is provided)
  ```
-
- ## Evaluation
- Note that Gemini and GPT-4o are required for evaluation.
-
- Currently, evaluation requires a video file and a subtitle file (SRT format).
-
- Video evaluation:
- ```shell
- usage: evaluate.py [-h]
-                    [--model_text {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}]
-                    [--model_video {gemini/gemini-1.5-pro-002,gemini/gemini-2.0-flash-exp,gemini/gemini-2.0-pro-exp-02-05}]
-                    [--model_image {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}]
-                    [--eval_type {text,video,image,all}] --file_path FILE_PATH --output_folder OUTPUT_FOLDER [--retry_limit RETRY_LIMIT] [--combine] [--bulk_evaluate] [--target_fps TARGET_FPS]
-                    [--use_parent_folder_as_topic] [--max_workers MAX_WORKERS]
-
- Automatic evaluation of theorem explanation videos with LLMs
-
- options:
-   -h, --help            show this help message and exit
-   --model_text {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}
-                         Select the AI model to use for text evaluation
-   --model_video {gemini/gemini-1.5-pro-002,gemini/gemini-2.0-flash-exp,gemini/gemini-2.0-pro-exp-02-05}
-                         Select the AI model to use for video evaluation
-   --model_image {gemini/gemini-1.5-pro-002,gemini/gemini-1.5-flash-002,gemini/gemini-2.0-flash-001,vertex_ai/gemini-1.5-flash-002,vertex_ai/gemini-1.5-pro-002,vertex_ai/gemini-2.0-flash-001,openai/o3-mini,gpt-4o,azure/gpt-4o,azure/gpt-4o-mini,bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0,bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0,bedrock/anthropic.claude-3-5-haiku-20241022-v1:0,bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0}
-                         Select the AI model to use for image evaluation
-   --eval_type {text,video,image,all}
-                         Type of evaluation to perform
-   --file_path FILE_PATH
-                         Path to a file or a theorem folder
-   --output_folder OUTPUT_FOLDER
-                         Directory to store the evaluation files
-   --retry_limit RETRY_LIMIT
-                         Number of retry attempts for each inference
-   --combine             Combine all results into a single JSON file
-   --bulk_evaluate       Evaluate a folder of theorems together
-   --target_fps TARGET_FPS
-                         Target FPS for video processing. If not set, the original video FPS will be used
-   --use_parent_folder_as_topic
-                         Use parent folder name as topic name for single-file evaluation
-   --max_workers MAX_WORKERS
-                         Maximum number of concurrent workers for parallel processing
  ```
- * For `file_path`, it is recommended to pass a folder containing both an MP4 file and an SRT file.
-
- ## Misc: Modify the system prompt in TheoremExplainAgent
-
- If you want to modify the system prompt, you need to:
-
- 1. Modify the files in the `task_generator/prompts_raw` folder.
- 2. Run `task_generator/parse_prompt.py` to rebuild the `__init__.py` file.
-
- ```shell
- cd task_generator
- python parse_prompt.py
- cd ..
- ```
-
- ## TheoremExplainBench (TEB)
-
- TheoremExplainBench can be found at https://huggingface.co/datasets/TIGER-Lab/TheoremExplainBench.
-
- How to use:
- ```python
- import datasets
- dataset = datasets.load_dataset("TIGER-Lab/TheoremExplainBench")
- ```
-
- Dataset info:
- ```shell
- DatasetDict({
-     train: Dataset({
-         features: ['uid', 'subject', 'difficulty', 'theorem', 'description', 'subfield'],
-         num_rows: 240
-     })
- })
  ```
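-
- A short sketch of loading and inspecting the benchmark, using the fields listed above:
-
- ```python
- from datasets import load_dataset
-
- ds = load_dataset("TIGER-Lab/TheoremExplainBench", split="train")
- for row in ds.select(range(3)):
-     # Each row has: uid, subject, difficulty, theorem, description, subfield
-     print(row["difficulty"], "|", row["subject"], "|", row["theorem"])
- ```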
-
- ## ❓ FAQ
-
- The FAQ should cover the most common errors you could encounter. If you see something new, report it in the issues.
-
- Q: Error `src.utils.kokoro_voiceover import KokoroService # You MUST import like this as this is our custom voiceover service. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ModuleNotFoundError: No module named 'src'`. <br>
- A: Please run `export PYTHONPATH=$(pwd):$PYTHONPATH` when you start a new terminal. <br>
-
- Q: Error `Files not found` <br>
- A: Check your Manim installation. <br>
-
- Q: Error `latex ...` <br>
- A: Check your LaTeX installation. <br>
-
- Q: The output log is not showing the response? <br>
- A: It could be an API-related issue. Make sure your `.env` file is properly configured (fill in your API keys), or enable LiteLLM debug mode to figure out the issue. <br>
-
- Q: Plans / scenes are missing? <br>
- A: It could be an API-related issue. Make sure your `.env` file is properly configured (fill in your API keys), or enable LiteLLM debug mode to figure out the issue. <br>
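-
- For the LiteLLM debug mode mentioned above, a minimal sketch (the exact switch can vary across litellm versions):
-
- ```python
- import litellm
-
- litellm.set_verbose = True  # print full request/response logs for each LLM call
- ```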
 
 
306
 
 
 
 
 
307
 
308
- ## πŸ–ŠοΈ Citation
309
 
310
- Please kindly cite our paper if you use our code, data, models or results:
311
- ```bibtex
312
- @misc{ku2025theoremexplainagentmultimodalexplanationsllm,
313
- title={TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding},
314
- author={Max Ku and Thomas Chong and Jonathan Leung and Krish Shah and Alvin Yu and Wenhu Chen},
315
- year={2025},
316
- eprint={2502.19400},
317
- archivePrefix={arXiv},
318
- primaryClass={cs.AI},
319
- url={https://arxiv.org/abs/2502.19400},
320
- }
321
- ```
322
 
323
- ## 🎫 License
324
 
325
- This project is released under the [the MIT License](LICENSE).
 
 
 
326
 
327
- ## ⭐ Star History
328
 
329
- [![Star History Chart](https://api.star-history.com/svg?repos=TIGER-AI-Lab/TheoremExplainAgent&type=Date)](https://star-history.com/#TIGER-AI-Lab/TheoremExplainAgent&Date)
330
 
331
- ## πŸ’ž Acknowledgements
332
 
333
- We want to thank [Votee AI](https://votee.ai/) for sponsoring API keys to access the close-sourced models.
 
 
 
334
 
335
- The code is built upon the below repositories, we thank all the contributors for open-sourcing.
336
- * [Manim Community](https://www.manim.community/)
337
- * [kokoro-manim-voiceover](https://github.com/xposed73/kokoro-manim-voiceover)
338
- * [manim-physics](https://github.com/Matheart/manim-physics)
339
- * [manim-Chemistry](https://github.com/UnMolDeQuimica/manim-Chemistry)
340
- * [ManimML](https://github.com/helblazer811/ManimML)
341
- * [manim-dsa](https://github.com/F4bbi/manim-dsa)
342
- * [manim-circuit](https://github.com/Mr-FuzzyPenguin/manim-circuit)
343
 
344
- ## 🚨 Disclaimer
 
 
 
345
 
346
- **This work is intended for research purposes only. The authors do not encourage or endorse the use of this codebase for commercial applications. The code is provided "as is" without any warranties, and users assume all responsibility for its use.**
347
 
348
- Tested Environment: MacOS, Linux
 
+ ---
+ title: AI Animation & Voice Studio
+ emoji: 🎬
+ colorFrom: blue
+ colorTo: purple
+ sdk: docker
+ app_port: 7860
+ suggested_hardware: cpu-upgrade
+ suggested_storage: small
+ pinned: true
+ license: apache-2.0
+ short_description: "Create AI-powered mathematical animations using Manim"
+ tags:
+   - text-to-speech
+   - animation
+   - mathematics
+   - manim
+   - ai-voice
+   - educational
+   - visualization
+ models:
+   - kokoro-onnx/kokoro-v0_19
+ datasets: []
+ startup_duration_timeout: 30m
+ fullWidth: true
+ header: default
+ disable_embedding: false
+ preload_from_hub: []
+ ---
+
+ # AI Animation & Voice Studio 🎬
+
+ A powerful application that combines AI-powered text-to-speech with mathematical animation generation using Manim and Kokoro TTS. Create educational content with synchronized voice narration and mathematical visualizations.
+
+ ## 🚀 Features
+
+ - **Text-to-Speech**: High-quality voice synthesis using Kokoro ONNX models
+ - **Mathematical Animations**: Create stunning mathematical visualizations with Manim
+ - **LaTeX Support**: Full LaTeX rendering capabilities with TinyTeX
+ - **Interactive Interface**: User-friendly Gradio web interface
+ - **Audio Processing**: Advanced audio manipulation with FFmpeg and SoX
+
+ ## 🛠️ Technology Stack
+
+ - **Frontend**: Gradio for the interactive web interface
+ - **Backend**: Python with FastAPI/Flask
+ - **Animation**: Manim (Mathematical Animation Engine)
+ - **TTS**: Kokoro ONNX for text-to-speech synthesis
+ - **LaTeX**: TinyTeX for mathematical typesetting
+ - **Audio**: FFmpeg, SoX, and PortAudio for audio processing
+ - **Deployment**: Docker container optimized for Hugging Face Spaces
+
+ ## 📦 Models
+
+ This application uses the following pre-trained models:
+
+ - **Kokoro TTS**: `kokoro-v0_19.onnx` - High-quality neural text-to-speech model
+ - **Voice Models**: `voices.bin` - Voice embeddings for different speaker characteristics
+
+ Models are automatically downloaded during the Docker build process from the official releases.
61
+
62
+ ## πŸƒβ€β™‚οΈ Quick Start
63
+
64
+ ### Using Hugging Face Spaces
65
+
66
+ 1. Visit the [Space](https://huggingface.co/spaces/your-username/ai-animation-voice-studio)
67
+ 2. Wait for the container to load (initial startup may take 3-5 minutes due to model loading)
68
+ 3. Upload your script or enter text directly
69
+ 4. Choose animation settings and voice parameters
70
+ 5. Generate your animated video with AI narration!
71
+
72
+ ### Local Development
73
+
74
+ ```bash
75
+ # Clone the repository
76
+ git clone https://huggingface.co/spaces/your-username/ai-animation-voice-studio
77
+ cd ai-animation-voice-studio
78
+
79
+ # Build the Docker image
80
+ docker build -t ai-animation-studio .
81
+
82
+ # Run the container
83
+ docker run -p 7860:7860 ai-animation-studio
84
+ ```
+
+ Access the application at `http://localhost:7860`.
+
+ ### Environment Setup
+
+ Create a `.env` file with your configuration:
+
+ ```env
+ # Application settings
+ DEBUG=false
+ MAX_WORKERS=4
+
+ # Model settings
+ MODEL_PATH=/app/models
+ CACHE_DIR=/tmp/cache
+
+ # Optional: API keys if needed
+ # OPENAI_API_KEY=your_key_here
+ ```
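+
+ A sketch of how these values might be read at startup (assuming the `python-dotenv` package; variable names follow the sample above):
+
+ ```python
+ import os
+ from dotenv import load_dotenv
+
+ load_dotenv()  # reads .env from the working directory
+
+ DEBUG = os.getenv("DEBUG", "false").lower() == "true"
+ MAX_WORKERS = int(os.getenv("MAX_WORKERS", "4"))
+ MODEL_PATH = os.getenv("MODEL_PATH", "/app/models")
+ ```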
+
+ ## 🎯 Usage Examples
+
+ ### Basic Text-to-Speech
+
+ ```python
+ # Example usage in your code
+ from src.tts import generate_speech
+
+ audio = generate_speech(
+     text="Hello, this is a test of the text-to-speech system",
+     voice="default",
+     speed=1.0
+ )
+ ```
+
+ ### Mathematical Animation
+
+ ```python
+ # Example Manim scene: write an equation, then draw a circle beneath it
+ from manim import *
+
+ class Example(Scene):
+     def construct(self):
+         equation = MathTex(r"e^{i\pi} + 1 = 0")
+         self.play(Write(equation))
+         self.play(Create(Circle(color=BLUE).next_to(equation, DOWN)))
+         self.wait()
  ```
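+
+ To preview a scene straight from Python instead of the `manim` CLI, a short sketch using Manim's `tempconfig` (quality preset names may vary by Manim version):
+
+ ```python
+ from manim import tempconfig
+
+ # Render the Example scene above at a fast preview quality
+ with tempconfig({"quality": "low_quality"}):
+     Example().render()
+ ```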
+
+ ## 📁 Project Structure
+
+ ```
+ ├── src/                  # Source code
+ │   ├── tts/              # Text-to-speech modules
+ │   ├── manim_scenes/     # Manim animation scenes
+ │   └── utils/            # Utility functions
+ ├── models/               # Pre-trained models (auto-downloaded)
+ ├── output/               # Generated content output
+ ├── requirements.txt      # Python dependencies
+ ├── Dockerfile            # Container configuration
+ ├── gradio_app.py         # Main application entry point
+ └── README.md             # This file
  ```
+
+ ## ⚙️ Configuration
+
+ ### Docker Environment Variables
+
+ - `GRADIO_SERVER_NAME`: Server host (default: 0.0.0.0)
+ - `GRADIO_SERVER_PORT`: Server port (default: 7860)
+ - `PYTHONPATH`: Python path configuration
+ - `HF_HOME`: Hugging Face cache directory
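+
+ A hypothetical sketch of how the entry point could consume these variables (the actual `gradio_app.py` wiring may differ):
+
+ ```python
+ import os
+
+ import gradio as gr
+
+ def echo(text: str) -> str:
+     return text  # placeholder handler
+
+ demo = gr.Interface(fn=echo, inputs="text", outputs="text")
+ demo.launch(
+     server_name=os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"),
+     server_port=int(os.getenv("GRADIO_SERVER_PORT", "7860")),
+ )
+ ```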
+
+ ### Application Settings
+
+ Modify settings in your `.env` file or through environment variables:
+
+ - Model parameters
+ - Audio quality settings
+ - Animation render settings
+ - Cache configurations
 
+ ## 🔧 Development
+
+ ### Prerequisites
+
+ - Docker and Docker Compose
+ - Python 3.12+
+ - Git
+
+ ### Setting Up Development Environment
+
+ ```bash
+ # Install dependencies locally for development
+ pip install -r requirements.txt
+
+ # Run tests (if available)
+ python -m pytest tests/
+
+ # Format code
+ black .
+ isort .
+
+ # Lint code
+ flake8 .
  ```
+
+ ### Building and Testing
+
+ ```bash
+ # Build the Docker image
+ docker build -t your-app-name:dev .
+
+ # Test the container locally
+ docker run --rm -p 7860:7860 your-app-name:dev
+
+ # Check container health
+ docker run --rm your-app-name:dev python -c "import src; print('Import successful')"
  ```
 
+ ## 📊 Performance & Hardware
+
+ ### Recommended Specs for Hugging Face Spaces
+
+ - **Hardware**: `cpu-upgrade` (recommended for faster rendering)
+ - **Storage**: `small` (sufficient for models and temporary files)
+ - **Startup Time**: ~3-5 minutes (due to model loading and TinyTeX setup)
+ - **Memory Usage**: ~2-3 GB during operation
+
+ ### System Requirements
+
+ - **Memory**: Minimum 2 GB RAM; 4 GB+ recommended
+ - **CPU**: Multi-core processor recommended for faster animation rendering
+ - **Storage**: ~1.5 GB for models and dependencies
+ - **Network**: Stable connection for initial model downloads
+
+ ### Optimization Tips
+
+ - Models are cached after the first download
+ - The Gradio interface uses efficient streaming for large outputs
+ - Docker multi-stage builds minimize the final image size
+ - The TinyTeX installation is trimmed to essential packages only
+
+ ## 🐛 Troubleshooting
+
+ ### Common Issues
+
+ **Build Failures**:
+ ```bash
+ # Clear Docker cache if build fails
+ docker system prune -a
+ docker build --no-cache -t your-app-name .
  ```
+
+ **Model Download Issues**:
+ - Check your internet connection
+ - Verify the model URLs are accessible
+ - Models will be re-downloaded if corrupted
+
+ **Memory Issues**:
+ - Reduce batch sizes in the configuration
+ - Monitor memory usage with `docker stats`
+
+ **Audio Issues**:
+ - Ensure audio drivers are properly installed
+ - Check the PortAudio configuration
+
+ ### Getting Help
+
+ 1. Check the [Discussions](https://huggingface.co/spaces/your-username/ai-animation-voice-studio/discussions) tab
+ 2. Review the container logs in the Space settings
+ 3. Enable debug mode in the configuration
+ 4. Report issues in the Community tab
 
+ ### Common Configuration Issues
+
+ **Space Configuration**:
+ - Ensure `app_port: 7860` is set in the README.md front matter
+ - Check that `sdk: docker` is properly configured
+ - Verify the hardware suggestions match your needs
+
+ **Model Loading**:
+ - Models download automatically on the first run
+ - Check the Space logs for download progress
+ - Restart the Space if models fail to load
+
+ ## 🤝 Contributing
+
+ We welcome contributions! Please follow these guidelines:
+
+ 1. Fork the repository
+ 2. Create a feature branch
+ 3. Make your changes
+ 4. Add tests if applicable
+ 5. Submit a pull request
+
+ ### Code Style
+
+ - Follow PEP 8 for Python code
+ - Use Black for code formatting
+ - Add docstrings for functions and classes
+ - Include type hints where appropriate
+
+ ## 📄 License
+
+ This project is licensed under the Apache License 2.0; see the [LICENSE](LICENSE) file for details.
+
+ ## 🙏 Acknowledgments
+
+ - [Manim Community](https://www.manim.community/) for the animation engine
+ - [Kokoro TTS](https://github.com/thewh1teagle/kokoro-onnx) for text-to-speech models
+ - [Gradio](https://gradio.app/) for the web interface framework
+ - [Hugging Face](https://huggingface.co/) for hosting and infrastructure
+
+ ## 📞 Contact
+
+ - **Author**: Your Name
+ - **Email**: [email protected]
+ - **GitHub**: [@your-username](https://github.com/your-username)
+ - **Hugging Face**: [@your-username](https://huggingface.co/your-username)
+
+ ---
+
+ *Built with ❤️ for the open-source community*