FastVideo
/

FastMochi-diffusers


PY007 committed on Dec 16, 2024

Commit e4d823f · verified · 1 Parent(s): 54272b5

Upload folder using huggingface_hub

Files changed (2)

README.md +96 -0
assets/logo.jpg +0 -0

README.md ADDED

	@@ -0,0 +1,96 @@

+---
+language:
+ - "en"
+tags:
+ - video
+license: apache-2.0
+pipeline_tag: text-to-video
+library_name: diffusers
+---
+<p align="center">
+  <img src="assets/logo.jpg"  height=30>
+</p>
+# FastMochi Model Card
+## Model Details
+FastMochi is an accelerated [Mochi](https://huggingface.co/genmo/mochi-1-preview) model. It can sample high-quality videos in 8 diffusion steps, which is roughly an 8x speed-up over the original Mochi with 64 steps.
+- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
+- **License**:  Apache-2.0
+- **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
+- **GitHub Repository**: https://github.com/hao-ai-lab/FastVideo
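The "around 8x" figure follows from the ratio of sampling steps. A minimal sketch of that arithmetic is below, assuming every diffusion step costs the same and that any fixed overhead (text encoding, VAE decoding) is either ignored or modeled as a constant. The function names and the overhead numbers are hypothetical and are not measurements from FastVideo.

```python
def step_speedup(baseline_steps: int, distilled_steps: int) -> float:
    """Ideal speed-up if runtime is proportional to the number of steps."""
    return baseline_steps / distilled_steps


def speedup_with_overhead(
    baseline_steps: int,
    distilled_steps: int,
    seconds_per_step: float,
    fixed_overhead_seconds: float,
) -> float:
    """Speed-up when a fixed cost is paid once per video regardless of steps."""
    baseline = fixed_overhead_seconds + baseline_steps * seconds_per_step
    distilled = fixed_overhead_seconds + distilled_steps * seconds_per_step
    return baseline / distilled


# 64 steps (original Mochi) versus 8 steps (FastMochi).
print(step_speedup(64, 8))  # 8.0

# Hypothetical timings: 5 s per step plus 10 s of fixed overhead per video.
print(speedup_with_overhead(64, 8, seconds_per_step=5.0, fixed_overhead_seconds=10.0))  # 6.6
```

The second call illustrates why the observed speed-up can land somewhat below the ideal ratio once per-video fixed costs are included.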
+## Usage
+- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in its README.
+- You can also run FastMochi using the official [Mochi repository](https://github.com/genmoai/mochi) with the script below and these [compatible weights](https://huggingface.co/FastVideo/FastMochi).
+<details>
+  <summary>Code</summary>
+```python
+import os
+from genmo.lib.utils import save_video
+from genmo.mochi_preview.pipelines import (
+    DecoderModelFactory,
+    DitModelFactory,
+    MochiMultiGPUPipeline,
+    T5ModelFactory,
+    linear_quadratic_schedule,
+)
+# Read prompts from prompt.txt, one per line, skipping blank lines.
+with open("prompt.txt", "r") as f:
+    prompts = [line.strip() for line in f if line.strip()]
+# Build the multi-GPU pipeline (world_size=4 assumes four GPUs are available).
+pipeline = MochiMultiGPUPipeline(
+    text_encoder_factory=T5ModelFactory(),
+    world_size=4,
+    dit_factory=DitModelFactory(
+        model_path="weights/dit.safetensors", model_dtype="bf16"
+    ),
+    decoder_factory=DecoderModelFactory(
+        model_path="weights/decoder.safetensors",
+    ),
+)
+output_dir = "outputs"
+os.makedirs(output_dir, exist_ok=True)
+for i, prompt in enumerate(prompts):
+    video = pipeline(
+        height=480,
+        width=848,
+        num_frames=163,
+        num_inference_steps=8,
+        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
+        cfg_schedule=[1.5] * 8,
+        batch_cfg=False,
+        prompt=prompt,
+        negative_prompt="",
+        seed=12345,
+    )[0]
+    save_video(video, f"{output_dir}/output_{i}.mp4")
+```
+</details>
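The script above passes `linear_quadratic_schedule(8, 0.1, 6)` as the `sigma_schedule`. To give a feel for what a linear-quadratic schedule looks like, the sketch below re-derives one in plain Python. It is not copied from the Mochi codebase: the function name `linear_quadratic_sigmas` is hypothetical, and reading the three arguments as (number of steps, threshold, number of linear steps) is an assumption. Check the Mochi repository for the actual implementation.

```python
def linear_quadratic_sigmas(num_steps: int, threshold: float, linear_steps: int) -> list[float]:
    """Illustrative linear-then-quadratic sigma schedule (assumed form).

    The noise removed so far rises linearly from 0 to `threshold` over the
    first `linear_steps` steps, then follows a quadratic that keeps the same
    slope at the join and reaches 1 at `num_steps`. Sigmas are 1 minus that.
    """
    if not 0 < linear_steps < num_steps:
        raise ValueError("linear_steps must be between 1 and num_steps - 1")

    slope = threshold / linear_steps          # slope of the linear segment
    tail = num_steps - linear_steps           # number of quadratic steps
    quad = (1.0 - threshold - slope * tail) / tail**2

    levels = []
    for i in range(num_steps + 1):
        if i <= linear_steps:
            levels.append(slope * i)
        else:
            d = i - linear_steps
            levels.append(threshold + slope * d + quad * d * d)

    # Flip so the schedule starts at sigma = 1 (pure noise) and ends near 0.
    return [1.0 - u for u in levels]


sigmas = linear_quadratic_sigmas(8, 0.1, 6)
print(len(sigmas))  # 9 values bound the 8 sampling steps
print([round(s, 3) for s in sigmas])
```

Under these assumptions, sigma decays slowly from 1.0 to about 0.9 over the first six steps, then drops to roughly 0.67 and finally to 0 in the last two.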
+## Training Details
+FastMochi is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:
+- Batch size: 32
+- Resolution: 480x848
+- Number of frames: 169
+- Train steps: 128
+- GPUs: 16
+- LR: 1e-6
+- Loss: Huber
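The hyperparameter list names a Huber loss without giving its settings. As a reminder of how that loss behaves, here is a minimal pure-Python sketch of the standard Huber loss with mean reduction. The `delta=1.0` default and the mean reduction are assumptions; the card does not state which values or variant the FastVideo training code uses.

```python
def huber_loss(pred: list[float], target: list[float], delta: float = 1.0) -> float:
    """Standard Huber loss, averaged over elements (delta is an assumed value).

    Errors up to `delta` are penalized quadratically, larger errors linearly,
    which makes the loss less sensitive to outliers than a squared error.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")

    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(pred)


# Small error (0.5) falls in the quadratic region, large error (3.0) in the linear one.
print(huber_loss([0.0], [0.5]))            # 0.125
print(huber_loss([0.0], [3.0]))            # 2.5
print(huber_loss([0.0, 0.0], [0.5, 3.0]))  # 1.3125
```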
+## Evaluation
+We provide a qualitative comparison between FastMochi with 8-step inference and the original Mochi with 8-step inference:

assets/logo.jpg ADDED