tencent/HunyuanVideo-Foley

@@ -18,7 +18,7 @@ extra_gated_eu_disallowed: true
 <h1>🎬 HunyuanVideo-Foley </h1>
-<h3>Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation</h3>
 <p align="center">
   <strong>Professional-grade AI sound effect generation for video content creators</strong>
@@ -107,7 +107,7 @@ Professional-grade audio generation with crystal clarity
 <div align="center" style="background: linear-gradient(135deg, #ffeef8 0%, #f0f8ff 100%); padding: 30px; border-radius: 20px; margin: 20px 0; border-left: 5px solid #ff6b9d; color: #333;">
-**🚀 Tencent Hunyuan** proudly open-sources **HunyuanVideo-Foley** - an end-to-end video sound effect generation model!
 *A professional-grade AI tool specifically designed for video content creators, widely applicable to diverse scenarios including short video creation, film production, advertising creativity, and game development.*
@@ -217,7 +217,7 @@ The **TV2A (Text-Video-to-Audio)** task presents a complex multimodal generation
 | Frieren | 5.71 | 2.81 | 3.47 | 5.31 | 0.18 | 1.39 | 0.16 | 2.92±0.95 | 2.76±1.20 | 2.94±1.26 |
 | MMAudio | 6.17 | 2.84 | 3.59 | 5.62 | 0.27 | 0.80 | 0.35 | 3.58±0.84 | 3.63±1.00 | 3.47±1.03 |
 | ThinkSound | 6.04 | 3.73 | 3.81 | 5.59 | 0.18 | 0.91 | 0.20 | 3.20±0.97 | 3.01±1.04 | 3.02±1.08 |
-| **🥇 HiFi-Foley (ours)** | **🟢 6.59** | **🟢 2.74** | **🟢 3.88** | **🟢 6.13** | **🟢 0.35** | **🟢 0.74** | **🟢 0.33** | **🟢 4.14±0.68** | **🟢 4.12±0.77** | **🟢 4.15±0.75** |
 </div>
@@ -239,7 +239,7 @@ The **TV2A (Text-Video-to-Audio)** task presents a complex multimodal generation
 | Frieren | 16.86 | 293.57 | 2.95 | 7.32 | 5.72 | 2.55 | 2.88 | 5.10 | 0.21 | 0.86 | 0.16 |
 | MMAudio | 9.01 | 205.85 | 2.17 | 9.59 | 5.94 | 2.91 | 3.30 | 5.39 | 0.30 | 0.56 | 0.27 |
 | ThinkSound | 9.92 | 228.68 | 2.39 | 6.86 | 5.78 | 3.23 | 3.12 | 5.11 | 0.22 | 0.67 | 0.22 |
-| **🥇 HiFi-Foley (ours)** | **🟢 6.07** | **🟢 202.12** | **🟢 1.89** | **🟢 8.30** | **🟢 6.12** | **🟢 2.76** | **🟢 3.22** | **🟢 5.53** | **🟢 0.38** | **🟢 0.54** | **🟢 0.24** |
 </div>

 <h1>🎬 HunyuanVideo-Foley </h1>
+<h4>Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation</h4>
 <p align="center">
   <strong>Professional-grade AI sound effect generation for video content creators</strong>
 <div align="center" style="background: linear-gradient(135deg, #ffeef8 0%, #f0f8ff 100%); padding: 30px; border-radius: 20px; margin: 20px 0; border-left: 5px solid #ff6b9d; color: #333;">
+**🚀 Tencent Hunyuan** open-sources **HunyuanVideo-Foley**, an end-to-end video sound effect generation model!
 *A professional-grade AI tool specifically designed for video content creators, widely applicable to diverse scenarios including short video creation, film production, advertising creativity, and game development.*
 | Frieren | 5.71 | 2.81 | 3.47 | 5.31 | 0.18 | 1.39 | 0.16 | 2.92±0.95 | 2.76±1.20 | 2.94±1.26 |
 | MMAudio | 6.17 | 2.84 | 3.59 | 5.62 | 0.27 | 0.80 | 0.35 | 3.58±0.84 | 3.63±1.00 | 3.47±1.03 |
 | ThinkSound | 6.04 | 3.73 | 3.81 | 5.59 | 0.18 | 0.91 | 0.20 | 3.20±0.97 | 3.01±1.04 | 3.02±1.08 |
+| **HunyuanVideo-Foley (ours)** | **6.59** | **2.74** | **3.88** | **6.13** | **0.35** | **0.74** | **0.33** | **4.14±0.68** | **4.12±0.77** | **4.15±0.75** |
 </div>
 | Frieren | 16.86 | 293.57 | 2.95 | 7.32 | 5.72 | 2.55 | 2.88 | 5.10 | 0.21 | 0.86 | 0.16 |
 | MMAudio | 9.01 | 205.85 | 2.17 | 9.59 | 5.94 | 2.91 | 3.30 | 5.39 | 0.30 | 0.56 | 0.27 |
 | ThinkSound | 9.92 | 228.68 | 2.39 | 6.86 | 5.78 | 3.23 | 3.12 | 5.11 | 0.22 | 0.67 | 0.22 |
+| **HunyuanVideo-Foley (ours)** | **6.07** | **202.12** | **1.89** | **8.30** | **6.12** | **2.76** | **3.22** | **5.53** | **0.38** | **0.54** | **0.24** |
 </div>