James Zhou committed
Commit · f58903b
1 Parent(s): 2e936cf
[update] readme
README.md CHANGED
@@ -18,7 +18,7 @@ extra_gated_eu_disallowed: true
 
 <h1>🎬 HunyuanVideo-Foley </h1>
 
-<
+<h4>Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation</h4>
 
 <p align="center">
 <strong>Professional-grade AI sound effect generation for video content creators</strong>
@@ -107,7 +107,7 @@ Professional-grade audio generation with crystal clarity
 
 <div align="center" style="background: linear-gradient(135deg, #ffeef8 0%, #f0f8ff 100%); padding: 30px; border-radius: 20px; margin: 20px 0; border-left: 5px solid #ff6b9d; color: #333;">
 
-**🚀 Tencent Hunyuan**
+**🚀 Tencent Hunyuan** open-sources **HunyuanVideo-Foley** an end-to-end video sound effect generation model!
 
 *A professional-grade AI tool specifically designed for video content creators, widely applicable to diverse scenarios including short video creation, film production, advertising creativity, and game development.*
 
@@ -217,7 +217,7 @@ The **TV2A (Text-Video-to-Audio)** task presents a complex multimodal generation
 | Frieren | 5.71 | 2.81 | 3.47 | 5.31 | 0.18 | 1.39 | 0.16 | 2.92±0.95 | 2.76±1.20 | 2.94±1.26 |
 | MMAudio | 6.17 | 2.84 | 3.59 | 5.62 | 0.27 | 0.80 | 0.35 | 3.58±0.84 | 3.63±1.00 | 3.47±1.03 |
 | ThinkSound | 6.04 | 3.73 | 3.81 | 5.59 | 0.18 | 0.91 | 0.20 | 3.20±0.97 | 3.01±1.04 | 3.02±1.08 |
-|
+| **HunyuanVideo-Foley (ours)** | **6.59** | **2.74** | **3.88** | **6.13** | **0.35** | **0.74** | **0.33** | **4.14±0.68** | **4.12±0.77** | **4.15±0.75** |
 
 </div>
 
@@ -239,7 +239,7 @@ The **TV2A (Text-Video-to-Audio)** task presents a complex multimodal generation
 | Frieren | 16.86 | 293.57 | 2.95 | 7.32 | 5.72 | 2.55 | 2.88 | 5.10 | 0.21 | 0.86 | 0.16 |
 | MMAudio | 9.01 | 205.85 | 2.17 | 9.59 | 5.94 | 2.91 | 3.30 | 5.39 | 0.30 | 0.56 | 0.27 |
 | ThinkSound | 9.92 | 228.68 | 2.39 | 6.86 | 5.78 | 3.23 | 3.12 | 5.11 | 0.22 | 0.67 | 0.22 |
-|
+| **HunyuanVideo-Foley (ours)** | **6.07** | **202.12** | **1.89** | **8.30** | **6.12** | **2.76** | **3.22** | **5.53** | **0.38** | **0.54** | **0.24** |
 
 </div>
 