ginipick committed on
Commit 8973713 · verified · 1 Parent(s): 34e268c

Create components.py

Files changed (1)
  1. ui/components.py +1591 -0
ui/components.py ADDED
@@ -0,0 +1,1591 @@
1
+ """
2
+ ACE-Step: A Step Towards Music Generation Foundation Model
3
+
4
+ https://github.com/ace-step/ACE-Step
5
+
6
+ Apache 2.0 License
7
+ """
8
+
9
+ import gradio as gr
10
+ import librosa
11
+ import os
12
+ import random
13
+ import hashlib
14
+ import numpy as np
15
+ import json
16
+ from typing import Dict, List, Tuple, Optional
17
+ from openai import OpenAI
18
+
+ # Initialize the OpenAI client; fall back to None if construction fails (e.g. no API key)
+ try:
+     client = OpenAI(api_key=os.getenv("LLM_API"))
+ except Exception:
+     client = None
24
+
25
+ TAG_DEFAULT = "funk, pop, soul, rock, melodic, guitar, drums, bass, keyboard, percussion, 105 BPM, energetic, upbeat, groovy, vibrant, dynamic, duet, male and female vocals"
26
+ LYRIC_DEFAULT = """[verse - male]
27
+ Neon lights they flicker bright
28
+ City hums in dead of night
29
+ Rhythms pulse through concrete veins
30
+ Lost in echoes of refrains
31
+
32
+ [verse - female]
33
+ Bassline groovin' in my chest
34
+ Heartbeats match the city's zest
35
+ Electric whispers fill the air
36
+ Synthesized dreams everywhere
37
+
38
+ [chorus - duet]
39
+ Turn it up and let it flow
40
+ Feel the fire let it grow
41
+ In this rhythm we belong
42
+ Hear the night sing out our song
43
+
44
+ [verse - male]
45
+ Guitar strings they start to weep
46
+ Wake the soul from silent sleep
47
+ Every note a story told
48
+ In this night we're bold and gold
49
+
50
+ [bridge - female]
51
+ Voices blend in harmony
52
+ Lost in pure cacophony
53
+ Timeless echoes timeless cries
54
+ Soulful shouts beneath the skies
55
+
56
+ [verse - duet]
57
+ Keyboard dances on the keys
58
+ Melodies on evening breeze
59
+ Catch the tune and hold it tight
60
+ In this moment we take flight
61
+ """
62
+
63
+ # Extended genre presets (existing + improved tags)
64
+ GENRE_PRESETS = {
65
+ "Modern Pop": "pop, synth, drums, guitar, 120 bpm, upbeat, catchy, vibrant, duet vocals, polished vocals, radio-ready, commercial, layered vocals",
66
+ "Rock": "rock, electric guitar, drums, bass, 130 bpm, energetic, rebellious, gritty, powerful vocals, raw vocals, power chords, driving rhythm",
67
+ "Hip Hop": "hip hop, 808 bass, hi-hats, synth, 90 bpm, bold, urban, intense, rhythmic vocals, trap beats, punchy drums",
68
+ "Country": "country, acoustic guitar, steel guitar, fiddle, 100 bpm, heartfelt, rustic, warm, twangy vocals, storytelling, americana",
69
+ "EDM": "edm, synth, bass, kick drum, 128 bpm, euphoric, pulsating, energetic, instrumental, progressive build, festival anthem, electronic",
70
+ "Reggae": "reggae, guitar, bass, drums, 80 bpm, chill, soulful, positive, smooth vocals, offbeat rhythm, island vibes",
71
+ "Classical": "classical, orchestral, strings, piano, 60 bpm, elegant, emotive, timeless, instrumental, dynamic range, sophisticated harmony",
72
+ "Jazz": "jazz, saxophone, piano, double bass, 110 bpm, smooth, improvisational, soulful, crooning vocals, swing feel, sophisticated",
73
+ "Metal": "metal, electric guitar, double kick drum, bass, 160 bpm, aggressive, intense, heavy, powerful vocals, distorted, powerful",
74
+ "R&B": "r&b, synth, bass, drums, 85 bpm, sultry, groovy, romantic, silky vocals, smooth production, neo-soul",
75
+ "K-Pop": "k-pop, synth, bass, drums, 128 bpm, catchy, energetic, polished, mixed vocals, electronic elements, danceable",
76
+ "Ballad": "ballad, piano, strings, acoustic guitar, 70 bpm, emotional, heartfelt, romantic, expressive vocals, orchestral arrangement"
77
+ }
78
+
79
+ # Song style options
80
+ SONG_STYLES = {
81
+ "λ“€μ—£ (남녀 ν˜Όμ„±)": "duet, male and female vocals, harmonious, call and response",
82
+ "μ†”λ‘œ (남성)": "solo, male vocals, powerful voice",
83
+ "μ†”λ‘œ (μ—¬μ„±)": "solo, female vocals, emotional voice",
84
+ "κ·Έλ£Ή (ν˜Όμ„±)": "group vocals, mixed gender, layered harmonies",
85
+ "ν•©μ°½": "choir, multiple voices, choral arrangement",
86
+ "랩/νž™ν•©": "rap vocals, rhythmic flow, urban style",
87
+ "μΈμŠ€νŠΈλ£¨λ©˜νƒˆ": "instrumental, no vocals"
88
+ }
89
+
90
+ # System prompt for AI lyric writing
+ LYRIC_SYSTEM_PROMPT = """You are an expert songwriter. Write song lyrics that fit the topic and style the user provides.
+
+ Lyric-writing rules:
+ 1. Structure tags must be delimited with "[ ]"
+ 2. Available structure tags: [verse], [chorus], [bridge], [intro], [outro], [pre-chorus]
+ 3. For a duet, mark the parts as [verse - male], [verse - female], [chorus - duet]
+ 4. Write the lyrics in the same language as the input
+ 5. Write about 4-8 lines per section
+ 6. Write lyrics that match the music genre and mood
+
+ Example format:
+ [verse - male]
+ First line of the verse
+ Second line of the verse
+ ...
+
+ [chorus - duet]
+ Chorus lyrics
+ ...
+ """
111
+
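Review note: the system prompt above fixes a tag grammar, but nothing in this file checks that the model followed it. The following is a minimal sketch of such a check, not part of the commit. `ALLOWED_SECTIONS`, `TAG_RE` and `check_structure_tags` are hypothetical names, and the regular expression assumes tags take the form `[section]` or `[section - voice]`, as in the prompt's examples.

```python
import re

ALLOWED_SECTIONS = {"verse", "chorus", "bridge", "intro", "outro", "pre-chorus"}
# "[section]" or "[section - voice]", e.g. "[verse - male]" or "[pre-chorus]"
TAG_RE = re.compile(r"^\[([a-z-]+?)(?:\s+-\s+([a-z]+))?\]$")

def check_structure_tags(lyrics):
    """Return (tags, problems): parsed (section, voice) pairs and tag lines that break the rules."""
    tags, problems = [], []
    for line in lyrics.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # ordinary lyric line, nothing to validate
        match = TAG_RE.match(line.lower())
        if match and match.group(1) in ALLOWED_SECTIONS:
            tags.append((match.group(1), match.group(2)))
        else:
            problems.append(line)
    return tags, problems

sample = """[verse - male]
Neon lights they flicker bright

[chorus - duet]
Turn it up and let it flow

[solo]
(guitar)
"""
tags, problems = check_structure_tags(sample)
```

A caller could fall back to `LYRIC_DEFAULT` or retry when `problems` is non-empty; whether that is desirable depends on how strictly the downstream model needs the tags.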
112
+ def generate_lyrics_with_ai(prompt: str, genre: str, song_style: str, language: str = "auto") -> str:
113
+ """Generate lyrics with the AI model."""
114
+ if not client:
115
+ return LYRIC_DEFAULT
116
+
117
+ try:
118
+ # Language detection and extra style information
119
+ style_info = ""
120
+ if "λ“€μ—£" in song_style:
121
+ style_info = "Please split the lyrics into parts for a male-female duet."
122
+ elif "μ†”λ‘œ (남성)" in song_style:
123
+ style_info = "Please write lyrics for a male solo singer."
124
+ elif "μ†”λ‘œ (μ—¬μ„±)" in song_style:
125
+ style_info = "Please write lyrics for a female solo singer."
126
+ elif "κ·Έλ£Ή" in song_style:
127
+ style_info = "Please split the lyrics into parts for a group to sing."
128
+
129
+ user_prompt = f"""
130
+ Topic: {prompt}
+ Genre: {genre}
+ Style: {style_info}
+
+ Write song lyrics based on the information above. Write them in the same language as the input, and be sure to include structure tags.
135
+ """
136
+
137
+ response = client.chat.completions.create(
138
+ model="gpt-4-mini",
139
+ messages=[
140
+ {"role": "system", "content": LYRIC_SYSTEM_PROMPT},
141
+ {"role": "user", "content": user_prompt}
142
+ ],
143
+ temperature=0.8,
144
+ max_tokens=1000
145
+ )
146
+
147
+ return response.choices[0].message.content
148
+ except Exception as e:
149
+ print(f"AI lyric generation error: {e}")
150
+ return LYRIC_DEFAULT
151
+
152
+ # Quality preset system
153
+ QUALITY_PRESETS = {
154
+ "Draft (Fast)": {
155
+ "infer_step": 50,
156
+ "guidance_scale": 10.0,
157
+ "scheduler_type": "euler",
158
+ "omega_scale": 5.0,
159
+ "use_erg_diffusion": False,
160
+ "use_erg_tag": True,
161
+ "description": "Fast draft generation (1-2 min)"
162
+ },
163
+ "Standard": {
164
+ "infer_step": 150,
165
+ "guidance_scale": 15.0,
166
+ "scheduler_type": "euler",
167
+ "omega_scale": 10.0,
168
+ "use_erg_diffusion": True,
169
+ "use_erg_tag": True,
170
+ "description": "Standard quality (3-5 min)"
171
+ },
172
+ "High Quality": {
173
+ "infer_step": 200,
174
+ "guidance_scale": 18.0,
175
+ "scheduler_type": "heun",
176
+ "omega_scale": 15.0,
177
+ "use_erg_diffusion": True,
178
+ "use_erg_tag": True,
179
+ "description": "High-quality generation (8-12 min)"
180
+ },
181
+ "Ultra (Best)": {
182
+ "infer_step": 299,
183
+ "guidance_scale": 20.0,
184
+ "scheduler_type": "heun",
185
+ "omega_scale": 20.0,
186
+ "use_erg_diffusion": True,
187
+ "use_erg_tag": True,
188
+ "description": "Best quality (15-20 min)"
189
+ }
190
+ }
191
+
192
+ # Multi-seed generation settings
193
+ MULTI_SEED_OPTIONS = {
194
+ "Single": 1,
195
+ "Best of 3": 3,
196
+ "Best of 5": 5,
197
+ "Best of 10": 10
198
+ }
199
+
+ class MusicGenerationCache:
+     """Cache for generation results."""
+     def __init__(self):
+         self.cache = {}
+         self.max_cache_size = 50
+
+     def get_cache_key(self, params):
+         # Build the hash from the key parameters only
+         key_params = {k: v for k, v in params.items()
+                       if k in ['prompt', 'lyrics', 'infer_step', 'guidance_scale', 'audio_duration']}
+         return hashlib.md5(str(sorted(key_params.items())).encode()).hexdigest()[:16]
+
+     def get_cached_result(self, params):
+         key = self.get_cache_key(params)
+         return self.cache.get(key)
+
+     def cache_result(self, params, result):
+         # Evict the oldest entry (dicts preserve insertion order) once the cache is full
+         if len(self.cache) >= self.max_cache_size:
+             oldest_key = next(iter(self.cache))
+             del self.cache[oldest_key]
+
+         key = self.get_cache_key(params)
+         self.cache[key] = result
+
+ # Global cache instance
+ generation_cache = MusicGenerationCache()
226
+
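Review note: a minimal standalone sketch of how the cache above behaves, using only the standard library. `TinyCache` and `KEY_FIELDS` are hypothetical stand-ins (capacity 2 instead of 50) and are not part of the commit. The sketch shows two properties: fields outside the key set, such as a seed, do not change the key, and eviction is first-in, first-out.

```python
import hashlib

KEY_FIELDS = ("prompt", "lyrics", "infer_step", "guidance_scale", "audio_duration")

class TinyCache:
    """Hypothetical stand-in for MusicGenerationCache, capacity 2 for illustration."""

    def __init__(self, max_size=2):
        self.cache = {}
        self.max_size = max_size

    def key(self, params):
        # Only the whitelisted fields contribute to the hash
        subset = {k: v for k, v in params.items() if k in KEY_FIELDS}
        return hashlib.md5(str(sorted(subset.items())).encode()).hexdigest()[:16]

    def get(self, params):
        return self.cache.get(self.key(params))

    def put(self, params, result):
        if len(self.cache) >= self.max_size:
            # dicts keep insertion order, so the first key is the oldest entry
            del self.cache[next(iter(self.cache))]
        self.cache[self.key(params)] = result

cache = TinyCache()
a = {"prompt": "rock", "lyrics": "la", "infer_step": 50,
     "guidance_scale": 10.0, "audio_duration": 30}
cache.put(a, "a.wav")
hit = cache.get(dict(a, manual_seeds="42"))   # the seed is not part of the key
cache.put(dict(a, prompt="jazz"), "b.wav")
cache.put(dict(a, prompt="pop"), "c.wav")     # evicts the oldest entry ("rock")
evicted = cache.get(a)
```

One consequence worth checking in review: because the seed and the multi-seed mode are not part of the key, a repeated request that differs only in those fields would be served the previously cached audio.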
227
+ def enhance_prompt_with_genre(base_prompt: str, genre: str, song_style: str) -> str:
228
+ """Smart prompt expansion based on genre and style."""
229
+ if genre == "Custom" or not genre:
230
+ enhanced_prompt = base_prompt
231
+ else:
232
+ # Additional enhancement tags per genre
233
+ genre_enhancements = {
234
+ "Modern Pop": ["polished production", "mainstream appeal", "hook-driven"],
235
+ "Rock": ["guitar-driven", "powerful drums", "energetic performance"],
236
+ "Hip Hop": ["rhythmic flow", "urban atmosphere", "bass-heavy"],
237
+ "Country": ["acoustic warmth", "storytelling melody", "authentic feel"],
238
+ "EDM": ["electronic atmosphere", "build-ups", "dance-friendly"],
239
+ "Reggae": ["laid-back groove", "tropical vibes", "rhythmic guitar"],
240
+ "Classical": ["orchestral depth", "musical sophistication", "timeless beauty"],
241
+ "Jazz": ["musical complexity", "improvisational spirit", "sophisticated harmony"],
242
+ "Metal": ["aggressive energy", "powerful sound", "intense atmosphere"],
243
+ "R&B": ["smooth groove", "soulful expression", "rhythmic sophistication"],
244
+ "K-Pop": ["catchy hooks", "dynamic arrangement", "polished production"],
245
+ "Ballad": ["emotional depth", "slow tempo", "heartfelt delivery"]
246
+ }
247
+
248
+ if genre in genre_enhancements:
249
+ additional_tags = ", ".join(genre_enhancements[genre])
250
+ enhanced_prompt = f"{base_prompt}, {additional_tags}"
251
+ else:
252
+ enhanced_prompt = base_prompt
253
+
254
+ # Append the style tags
255
+ if song_style in SONG_STYLES:
256
+ style_tags = SONG_STYLES[song_style]
257
+ enhanced_prompt = f"{enhanced_prompt}, {style_tags}"
258
+
259
+ return enhanced_prompt
260
+
+ def calculate_quality_score(audio_path: str) -> float:
+     """Compute a simple quality score (a real implementation would use more sophisticated metrics)."""
+     try:
+         y, sr = librosa.load(audio_path)
+
+         # Basic quality metrics
+         rms_energy = np.sqrt(np.mean(y**2))
+         spectral_centroid = np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))
+         zero_crossing_rate = np.mean(librosa.feature.zero_crossing_rate(y))
+
+         # Normalized score (0-100)
+         energy_score = min(rms_energy * 1000, 40)  # 0-40 points
+         spectral_score = min(spectral_centroid / 100, 40)  # 0-40 points
+         clarity_score = min((1 - zero_crossing_rate) * 20, 20)  # 0-20 points
+
+         total_score = energy_score + spectral_score + clarity_score
+         return round(total_score, 1)
+     except Exception:
+         return 50.0  # default value when the file cannot be analyzed
280
+
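Review note: the arithmetic of the score is easier to judge with plain numbers. The sketch below repeats the three caps used in `calculate_quality_score` on precomputed metrics, so it needs neither librosa nor an audio file. `score_from_metrics` is a hypothetical helper and the input values are made up for illustration.

```python
def score_from_metrics(rms_energy, spectral_centroid, zero_crossing_rate):
    """Combine three precomputed metrics using the same caps as calculate_quality_score."""
    energy_score = min(rms_energy * 1000, 40)               # capped at 40
    spectral_score = min(spectral_centroid / 100, 40)       # capped at 40
    clarity_score = min((1 - zero_crossing_rate) * 20, 20)  # capped at 20
    return round(energy_score + spectral_score + clarity_score, 1)

# Illustrative inputs only: RMS amplitude, centroid in Hz, zero-crossing rate
quiet = score_from_metrics(0.01, 1500.0, 0.1)   # 10 + 15 + 18
loud = score_from_metrics(0.2, 5000.0, 0.05)    # 40 + 40 + 19
```

With these caps, any RMS above 0.04 saturates the energy term and any centroid above 4000 Hz saturates the spectral term, so the score mostly rewards loud, bright audio. That may or may not match perceived quality, which is consistent with the docstring's own caveat.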
+ def update_tags_from_preset(preset_name):
+     if preset_name == "Custom":
+         return ""
+     return GENRE_PRESETS.get(preset_name, "")
+
+ def update_quality_preset(preset_name):
+     """Apply a quality preset and return its settings in UI output order."""
+     if preset_name not in QUALITY_PRESETS:
+         return (100, 15.0, "euler", 10.0, True, True)
+
+     preset = QUALITY_PRESETS[preset_name]
+     return (
+         preset.get("infer_step", 100),
+         preset.get("guidance_scale", 15.0),
+         preset.get("scheduler_type", "euler"),
+         preset.get("omega_scale", 10.0),
+         preset.get("use_erg_diffusion", True),
+         preset.get("use_erg_tag", True)
+     )
300
+
301
+ def create_enhanced_process_func(original_func):
302
+ """Wrap the original function with enhanced features."""
303
+
304
+ def enhanced_func(
305
+ audio_duration, prompt, lyrics, infer_step, guidance_scale,
306
+ scheduler_type, cfg_type, omega_scale, manual_seeds,
307
+ guidance_interval, guidance_interval_decay, min_guidance_scale,
308
+ use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
309
+ guidance_scale_text, guidance_scale_lyric,
310
+ audio2audio_enable=False, ref_audio_strength=0.5, ref_audio_input=None,
311
+ lora_name_or_path="none", multi_seed_mode="Single",
312
+ enable_smart_enhancement=True, genre_preset="Custom", song_style="λ“€μ—£ (남녀 ν˜Όμ„±)", **kwargs
313
+ ):
314
+ # Smart prompt expansion
315
+ if enable_smart_enhancement:
316
+ prompt = enhance_prompt_with_genre(prompt, genre_preset, song_style)
317
+
318
+ # Check the cache
319
+ cache_params = {
320
+ 'prompt': prompt, 'lyrics': lyrics, 'audio_duration': audio_duration,
321
+ 'infer_step': infer_step, 'guidance_scale': guidance_scale
322
+ }
323
+
324
+ cached_result = generation_cache.get_cached_result(cache_params)
325
+ if cached_result:
326
+ return cached_result
327
+
328
+ # Multi-seed generation
329
+ num_candidates = MULTI_SEED_OPTIONS.get(multi_seed_mode, 1)
330
+
331
+ if num_candidates == 1:
332
+ # Call the original function
333
+ result = original_func(
334
+ audio_duration, prompt, lyrics, infer_step, guidance_scale,
335
+ scheduler_type, cfg_type, omega_scale, manual_seeds,
336
+ guidance_interval, guidance_interval_decay, min_guidance_scale,
337
+ use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
338
+ guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
339
+ ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
340
+ )
341
+ else:
342
+ # Generate with multiple seeds and pick the best result
343
+ candidates = []
344
+
345
+ for i in range(num_candidates):
346
+ seed = random.randint(1, 10000)
347
+
348
+ try:
349
+ result = original_func(
350
+ audio_duration, prompt, lyrics, infer_step, guidance_scale,
351
+ scheduler_type, cfg_type, omega_scale, str(seed),
352
+ guidance_interval, guidance_interval_decay, min_guidance_scale,
353
+ use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
354
+ guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
355
+ ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
356
+ )
357
+
358
+ if result and len(result) > 0:
359
+ audio_path = result[0] # the first result is the audio file path
360
+ if audio_path and os.path.exists(audio_path):
361
+ quality_score = calculate_quality_score(audio_path)
362
+ candidates.append({
363
+ "result": result,
364
+ "quality_score": quality_score,
365
+ "seed": seed
366
+ })
367
+ except Exception as e:
368
+ print(f"Generation {i+1} failed: {e}")
369
+ continue
370
+
371
+ if candidates:
372
+ # Pick the highest-quality candidate
373
+ best_candidate = max(candidates, key=lambda x: x["quality_score"])
374
+ result = best_candidate["result"]
375
+
376
+ # Attach the quality information
377
+ if len(result) > 1 and isinstance(result[1], dict):
378
+ result[1]["quality_score"] = best_candidate["quality_score"]
379
+ result[1]["selected_seed"] = best_candidate["seed"]
380
+ result[1]["candidates_count"] = len(candidates)
381
+ else:
382
+ # Fall back to a default generation if every candidate failed
383
+ result = original_func(
384
+ audio_duration, prompt, lyrics, infer_step, guidance_scale,
385
+ scheduler_type, cfg_type, omega_scale, manual_seeds,
386
+ guidance_interval, guidance_interval_decay, min_guidance_scale,
387
+ use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
388
+ guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
389
+ ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
390
+ )
391
+
392
+ # Cache the result
393
+ generation_cache.cache_result(cache_params, result)
394
+ return result
395
+
396
+ return enhanced_func
397
+
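Review note: the multi-seed branch above is a best-of-N selection. The sketch below isolates that pattern with toy stand-ins so it can be run without a model. `best_of_n`, `fake_generate` and the modulo-based score are hypothetical and only illustrate the control flow: draw a seed, skip failed candidates, keep the highest score.

```python
import random

def best_of_n(generate, score, n, rng=None):
    """Run generate(seed) n times and return the highest-scoring candidate, or None."""
    rng = rng or random.Random()
    candidates = []
    for i in range(n):
        seed = rng.randint(1, 10000)
        try:
            result = generate(seed)
        except Exception as exc:
            # A failed candidate is skipped rather than aborting the whole run
            print(f"Generation {i + 1} failed: {exc}")
            continue
        candidates.append({"result": result, "quality_score": score(result), "seed": seed})
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["quality_score"])

calls = []

def fake_generate(seed):
    # Toy generator: the "audio" is just the seed, and the first call always fails
    calls.append(seed)
    if len(calls) == 1:
        raise RuntimeError("simulated failure")
    return seed

best = best_of_n(fake_generate, lambda r: r % 100, n=5, rng=random.Random(0))
```

Returning `None` when every candidate fails leaves the fallback decision to the caller; the wrapper in this commit instead retries once with the user's `manual_seeds`.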
398
+ def create_output_ui(task_name="Text2Music"):
399
+ # For many consumer-grade GPU devices, only one batch can be run
400
+ output_audio1 = gr.Audio(type="filepath", label=f"{task_name} Generated Audio 1")
401
+
402
+ with gr.Accordion(f"{task_name} Parameters & Quality Info", open=False):
403
+ input_params_json = gr.JSON(label=f"{task_name} Parameters")
404
+
405
+ # Quality information display
406
+ with gr.Row():
407
+ quality_score = gr.Number(label="Quality Score (0-100)", value=0, interactive=False)
408
+ generation_info = gr.Textbox(
409
+ label="Generation Info",
410
+ value="",
411
+ interactive=False,
412
+ max_lines=2
413
+ )
414
+
415
+ outputs = [output_audio1]
416
+ return outputs, input_params_json
417
+
418
+ def dump_func(*args):
419
+ print(args)
420
+ return []
421
+
422
+ def create_text2music_ui(
423
+ gr,
424
+ text2music_process_func,
425
+ sample_data_func=None,
426
+ load_data_func=None,
427
+ ):
428
+ # Create the enhanced process function
429
+ enhanced_process_func = create_enhanced_process_func(text2music_process_func)
430
+
431
+ with gr.Row():
432
+ with gr.Column():
433
+ # Quality and performance settings section
434
+ with gr.Group():
435
+ gr.Markdown("### ⚡ Quality & Performance Settings")
436
+ with gr.Row():
437
+ quality_preset = gr.Dropdown(
438
+ choices=list(QUALITY_PRESETS.keys()),
439
+ value="Standard",
440
+ label="Quality Preset",
441
+ scale=2
442
+ )
443
+ multi_seed_mode = gr.Dropdown(
444
+ choices=list(MULTI_SEED_OPTIONS.keys()),
445
+ value="Single",
446
+ label="Multi-Generation Mode",
447
+ scale=2,
448
+ info="Generate several times and keep the highest-quality result"
449
+ )
450
+
451
+ preset_description = gr.Textbox(
452
+ value=QUALITY_PRESETS["Standard"]["description"],
453
+ label="Description",
454
+ interactive=False,
455
+ max_lines=1
456
+ )
457
+
458
+ with gr.Row(equal_height=True):
459
+ # add markdown, tags and lyrics examples are from ai music generation community
460
+ audio_duration = gr.Slider(
461
+ -1,
462
+ 240.0,
463
+ step=0.00001,
464
+ value=-1,
465
+ label="Audio Duration",
466
+ interactive=True,
467
+ info="-1 means random duration (30 ~ 240).",
468
+ scale=7,
469
+ )
470
+ random_bnt = gr.Button("🎲 Random", variant="secondary", scale=1)
471
+ preview_bnt = gr.Button("🎵 Preview", variant="secondary", scale=2)
472
+
473
+ # audio2audio
474
+ with gr.Row(equal_height=True):
475
+ audio2audio_enable = gr.Checkbox(
476
+ label="Enable Audio2Audio",
477
+ value=False,
478
+ info="Check to enable Audio-to-Audio generation using a reference audio.",
479
+ elem_id="audio2audio_checkbox"
480
+ )
481
+ lora_name_or_path = gr.Dropdown(
482
+ label="Lora Name or Path",
483
+ choices=["ACE-Step/ACE-Step-v1-chinese-rap-LoRA", "none"],
484
+ value="none",
485
+ allow_custom_value=True,
486
+ )
487
+
488
+ ref_audio_input = gr.Audio(
489
+ type="filepath",
490
+ label="Reference Audio (for Audio2Audio)",
491
+ visible=False,
492
+ elem_id="ref_audio_input",
493
+ show_download_button=True
494
+ )
495
+ ref_audio_strength = gr.Slider(
496
+ label="Refer audio strength",
497
+ minimum=0.0,
498
+ maximum=1.0,
499
+ step=0.01,
500
+ value=0.5,
501
+ elem_id="ref_audio_strength",
502
+ visible=False,
503
+ interactive=True,
504
+ )
505
+
506
+ def toggle_ref_audio_visibility(is_checked):
507
+ return (
508
+ gr.update(visible=is_checked, elem_id="ref_audio_input"),
509
+ gr.update(visible=is_checked, elem_id="ref_audio_strength"),
510
+ )
511
+
512
+ audio2audio_enable.change(
513
+ fn=toggle_ref_audio_visibility,
514
+ inputs=[audio2audio_enable],
515
+ outputs=[ref_audio_input, ref_audio_strength],
516
+ )
517
+
518
+ with gr.Column(scale=2):
519
+ with gr.Group():
520
+ gr.Markdown("""### 🎼 Smart Prompt System
+ <center>Selecting a genre and style automatically adds optimized tags.</center>""")
522
+
523
+ with gr.Row():
524
+ genre_preset = gr.Dropdown(
525
+ choices=["Custom"] + list(GENRE_PRESETS.keys()),
526
+ value="Custom",
527
+ label="Genre Preset",
528
+ scale=1,
529
+ )
530
+ song_style = gr.Dropdown(
531
+ choices=list(SONG_STYLES.keys()),
532
+ value="λ“€μ—£ (남녀 ν˜Όμ„±)",
533
+ label="Song Style",
534
+ scale=1,
535
+ )
536
+ enable_smart_enhancement = gr.Checkbox(
537
+ label="Smart Enhancement",
538
+ value=True,
539
+ info="Automatic tag optimization",
540
+ scale=1
541
+ )
542
+
543
+ prompt = gr.Textbox(
544
+ lines=2,
545
+ label="Tags",
546
+ max_lines=4,
547
+ value=TAG_DEFAULT,
548
+ placeholder="Comma-separated tags...",
549
+ )
550
+
551
+ with gr.Group():
552
+ gr.Markdown("""### 📝 AI Lyric Writing System
+ <center>Enter a topic and click the 'AI Lyrics' button to generate lyrics automatically.</center>""")
554
+
555
+ with gr.Row():
556
+ lyric_prompt = gr.Textbox(
557
+ label="Lyric Topic",
558
+ placeholder="e.g. the thrill of first love, the pain of a breakup, a hopeful tomorrow...",
559
+ scale=3
560
+ )
561
+ generate_lyrics_btn = gr.Button("🤖 AI Lyrics", variant="secondary", scale=1)
562
+
563
+ lyrics = gr.Textbox(
564
+ lines=9,
565
+ label="Lyrics",
566
+ max_lines=13,
567
+ value=LYRIC_DEFAULT,
568
+ placeholder="Enter lyrics. Using structure tags such as [verse] and [chorus] is recommended."
569
+ )
570
+
571
+ with gr.Accordion("Basic Settings", open=False):
572
+ infer_step = gr.Slider(
573
+ minimum=1,
574
+ maximum=300,
575
+ step=1,
576
+ value=150,
577
+ label="Infer Steps",
578
+ interactive=True,
579
+ )
580
+ guidance_scale = gr.Slider(
581
+ minimum=0.0,
582
+ maximum=30.0,
583
+ step=0.1,
584
+ value=15.0,
585
+ label="Guidance Scale",
586
+ interactive=True,
587
+ info="When guidance_scale_lyric > 1 and guidance_scale_text > 1, the guidance scale will not be applied.",
588
+ )
589
+ guidance_scale_text = gr.Slider(
590
+ minimum=0.0,
591
+ maximum=10.0,
592
+ step=0.1,
593
+ value=0.0,
594
+ label="Guidance Scale Text",
595
+ interactive=True,
596
+ info="Guidance scale for text condition. It can only apply to cfg. set guidance_scale_text=5.0, guidance_scale_lyric=1.5 for start",
597
+ )
598
+ guidance_scale_lyric = gr.Slider(
599
+ minimum=0.0,
600
+ maximum=10.0,
601
+ step=0.1,
602
+ value=0.0,
603
+ label="Guidance Scale Lyric",
604
+ interactive=True,
605
+ )
606
+
607
+ manual_seeds = gr.Textbox(
608
+ label="manual seeds (default None)",
609
+ placeholder="1,2,3,4",
610
+ value=None,
611
+ info="Seed for the generation",
612
+ )
613
+
614
+ with gr.Accordion("Advanced Settings", open=False):
615
+ scheduler_type = gr.Radio(
616
+ ["euler", "heun"],
617
+ value="euler",
618
+ label="Scheduler Type",
619
+ elem_id="scheduler_type",
620
+ info="Scheduler type for the generation. euler is recommended. heun will take more time.",
621
+ )
622
+ cfg_type = gr.Radio(
623
+ ["cfg", "apg", "cfg_star"],
624
+ value="apg",
625
+ label="CFG Type",
626
+ elem_id="cfg_type",
627
+ info="CFG type for the generation. apg is recommended. cfg and cfg_star are almost the same.",
628
+ )
629
+ use_erg_tag = gr.Checkbox(
630
+ label="use ERG for tag",
631
+ value=True,
632
+ info="Use Entropy Rectifying Guidance for tag. It multiplies the attention by a temperature to weaken the tag condition and improve diversity.",
633
+ )
634
+ use_erg_lyric = gr.Checkbox(
635
+ label="use ERG for lyric",
636
+ value=False,
637
+ info="The same but apply to lyric encoder's attention.",
638
+ )
639
+ use_erg_diffusion = gr.Checkbox(
640
+ label="use ERG for diffusion",
641
+ value=True,
642
+ info="The same but apply to diffusion model's attention.",
643
+ )
644
+
645
+ omega_scale = gr.Slider(
646
+ minimum=-100.0,
647
+ maximum=100.0,
648
+ step=0.1,
649
+ value=10.0,
650
+ label="Granularity Scale",
651
+ interactive=True,
652
+ info="Granularity scale for the generation. Higher values can reduce artifacts",
653
+ )
654
+
655
+ guidance_interval = gr.Slider(
656
+ minimum=0.0,
657
+ maximum=1.0,
658
+ step=0.01,
659
+ value=0.5,
660
+ label="Guidance Interval",
661
+ interactive=True,
662
+ info="Guidance interval for the generation. 0.5 means only apply guidance in the middle steps (0.25 * infer_steps to 0.75 * infer_steps)",
663
+ )
664
+ guidance_interval_decay = gr.Slider(
665
+ minimum=0.0,
666
+ maximum=1.0,
667
+ step=0.01,
668
+ value=0.0,
669
+ label="Guidance Interval Decay",
670
+ interactive=True,
671
+ info="Guidance interval decay for the generation. Guidance scale will decay from guidance_scale to min_guidance_scale in the interval. 0.0 means no decay.",
672
+ )
673
+ min_guidance_scale = gr.Slider(
674
+ minimum=0.0,
675
+ maximum=200.0,
676
+ step=0.1,
677
+ value=3.0,
678
+ label="Min Guidance Scale",
679
+ interactive=True,
680
+ info="Min guidance scale for guidance interval decay's end scale",
681
+ )
682
+ oss_steps = gr.Textbox(
683
+ label="OSS Steps",
684
+ placeholder="16, 29, 52, 96, 129, 158, 172, 183, 189, 200",
685
+ value=None,
686
+ info="Optimal steps for the generation. Not well tested yet.",
687
+ )
688
+
689
+ text2music_bnt = gr.Button("🎵 Generate Music", variant="primary", size="lg")
690
+
691
+ # AI lyrics button event
692
+ def generate_ai_lyrics(lyric_prompt, genre_preset, song_style):
693
+ if not lyric_prompt:
694
+ return "Please enter a lyric topic."
695
+ return generate_lyrics_with_ai(lyric_prompt, genre_preset, song_style)
696
+
697
+ generate_lyrics_btn.click(
698
+ fn=generate_ai_lyrics,
699
+ inputs=[lyric_prompt, genre_preset, song_style],
700
+ outputs=[lyrics]
701
+ )
702
+
703
+ # Random data generation function
704
+ def generate_random_music_data(genre_preset, song_style):
705
+ # Pick a random genre
706
+ if genre_preset == "Custom":
707
+ genre = random.choice(list(GENRE_PRESETS.keys()))
708
+ else:
709
+ genre = genre_preset
710
+
711
+ # List of random themes
712
+ themes = [
713
+ "city night", "memories of first love", "a summer-day beach", "the mood of autumn",
+ "a hopeful tomorrow", "a free spirit", "dancing under starlight", "the passion of youth",
+ "rainy-day feelings", "chasing a dream", "growth after a breakup", "a new beginning"
716
+ ]
717
+
718
+ # Random settings
719
+ duration = random.choice([30, 60, 90, 120, 180])
720
+ theme = random.choice(themes)
721
+
722
+ # Generate lyrics with AI
723
+ lyrics = generate_lyrics_with_ai(theme, genre, song_style)
724
+
725
+ # Build the tags
726
+ tags = GENRE_PRESETS.get(genre, "")
727
+ if song_style in SONG_STYLES:
728
+ tags = f"{tags}, {SONG_STYLES[song_style]}"
729
+
730
+ # Return the randomized parameter set
731
+ return (
732
+ duration, # audio_duration
733
+ tags, # prompt
734
+ lyrics, # lyrics
735
+ 150, # infer_step
736
+ 15.0, # guidance_scale
737
+ "euler", # scheduler_type
738
+ "apg", # cfg_type
739
+ 10.0, # omega_scale
740
+ str(random.randint(1, 10000)), # manual_seeds
741
+ 0.5, # guidance_interval
742
+ 0.0, # guidance_interval_decay
743
+ 3.0, # min_guidance_scale
744
+ True, # use_erg_tag
745
+ False, # use_erg_lyric
746
+ True, # use_erg_diffusion
747
+ None, # oss_steps
748
+ 0.0, # guidance_scale_text
749
+ 0.0, # guidance_scale_lyric
750
+ False, # audio2audio_enable
751
+ 0.5, # ref_audio_strength
752
+ None, # ref_audio_input
753
+ )
754
+
755
+ # Set up event handlers after all UI elements are defined
756
+ genre_preset.change(
757
+ fn=update_tags_from_preset,
758
+ inputs=[genre_preset],
759
+ outputs=[prompt]
760
+ )
761
+
762
+ quality_preset.change(
763
+ fn=lambda x: QUALITY_PRESETS.get(x, {}).get("description", ""),
764
+ inputs=[quality_preset],
765
+ outputs=[preset_description]
766
+ )
767
+
768
+ quality_preset.change(
769
+ fn=update_quality_preset,
770
+ inputs=[quality_preset],
771
+ outputs=[infer_step, guidance_scale, scheduler_type, omega_scale, use_erg_diffusion, use_erg_tag]
772
+ )
773
+
774
+ with gr.Column():
775
+ outputs, input_params_json = create_output_ui()
776
+
777
+ # Real-time preview feature
778
+ def generate_preview(prompt, lyrics, genre_preset, song_style):
779
+ """Generate a 10-second preview."""
780
+ preview_params = {
781
+ "audio_duration": 10,
782
+ "infer_step": 50,
783
+ "guidance_scale": 12.0,
784
+ "scheduler_type": "euler",
785
+ "cfg_type": "apg",
786
+ "omega_scale": 5.0,
787
+ }
788
+
789
+ enhanced_prompt = enhance_prompt_with_genre(prompt, genre_preset, song_style)
790
+
791
+ try:
792
+ # A real implementation would use a fast generation mode
793
+ result = enhanced_process_func(
794
+ preview_params["audio_duration"],
795
+ enhanced_prompt,
796
+ lyrics[:200], # use only part of the lyrics
797
+ preview_params["infer_step"],
798
+ preview_params["guidance_scale"],
799
+ preview_params["scheduler_type"],
800
+ preview_params["cfg_type"],
801
+ preview_params["omega_scale"],
802
+ None, # manual_seeds
803
+ 0.5, # guidance_interval
804
+ 0.0, # guidance_interval_decay
805
+ 3.0, # min_guidance_scale
806
+ True, # use_erg_tag
807
+ False, # use_erg_lyric
808
+ True, # use_erg_diffusion
809
+ None, # oss_steps
810
+ 0.0, # guidance_scale_text
811
+ 0.0, # guidance_scale_lyric
812
+ multi_seed_mode="Single",
813
+ song_style=song_style
814
+ )
815
+ return result[0] if result else None
816
+ except Exception as e:
817
+ return f"Preview generation failed: {str(e)}"
818
+
819
+ preview_bnt.click(
820
+ fn=generate_preview,
821
+ inputs=[prompt, lyrics, genre_preset, song_style],
822
+ outputs=[outputs[0]]
823
+ )
824
+
825
+ with gr.Tab("retake"):
826
+ retake_variance = gr.Slider(
827
+ minimum=0.0, maximum=1.0, step=0.01, value=0.2, label="variance"
828
+ )
829
+ retake_seeds = gr.Textbox(
830
+ label="retake seeds (default None)", placeholder="", value=None
831
+ )
832
+ retake_bnt = gr.Button("Retake", variant="primary")
833
+ retake_outputs, retake_input_params_json = create_output_ui("Retake")
834
+
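The seed textboxes in this and the following tabs hand over free text. The diff does not show how that text is parsed, so the following is only a sketch under the assumption that seeds are entered as comma-separated integers; `parse_seeds` is a hypothetical helper, not a function from this commit.

```python
from typing import List, Optional


def parse_seeds(text: Optional[str]) -> Optional[List[int]]:
    """Parse '1, 2, 3' into [1, 2, 3]; return None for empty input."""
    if text is None or not text.strip():
        return None
    seeds = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas such as "1, 2,"
        seeds.append(int(part))  # raises ValueError on non-numeric input
    return seeds or None
```

Returning `None` for blank input mirrors the "default None" wording in the textbox labels.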
835
+ def retake_process_func(json_data, retake_variance, retake_seeds):
836
+ return enhanced_process_func(
837
+ json_data.get("audio_duration", 30),
838
+ json_data.get("prompt", ""),
839
+ json_data.get("lyrics", ""),
840
+ json_data.get("infer_step", 100),
841
+ json_data.get("guidance_scale", 15.0),
842
+ json_data.get("scheduler_type", "euler"),
843
+ json_data.get("cfg_type", "apg"),
844
+ json_data.get("omega_scale", 10.0),
845
+ retake_seeds, # positional slot that the other calls fill with manual_seeds
846
+ json_data.get("guidance_interval", 0.5),
847
+ json_data.get("guidance_interval_decay", 0.0),
848
+ json_data.get("min_guidance_scale", 3.0),
849
+ json_data.get("use_erg_tag", True),
850
+ json_data.get("use_erg_lyric", False),
851
+ json_data.get("use_erg_diffusion", True),
852
+ json_data.get("oss_steps", None),
853
+ json_data.get("guidance_scale_text", 0.0),
854
+ json_data.get("guidance_scale_lyric", 0.0),
855
+ audio2audio_enable=json_data.get("audio2audio_enable", False),
856
+ ref_audio_strength=json_data.get("ref_audio_strength", 0.5),
857
+ ref_audio_input=json_data.get("ref_audio_input", None),
858
+ lora_name_or_path=json_data.get("lora_name_or_path", "none"),
859
+ multi_seed_mode="Best of 3", # retake always generates multiple candidates
860
+ retake_variance=retake_variance,
861
+ task="retake"
862
+ )
863
+
864
+ retake_bnt.click(
865
+ fn=retake_process_func,
866
+ inputs=[
867
+ input_params_json,
868
+ retake_variance,
869
+ retake_seeds,
870
+ ],
871
+ outputs=retake_outputs + [retake_input_params_json],
872
+ )
873
+
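`retake_process_func` repeats a `json_data.get(key, default)` call for every parameter. One way to keep the fallbacks in a single place is sketched below; the default values are copied from the calls above, while `RETAKE_DEFAULTS` and `resolve_retake_params` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Optional

# Fallback values copied from the .get() calls in retake_process_func.
RETAKE_DEFAULTS: Dict[str, Any] = {
    "audio_duration": 30,
    "prompt": "",
    "lyrics": "",
    "infer_step": 100,
    "guidance_scale": 15.0,
    "scheduler_type": "euler",
    "cfg_type": "apg",
    "omega_scale": 10.0,
    "guidance_interval": 0.5,
    "guidance_interval_decay": 0.0,
    "min_guidance_scale": 3.0,
    "use_erg_tag": True,
    "use_erg_lyric": False,
    "use_erg_diffusion": True,
    "oss_steps": None,
    "guidance_scale_text": 0.0,
    "guidance_scale_lyric": 0.0,
}


def resolve_retake_params(json_data: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Overlay the saved parameters on top of the defaults."""
    return {**RETAKE_DEFAULTS, **(json_data or {})}
```

Unlike the chained `.get()` calls, this version also tolerates `json_data` being `None`, which may happen if Retake is pressed before any generation has run.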
874
+ with gr.Tab("repainting"):
875
+ retake_variance = gr.Slider(
876
+ minimum=0.0, maximum=1.0, step=0.01, value=0.2, label="variance"
877
+ )
878
+ retake_seeds = gr.Textbox(
879
+ label="repaint seeds (default None)", placeholder="", value=None
880
+ )
881
+ repaint_start = gr.Slider(
882
+ minimum=0.0,
883
+ maximum=240.0,
884
+ step=0.01,
885
+ value=0.0,
886
+ label="Repaint Start Time",
887
+ interactive=True,
888
+ )
889
+ repaint_end = gr.Slider(
890
+ minimum=0.0,
891
+ maximum=240.0,
892
+ step=0.01,
893
+ value=30.0,
894
+ label="Repaint End Time",
895
+ interactive=True,
896
+ )
897
+ repaint_source = gr.Radio(
898
+ ["text2music", "last_repaint", "upload"],
899
+ value="text2music",
900
+ label="Repaint Source",
901
+ elem_id="repaint_source",
902
+ )
903
+
904
+ repaint_source_audio_upload = gr.Audio(
905
+ label="Upload Audio",
906
+ type="filepath",
907
+ visible=False,
908
+ elem_id="repaint_source_audio_upload",
909
+ show_download_button=True,
910
+ )
911
+ repaint_source.change(
912
+ fn=lambda x: gr.update(
913
+ visible=x == "upload", elem_id="repaint_source_audio_upload"
914
+ ),
915
+ inputs=[repaint_source],
916
+ outputs=[repaint_source_audio_upload],
917
+ )
918
+
919
+ repaint_bnt = gr.Button("Repaint", variant="primary")
920
+ repaint_outputs, repaint_input_params_json = create_output_ui("Repaint")
921
+
922
+ def repaint_process_func(
923
+ text2music_json_data,
924
+ repaint_json_data,
925
+ retake_variance,
926
+ retake_seeds,
927
+ repaint_start,
928
+ repaint_end,
929
+ repaint_source,
930
+ repaint_source_audio_upload,
931
+ prompt,
932
+ lyrics,
933
+ infer_step,
934
+ guidance_scale,
935
+ scheduler_type,
936
+ cfg_type,
937
+ omega_scale,
938
+ manual_seeds,
939
+ guidance_interval,
940
+ guidance_interval_decay,
941
+ min_guidance_scale,
942
+ use_erg_tag,
943
+ use_erg_lyric,
944
+ use_erg_diffusion,
945
+ oss_steps,
946
+ guidance_scale_text,
947
+ guidance_scale_lyric,
948
+ ):
949
+ if repaint_source == "upload":
950
+ src_audio_path = repaint_source_audio_upload
951
+ audio_duration = librosa.get_duration(path=src_audio_path) # `filename=` is deprecated since librosa 0.10
952
+ json_data = {"audio_duration": audio_duration}
953
+ elif repaint_source == "text2music":
954
+ json_data = text2music_json_data
955
+ src_audio_path = json_data["audio_path"]
956
+ elif repaint_source == "last_repaint":
957
+ json_data = repaint_json_data
958
+ src_audio_path = json_data["audio_path"]
959
+
960
+ return enhanced_process_func(
961
+ json_data["audio_duration"],
962
+ prompt,
963
+ lyrics,
964
+ infer_step,
965
+ guidance_scale,
966
+ scheduler_type,
967
+ cfg_type,
968
+ omega_scale,
969
+ manual_seeds,
970
+ guidance_interval,
971
+ guidance_interval_decay,
972
+ min_guidance_scale,
973
+ use_erg_tag,
974
+ use_erg_lyric,
975
+ use_erg_diffusion,
976
+ oss_steps,
977
+ guidance_scale_text,
978
+ guidance_scale_lyric,
979
+ retake_seeds=retake_seeds,
980
+ retake_variance=retake_variance,
981
+ task="repaint",
982
+ repaint_start=repaint_start,
983
+ repaint_end=repaint_end,
984
+ src_audio_path=src_audio_path,
985
+ lora_name_or_path="none"
986
+ )
987
+
988
+ repaint_bnt.click(
989
+ fn=repaint_process_func,
990
+ inputs=[
991
+ input_params_json,
992
+ repaint_input_params_json,
993
+ retake_variance,
994
+ retake_seeds,
995
+ repaint_start,
996
+ repaint_end,
997
+ repaint_source,
998
+ repaint_source_audio_upload,
999
+ prompt,
1000
+ lyrics,
1001
+ infer_step,
1002
+ guidance_scale,
1003
+ scheduler_type,
1004
+ cfg_type,
1005
+ omega_scale,
1006
+ manual_seeds,
1007
+ guidance_interval,
1008
+ guidance_interval_decay,
1009
+ min_guidance_scale,
1010
+ use_erg_tag,
1011
+ use_erg_lyric,
1012
+ use_erg_diffusion,
1013
+ oss_steps,
1014
+ guidance_scale_text,
1015
+ guidance_scale_lyric,
1016
+ ],
1017
+ outputs=repaint_outputs + [repaint_input_params_json],
1018
+ )
1019
+
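The two repaint sliders each range independently from 0 to 240 seconds, so the start can end up after the end, or past the end of the clip. Whether the pipeline checks this is not visible in the diff. A minimal guard, assuming the repaint task expects the window to lie inside the source clip, might look like this (`validate_repaint_window` is a hypothetical helper):

```python
def validate_repaint_window(start: float, end: float, duration: float) -> None:
    """Raise ValueError unless 0 <= start < end <= duration."""
    if start < 0 or end > duration:
        raise ValueError(
            f"repaint window [{start}, {end}] is outside the {duration}s clip"
        )
    if start >= end:
        raise ValueError(
            f"repaint start ({start}) must be earlier than repaint end ({end})"
        )
```

This check applies to the repaint task only; the extend tab deliberately produces a negative start and an end beyond the clip.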
1020
+ with gr.Tab("edit"):
1021
+ edit_prompt = gr.Textbox(lines=2, label="Edit Tags", max_lines=4)
1022
+ edit_lyrics = gr.Textbox(lines=9, label="Edit Lyrics", max_lines=13)
1023
+ retake_seeds = gr.Textbox(
1024
+ label="edit seeds (default None)", placeholder="", value=None
1025
+ )
1026
+
1027
+ edit_type = gr.Radio(
1028
+ ["only_lyrics", "remix"],
1029
+ value="only_lyrics",
1030
+ label="Edit Type",
1031
+ elem_id="edit_type",
1032
+ info="`only_lyrics` keeps the whole song the same except for the lyrics that differ. Keep the difference small, e.g. a one-line lyric change.\n`remix` can change the song's melody and genre.",
1033
+ )
1034
+ edit_n_min = gr.Slider(
1035
+ minimum=0.0,
1036
+ maximum=1.0,
1037
+ step=0.01,
1038
+ value=0.6,
1039
+ label="edit_n_min",
1040
+ interactive=True,
1041
+ )
1042
+ edit_n_max = gr.Slider(
1043
+ minimum=0.0,
1044
+ maximum=1.0,
1045
+ step=0.01,
1046
+ value=1.0,
1047
+ label="edit_n_max",
1048
+ interactive=True,
1049
+ )
1050
+
1051
+ def edit_type_change_func(edit_type):
1052
+ if edit_type == "only_lyrics":
1053
+ n_min = 0.6
1054
+ n_max = 1.0
1055
+ elif edit_type == "remix":
1056
+ n_min = 0.2
1057
+ n_max = 0.4
1058
+ return n_min, n_max
1059
+
1060
+ edit_type.change(
1061
+ edit_type_change_func,
1062
+ inputs=[edit_type],
1063
+ outputs=[edit_n_min, edit_n_max],
1064
+ )
1065
+
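The `if`/`elif` chain in `edit_type_change_func` leaves `n_min` and `n_max` unbound if an unexpected value ever arrives. A table lookup is one alternative; the sketch below uses the same two ranges, and the fallback to the `only_lyrics` range is an assumption rather than behavior defined by the commit.

```python
from typing import Dict, Tuple

# Values taken from edit_type_change_func above.
EDIT_RANGES: Dict[str, Tuple[float, float]] = {
    "only_lyrics": (0.6, 1.0),
    "remix": (0.2, 0.4),
}


def edit_range_for(edit_type: str) -> Tuple[float, float]:
    """Return (n_min, n_max); unknown types fall back to the only_lyrics range."""
    return EDIT_RANGES.get(edit_type, EDIT_RANGES["only_lyrics"])
```

The returned tuple can be passed straight to the two slider outputs, as the original function does.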
1066
+ edit_source = gr.Radio(
1067
+ ["text2music", "last_edit", "upload"],
1068
+ value="text2music",
1069
+ label="Edit Source",
1070
+ elem_id="edit_source",
1071
+ )
1072
+ edit_source_audio_upload = gr.Audio(
1073
+ label="Upload Audio",
1074
+ type="filepath",
1075
+ visible=False,
1076
+ elem_id="edit_source_audio_upload",
1077
+ show_download_button=True,
1078
+ )
1079
+ edit_source.change(
1080
+ fn=lambda x: gr.update(
1081
+ visible=x == "upload", elem_id="edit_source_audio_upload"
1082
+ ),
1083
+ inputs=[edit_source],
1084
+ outputs=[edit_source_audio_upload],
1085
+ )
1086
+
1087
+ edit_bnt = gr.Button("Edit", variant="primary")
1088
+ edit_outputs, edit_input_params_json = create_output_ui("Edit")
1089
+
1090
+ def edit_process_func(
1091
+ text2music_json_data,
1092
+ edit_input_params_json,
1093
+ edit_source,
1094
+ edit_source_audio_upload,
1095
+ prompt,
1096
+ lyrics,
1097
+ edit_prompt,
1098
+ edit_lyrics,
1099
+ edit_n_min,
1100
+ edit_n_max,
1101
+ infer_step,
1102
+ guidance_scale,
1103
+ scheduler_type,
1104
+ cfg_type,
1105
+ omega_scale,
1106
+ manual_seeds,
1107
+ guidance_interval,
1108
+ guidance_interval_decay,
1109
+ min_guidance_scale,
1110
+ use_erg_tag,
1111
+ use_erg_lyric,
1112
+ use_erg_diffusion,
1113
+ oss_steps,
1114
+ guidance_scale_text,
1115
+ guidance_scale_lyric,
1116
+ retake_seeds,
1117
+ ):
1118
+ if edit_source == "upload":
1119
+ src_audio_path = edit_source_audio_upload
1120
+ audio_duration = librosa.get_duration(path=src_audio_path) # `filename=` is deprecated since librosa 0.10
1121
+ json_data = {"audio_duration": audio_duration}
1122
+ elif edit_source == "text2music":
1123
+ json_data = text2music_json_data
1124
+ src_audio_path = json_data["audio_path"]
1125
+ elif edit_source == "last_edit":
1126
+ json_data = edit_input_params_json
1127
+ src_audio_path = json_data["audio_path"]
1128
+
1129
+ if not edit_prompt:
1130
+ edit_prompt = prompt
1131
+ if not edit_lyrics:
1132
+ edit_lyrics = lyrics
1133
+
1134
+ return enhanced_process_func(
1135
+ json_data["audio_duration"],
1136
+ prompt,
1137
+ lyrics,
1138
+ infer_step,
1139
+ guidance_scale,
1140
+ scheduler_type,
1141
+ cfg_type,
1142
+ omega_scale,
1143
+ manual_seeds,
1144
+ guidance_interval,
1145
+ guidance_interval_decay,
1146
+ min_guidance_scale,
1147
+ use_erg_tag,
1148
+ use_erg_lyric,
1149
+ use_erg_diffusion,
1150
+ oss_steps,
1151
+ guidance_scale_text,
1152
+ guidance_scale_lyric,
1153
+ task="edit",
1154
+ src_audio_path=src_audio_path,
1155
+ edit_target_prompt=edit_prompt,
1156
+ edit_target_lyrics=edit_lyrics,
1157
+ edit_n_min=edit_n_min,
1158
+ edit_n_max=edit_n_max,
1159
+ retake_seeds=retake_seeds,
1160
+ lora_name_or_path="none"
1161
+ )
1162
+
1163
+ edit_bnt.click(
1164
+ fn=edit_process_func,
1165
+ inputs=[
1166
+ input_params_json,
1167
+ edit_input_params_json,
1168
+ edit_source,
1169
+ edit_source_audio_upload,
1170
+ prompt,
1171
+ lyrics,
1172
+ edit_prompt,
1173
+ edit_lyrics,
1174
+ edit_n_min,
1175
+ edit_n_max,
1176
+ infer_step,
1177
+ guidance_scale,
1178
+ scheduler_type,
1179
+ cfg_type,
1180
+ omega_scale,
1181
+ manual_seeds,
1182
+ guidance_interval,
1183
+ guidance_interval_decay,
1184
+ min_guidance_scale,
1185
+ use_erg_tag,
1186
+ use_erg_lyric,
1187
+ use_erg_diffusion,
1188
+ oss_steps,
1189
+ guidance_scale_text,
1190
+ guidance_scale_lyric,
1191
+ retake_seeds,
1192
+ ],
1193
+ outputs=edit_outputs + [edit_input_params_json],
1194
+ )
1195
+
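The repaint, edit, and extend handlers each repeat the same three-way branch on the source radio button. A shared helper could look roughly like the sketch below. `resolve_source` is a hypothetical name, and the duration lookup is injected as a callable so the sketch runs without librosa; in the handlers it would be a thin wrapper around `librosa.get_duration`.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def resolve_source(
    source: str,
    upload_path: Optional[str],
    text2music_json: Optional[Dict[str, Any]],
    last_json: Optional[Dict[str, Any]],
    get_duration: Callable[[str], float],
) -> Tuple[Dict[str, Any], str]:
    """Return (json_data, src_audio_path) for the chosen source."""
    if source == "upload":
        if not upload_path:
            raise ValueError("no audio file was uploaded")
        return {"audio_duration": get_duration(upload_path)}, upload_path
    # "text2music" reads the main tab's result; any other value ("last_repaint",
    # "last_edit", "last_extend") reads this tab's previous result.
    json_data = text2music_json if source == "text2music" else last_json
    if not json_data or "audio_path" not in json_data:
        raise ValueError(f"no previous result available for source {source!r}")
    return json_data, json_data["audio_path"]
```

Compared with the inline branches, this raises a readable error when no earlier result exists instead of failing on a missing `audio_path` key.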
1196
+ with gr.Tab("extend"):
1197
+ extend_seeds = gr.Textbox(
1198
+ label="extend seeds (default None)", placeholder="", value=None
1199
+ )
1200
+ left_extend_length = gr.Slider(
1201
+ minimum=0.0,
1202
+ maximum=240.0,
1203
+ step=0.01,
1204
+ value=0.0,
1205
+ label="Left Extend Length",
1206
+ interactive=True,
1207
+ )
1208
+ right_extend_length = gr.Slider(
1209
+ minimum=0.0,
1210
+ maximum=240.0,
1211
+ step=0.01,
1212
+ value=30.0,
1213
+ label="Right Extend Length",
1214
+ interactive=True,
1215
+ )
1216
+ extend_source = gr.Radio(
1217
+ ["text2music", "last_extend", "upload"],
1218
+ value="text2music",
1219
+ label="Extend Source",
1220
+ elem_id="extend_source",
1221
+ )
1222
+
1223
+ extend_source_audio_upload = gr.Audio(
1224
+ label="Upload Audio",
1225
+ type="filepath",
1226
+ visible=False,
1227
+ elem_id="extend_source_audio_upload",
1228
+ show_download_button=True,
1229
+ )
1230
+ extend_source.change(
1231
+ fn=lambda x: gr.update(
1232
+ visible=x == "upload", elem_id="extend_source_audio_upload"
1233
+ ),
1234
+ inputs=[extend_source],
1235
+ outputs=[extend_source_audio_upload],
1236
+ )
1237
+
1238
+ extend_bnt = gr.Button("Extend", variant="primary")
1239
+ extend_outputs, extend_input_params_json = create_output_ui("Extend")
1240
+
1241
+ def extend_process_func(
1242
+ text2music_json_data,
1243
+ extend_input_params_json,
1244
+ extend_seeds,
1245
+ left_extend_length,
1246
+ right_extend_length,
1247
+ extend_source,
1248
+ extend_source_audio_upload,
1249
+ prompt,
1250
+ lyrics,
1251
+ infer_step,
1252
+ guidance_scale,
1253
+ scheduler_type,
1254
+ cfg_type,
1255
+ omega_scale,
1256
+ manual_seeds,
1257
+ guidance_interval,
1258
+ guidance_interval_decay,
1259
+ min_guidance_scale,
1260
+ use_erg_tag,
1261
+ use_erg_lyric,
1262
+ use_erg_diffusion,
1263
+ oss_steps,
1264
+ guidance_scale_text,
1265
+ guidance_scale_lyric,
1266
+ ):
1267
+ if extend_source == "upload":
1268
+ src_audio_path = extend_source_audio_upload
1269
+ # get audio duration
1270
+ audio_duration = librosa.get_duration(path=src_audio_path) # `filename=` is deprecated since librosa 0.10
1271
+ json_data = {"audio_duration": audio_duration}
1272
+ elif extend_source == "text2music":
1273
+ json_data = text2music_json_data
1274
+ src_audio_path = json_data["audio_path"]
1275
+ elif extend_source == "last_extend":
1276
+ json_data = extend_input_params_json
1277
+ src_audio_path = json_data["audio_path"]
1278
+
1279
+ repaint_start = -left_extend_length
1280
+ repaint_end = json_data["audio_duration"] + right_extend_length
1281
+ return enhanced_process_func(
1282
+ json_data["audio_duration"],
1283
+ prompt,
1284
+ lyrics,
1285
+ infer_step,
1286
+ guidance_scale,
1287
+ scheduler_type,
1288
+ cfg_type,
1289
+ omega_scale,
1290
+ manual_seeds,
1291
+ guidance_interval,
1292
+ guidance_interval_decay,
1293
+ min_guidance_scale,
1294
+ use_erg_tag,
1295
+ use_erg_lyric,
1296
+ use_erg_diffusion,
1297
+ oss_steps,
1298
+ guidance_scale_text,
1299
+ guidance_scale_lyric,
1300
+ retake_seeds=extend_seeds,
1301
+ retake_variance=1.0,
1302
+ task="extend",
1303
+ repaint_start=repaint_start,
1304
+ repaint_end=repaint_end,
1305
+ src_audio_path=src_audio_path,
1306
+ lora_name_or_path="none"
1307
+ )
1308
+
1309
+ extend_bnt.click(
1310
+ fn=extend_process_func,
1311
+ inputs=[
1312
+ input_params_json,
1313
+ extend_input_params_json,
1314
+ extend_seeds,
1315
+ left_extend_length,
1316
+ right_extend_length,
1317
+ extend_source,
1318
+ extend_source_audio_upload,
1319
+ prompt,
1320
+ lyrics,
1321
+ infer_step,
1322
+ guidance_scale,
1323
+ scheduler_type,
1324
+ cfg_type,
1325
+ omega_scale,
1326
+ manual_seeds,
1327
+ guidance_interval,
1328
+ guidance_interval_decay,
1329
+ min_guidance_scale,
1330
+ use_erg_tag,
1331
+ use_erg_lyric,
1332
+ use_erg_diffusion,
1333
+ oss_steps,
1334
+ guidance_scale_text,
1335
+ guidance_scale_lyric,
1336
+ ],
1337
+ outputs=extend_outputs + [extend_input_params_json],
1338
+ )
1339
+
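The extend handler expresses the extension as a repaint window that reaches outside the clip: the start is the negated left length and the end is the clip duration plus the right length. The arithmetic can be sketched as follows. `extend_window` is a hypothetical helper, and the reading that the final length equals `duration + left + right` is an inference from that window, not something the diff states.

```python
from typing import Tuple


def extend_window(duration: float, left: float, right: float) -> Tuple[float, float, float]:
    """Return (repaint_start, repaint_end, total_length) in seconds."""
    if left < 0 or right < 0:
        raise ValueError("extension lengths must be non-negative")
    repaint_start = -left           # window begins `left` seconds before the clip
    repaint_end = duration + right  # window ends `right` seconds after the clip
    return repaint_start, repaint_end, repaint_end - repaint_start
```

For a 30-second clip extended by 5 seconds on the left and 10 on the right, the window runs from -5 to 40 seconds.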
1340
+ # Random button event
1341
+ random_bnt.click(
1342
+ fn=generate_random_music_data,
1343
+ inputs=[genre_preset, song_style],
1344
+ outputs=[
1345
+ audio_duration,
1346
+ prompt,
1347
+ lyrics,
1348
+ infer_step,
1349
+ guidance_scale,
1350
+ scheduler_type,
1351
+ cfg_type,
1352
+ omega_scale,
1353
+ manual_seeds,
1354
+ guidance_interval,
1355
+ guidance_interval_decay,
1356
+ min_guidance_scale,
1357
+ use_erg_tag,
1358
+ use_erg_lyric,
1359
+ use_erg_diffusion,
1360
+ oss_steps,
1361
+ guidance_scale_text,
1362
+ guidance_scale_lyric,
1363
+ audio2audio_enable,
1364
+ ref_audio_strength,
1365
+ ref_audio_input,
1366
+ ],
1367
+ )
1368
+
1369
+ # 메인 생성 λ²„νŠΌ 이벀트 (ν–₯μƒλœ ν•¨μˆ˜ μ‚¬μš©)
1370
+ text2music_bnt.click(
1371
+ fn=enhanced_process_func,
1372
+ inputs=[
1373
+ audio_duration,
1374
+ prompt,
1375
+ lyrics,
1376
+ infer_step,
1377
+ guidance_scale,
1378
+ scheduler_type,
1379
+ cfg_type,
1380
+ omega_scale,
1381
+ manual_seeds,
1382
+ guidance_interval,
1383
+ guidance_interval_decay,
1384
+ min_guidance_scale,
1385
+ use_erg_tag,
1386
+ use_erg_lyric,
1387
+ use_erg_diffusion,
1388
+ oss_steps,
1389
+ guidance_scale_text,
1390
+ guidance_scale_lyric,
1391
+ audio2audio_enable,
1392
+ ref_audio_strength,
1393
+ ref_audio_input,
1394
+ lora_name_or_path,
1395
+ multi_seed_mode,
1396
+ enable_smart_enhancement,
1397
+ genre_preset,
1398
+ song_style
1399
+ ],
1400
+ outputs=outputs + [input_params_json],
1401
+ )
1402
+
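Gradio passes the values of the `inputs` list to `fn` positionally, in list order, so the 26 components wired above must line up with the parameters of `enhanced_process_func`. Its signature is not part of this hunk, so the sketch below checks only that a callable accepts a given number of positional arguments; `check_arity` is a hypothetical helper.

```python
import inspect
from typing import Callable


def check_arity(fn: Callable, n_inputs: int) -> bool:
    """Return True if `fn` can be called with `n_inputs` positional arguments."""
    try:
        inspect.signature(fn).bind(*([None] * n_inputs))
    except TypeError:
        return False
    return True
```

A call such as `check_arity(enhanced_process_func, 26)` at build time would catch a miscounted list, though it cannot detect two inputs swapped in order.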
1403
+
1404
+ def create_main_demo_ui(
1405
+ text2music_process_func=dump_func,
1406
+ sample_data_func=dump_func,
1407
+ load_data_func=dump_func,
1408
+ ):
1409
+ with gr.Blocks(
1410
+ title="ACE-Step Model 1.0 DEMO - Enhanced",
1411
+ theme=gr.themes.Soft(),
1412
+ css="""
1413
+ /* Gradient background */
1414
+ .gradio-container {
1415
+ max-width: 1200px !important;
1416
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
1417
+ min-height: 100vh;
1418
+ }
1419
+
1420
+ /* Main container style */
1421
+ .main-container {
1422
+ background: rgba(255, 255, 255, 0.95);
1423
+ border-radius: 20px;
1424
+ padding: 30px;
1425
+ margin: 20px auto;
1426
+ box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
1427
+ }
1428
+
1429
+ /* Header style */
1430
+ .header-title {
1431
+ background: linear-gradient(45deg, #667eea, #764ba2);
1432
+ -webkit-background-clip: text;
1433
+ -webkit-text-fill-color: transparent;
1434
+ font-size: 3em;
1435
+ font-weight: bold;
1436
+ text-align: center;
1437
+ margin-bottom: 10px;
1438
+ }
1439
+
1440
+ /* Button style */
1441
+ .gr-button-primary {
1442
+ background: linear-gradient(45deg, #667eea, #764ba2) !important;
1443
+ border: none !important;
1444
+ color: white !important;
1445
+ font-weight: bold !important;
1446
+ transition: all 0.3s ease !important;
1447
+ }
1448
+
1449
+ .gr-button-primary:hover {
1450
+ transform: translateY(-2px);
1451
+ box-shadow: 0 10px 20px rgba(102, 126, 234, 0.3);
1452
+ }
1453
+
1454
+ .gr-button-secondary {
1455
+ background: linear-gradient(45deg, #f093fb, #f5576c) !important;
1456
+ border: none !important;
1457
+ color: white !important;
1458
+ transition: all 0.3s ease !important;
1459
+ }
1460
+
1461
+ /* Group style */
1462
+ .gr-group {
1463
+ background: rgba(255, 255, 255, 0.8) !important;
1464
+ border: 1px solid rgba(102, 126, 234, 0.2) !important;
1465
+ border-radius: 15px !important;
1466
+ padding: 20px !important;
1467
+ margin: 10px 0 !important;
1468
+ backdrop-filter: blur(10px) !important;
1469
+ }
1470
+
1471
+ /* Tab style */
1472
+ .gr-tab {
1473
+ background: rgba(255, 255, 255, 0.9) !important;
1474
+ border-radius: 10px !important;
1475
+ padding: 15px !important;
1476
+ }
1477
+
1478
+ /* Input field style */
1479
+ .gr-textbox, .gr-dropdown, .gr-slider {
1480
+ border: 2px solid rgba(102, 126, 234, 0.3) !important;
1481
+ border-radius: 10px !important;
1482
+ transition: all 0.3s ease !important;
1483
+ }
1484
+
1485
+ .gr-textbox:focus, .gr-dropdown:focus {
1486
+ border-color: #667eea !important;
1487
+ box-shadow: 0 0 10px rgba(102, 126, 234, 0.2) !important;
1488
+ }
1489
+
1490
+ /* Quality info style */
1491
+ .quality-info {
1492
+ background: linear-gradient(135deg, #f093fb20, #f5576c20);
1493
+ padding: 15px;
1494
+ border-radius: 10px;
1495
+ margin: 10px 0;
1496
+ border: 1px solid rgba(240, 147, 251, 0.3);
1497
+ }
1498
+
1499
+ /* Animation */
1500
+ @keyframes fadeIn {
1501
+ from {
1502
+ opacity: 0;
1503
+ transform: translateY(20px);
1504
+ }
1505
+ to {
1506
+ opacity: 1;
1507
+ transform: translateY(0);
1508
+ }
1509
+ }
1510
+
1511
+ .gr-row, .gr-column {
1512
+ animation: fadeIn 0.5s ease-out;
1513
+ }
1514
+
1515
+ /* Scrollbar style */
1516
+ ::-webkit-scrollbar {
1517
+ width: 10px;
1518
+ }
1519
+
1520
+ ::-webkit-scrollbar-track {
1521
+ background: rgba(255, 255, 255, 0.1);
1522
+ border-radius: 10px;
1523
+ }
1524
+
1525
+ ::-webkit-scrollbar-thumb {
1526
+ background: linear-gradient(45deg, #667eea, #764ba2);
1527
+ border-radius: 10px;
1528
+ }
1529
+
1530
+ /* Markdown style */
1531
+ .gr-markdown {
1532
+ color: #4a5568 !important;
1533
+ }
1534
+
1535
+ .gr-markdown h3 {
1536
+ color: #667eea !important;
1537
+ font-weight: 600 !important;
1538
+ margin: 15px 0 !important;
1539
+ }
1540
+ """
1541
+ ) as demo:
1542
+ with gr.Column(elem_classes="main-container"):
1543
+ gr.HTML(
1544
+ """
1545
+ <h1 class="header-title">🎵 ACE-Step PRO</h1>
1546
+ <div style="text-align: center; margin: 20px;">
1547
+ <p style="font-size: 1.2em; color: #4a5568;"><strong>🚀 New features:</strong> AI lyric writing | Quality presets | Multi-generation | Smart prompts | Real-time preview</p>
1548
+ <p style="margin-top: 10px;">
1549
+ <a href="https://ace-step.github.io/" target='_blank' style="color: #667eea; text-decoration: none; margin: 0 10px;">📄 Project</a> |
1550
+ <a href="https://huggingface.co/ACE-Step/ACE-Step-v1-3.5B" style="color: #667eea; text-decoration: none; margin: 0 10px;">🤗 Checkpoints</a> |
1551
+ <a href="https://discord.gg/rjAZz2xBdG" target='_blank' style="color: #667eea; text-decoration: none; margin: 0 10px;">💬 Discord</a>
1552
+ </p>
1553
+ </div>
1554
+ """
1555
+ )
1556
+
1557
+ # Add a usage guide
1558
+ with gr.Accordion("📖 Usage Guide", open=False):
1559
+ gr.Markdown("""
1560
+ ### 🎯 Quick Start
+ 1. **Genre & style**: Choose the music genre and song style (duet, solo, etc.) you want
+ 2. **AI lyrics**: Enter a theme and press the 'AI Lyrics' button to generate lyrics automatically
+ 3. **Quality setting**: Choose from Draft (fast) → Standard (recommended) → High Quality → Ultra
+ 4. **Multi-generation**: Selecting "Best of 3/5/10" generates several candidates and automatically keeps the best one
+ 5. **Preview**: Check the result quickly with a 10-second preview before running the full generation
1566
+
1567
+ ### 💡 Tips for Better Quality
+ - **High-quality generation**: The "High Quality" + "Best of 5" combination is recommended
+ - **Quick tests**: Use "Draft" together with the "Preview" feature
+ - **Genre-specific results**: Select a genre preset, then check "Smart Enhancement"
+ - **Lyrics structure**: Make full use of the [verse], [chorus], and [bridge] tags
+ - **Multilingual support**: Entering the theme in Korean generates Korean lyrics
1573
+ """)
1574
+
1575
+ with gr.Tab("🎵 Enhanced Text2Music", elem_classes="gr-tab"):
1576
+ create_text2music_ui(
1577
+ gr=gr,
1578
+ text2music_process_func=text2music_process_func,
1579
+ sample_data_func=sample_data_func,
1580
+ load_data_func=load_data_func,
1581
+ )
1582
+ return demo
1583
+
1584
+
1585
+ if __name__ == "__main__":
1586
+ demo = create_main_demo_ui()
1587
+ demo.launch(
1588
+ server_name="0.0.0.0",
1589
+ server_port=7860,
1590
+ share=True # create a public share link
1591
+ )
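The launch call hard-codes the host, port, and share flag. If those need to vary between a local run and a hosted deployment, they could be read from the environment, as in the sketch below. The three environment variable names are invented for illustration, while `server_name`, `server_port`, and `share` are the same `launch()` parameters used above.

```python
import os
from typing import Any, Dict, Mapping


def launch_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build demo.launch() arguments; the variable names are hypothetical."""
    return {
        "server_name": env.get("ACE_STEP_HOST", "0.0.0.0"),
        "server_port": int(env.get("ACE_STEP_PORT", "7860")),
        "share": env.get("ACE_STEP_SHARE", "1") != "0",
    }
```

With this helper the entry point would become `demo.launch(**launch_kwargs())`, and the defaults reproduce the values in the commit.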