ginipick committed on
Commit
c750221
·
verified ·
1 Parent(s): 6cb2b1c

Create components.py

Files changed (1)
  1. ui/components.py +1447 -0
ui/components.py ADDED
@@ -0,0 +1,1447 @@
+ # --- Existing code kept (original) ---
+ """
+ ACE-Step: A Step Towards Music Generation Foundation Model
+
+ https://github.com/ace-step/ACE-Step
+
+ Apache 2.0 License
+ """
+
+ import gradio as gr
+ import librosa
+ import os
+ import random
+ import hashlib
+ import numpy as np
+ import json
+ from typing import Dict, List, Tuple, Optional
+
+ # Added part: OpenAI module import
+ from openai import OpenAI
+
+ # Read the API key from os.getenv("LLM_API")
+ client = OpenAI(api_key=os.getenv("LLM_API"))
+
+ # OpenAI call used for lyric generation
+ def openai_generate_lyrics(topic: str) -> str:
+     """
+     Call the GPT-4.1-mini model with the user-supplied topic and return
+     song lyrics in the matching language (Korean, English, etc.), structured
+     with [verse], [chorus], [bridge] section tags.
+     """
+     # System prompt (instruction): professional lyricist role
+     system_prompt = (
+         "λ„ˆλŠ” λ…Έλž˜ 가사λ₯Ό μž‘μ‚¬ν•˜λŠ” μ „λ¬Έκ°€ 역할이닀. μ΄μš©μžκ°€ μž…λ ₯ν•˜λŠ” μ£Όμ œμ— 따라 "
+         "이에 κ΄€λ ¨λœ λ…Έλž˜ 가사λ₯Ό μž‘μ„±ν•˜λΌ. κ°€μ‚¬μ˜ κ·œμΉ™μ€ \"[ ]\"둜 κ΅¬λΆ„ν•˜μ—¬, "
+         "λ‹€μŒ μ˜ˆμ‹œλ₯Ό μ°Έμ‘°ν•˜λΌ. \"\"\"[verse]\nNeon lights they flicker bright\nCity "
+         "hums in dead of night\nRhythms pulse through concrete veins\nLost in echoes "
+         "of refrains\n\n[verse]\nBassline groovin' in my chest\nHeartbeats match the "
+         "city's zest\nElectric whispers fill the air\nSynthesized dreams everywhere\n\n"
+         "[chorus]\nTurn it up and let it flow\nFeel the fire let it grow\nIn this rhythm "
+         "we belong\nHear the night sing out our song\n\n[verse]\nGuitar strings they "
+         "start to weep\nWake the soul from silent sleep\nEvery note a story told\nIn "
+         "this night we're bold and gold\n\n[bridge]\nVoices blend in harmony\nLost in "
+         "pure cacophony\nTimeless echoes timeless cries\nSoulful shouts beneath the "
+         "skies\n\n[verse]\nKeyboard dances on the keys\nMelodies on evening breeze\nCatch "
+         "the tune and hold it tight\nIn this moment we take flight\n\"\"\""
+     )
+     try:
+         # Call GPT-4.1-mini with system / user messages
+         response = client.responses.create(
+             model="gpt-4.1-mini",
+             input=[
+                 {
+                     "role": "system",
+                     "content": [
+                         {
+                             "type": "input_text",
+                             "text": system_prompt
+                         }
+                     ]
+                 },
+                 {
+                     "role": "user",
+                     "content": [
+                         {
+                             "type": "input_text",
+                             "text": topic
+                         }
+                     ]
+                 }
+             ],
+             text={"format": {"type": "text"}},
+             reasoning={},
+             tools=[],
+             temperature=1,
+             max_output_tokens=2048,
+             top_p=1,
+             store=True
+         )
+         # The generated text lives in the response's output, not in the request's
+         # `input` list; the SDK exposes the aggregated text as `output_text`.
+         text = getattr(response, "output_text", None)
+         if text:
+             return text
+         # Fallback: walk the output items and return the first output_text part.
+         for item in getattr(response, "output", []) or []:
+             for part in getattr(item, "content", []) or []:
+                 if getattr(part, "type", "") == "output_text":
+                     return getattr(part, "text", "")
+         return "가사 생성 μ‹€νŒ¨: 응닡 ν˜•μ‹μ„ νŒŒμ•…ν•  수 μ—†μŠ΅λ‹ˆλ‹€."
+     except Exception as e:
+         return f"가사 생성 μ‹€νŒ¨: {str(e)}"
+
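+ # Usage sketch for the helper above (hypothetical topic string; assumes the
+ # LLM_API environment variable holds a valid OpenAI API key):
+ #
+ #     lyrics_text = openai_generate_lyrics("summer night drive")
+ #     print(lyrics_text)  # structured lyrics with [verse]/[chorus]/[bridge] tags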
94
+ TAG_DEFAULT = "funk, pop, soul, rock, melodic, guitar, drums, bass, keyboard, percussion, 105 BPM, energetic, upbeat, groovy, vibrant, dynamic"
95
+ LYRIC_DEFAULT = """[verse]
96
+ Neon lights they flicker bright
97
+ City hums in dead of night
98
+ Rhythms pulse through concrete veins
99
+ Lost in echoes of refrains
100
+
101
+ [verse]
102
+ Bassline groovin' in my chest
103
+ Heartbeats match the city's zest
104
+ Electric whispers fill the air
105
+ Synthesized dreams everywhere
106
+
107
+ [chorus]
108
+ Turn it up and let it flow
109
+ Feel the fire let it grow
110
+ In this rhythm we belong
111
+ Hear the night sing out our song
112
+
113
+ [verse]
114
+ Guitar strings they start to weep
115
+ Wake the soul from silent sleep
116
+ Every note a story told
117
+ In this night we're bold and gold
118
+
119
+ [bridge]
120
+ Voices blend in harmony
121
+ Lost in pure cacophony
122
+ Timeless echoes timeless cries
123
+ Soulful shouts beneath the skies
124
+
125
+ [verse]
126
+ Keyboard dances on the keys
127
+ Melodies on evening breeze
128
+ Catch the tune and hold it tight
129
+ In this moment we take flight
130
+ """
131
+
132
+ # Extended genre presets (original set plus refined tags)
133
+ GENRE_PRESETS = {
134
+ "Modern Pop": "pop, synth, drums, guitar, 120 bpm, upbeat, catchy, vibrant, female vocals, polished vocals, radio-ready, commercial, layered vocals",
135
+ "Rock": "rock, electric guitar, drums, bass, 130 bpm, energetic, rebellious, gritty, male vocals, raw vocals, power chords, driving rhythm",
136
+ "Hip Hop": "hip hop, 808 bass, hi-hats, synth, 90 bpm, bold, urban, intense, male vocals, rhythmic vocals, trap beats, punchy drums",
137
+ "Country": "country, acoustic guitar, steel guitar, fiddle, 100 bpm, heartfelt, rustic, warm, male vocals, twangy vocals, storytelling, americana",
138
+ "EDM": "edm, synth, bass, kick drum, 128 bpm, euphoric, pulsating, energetic, instrumental, progressive build, festival anthem, electronic",
139
+ "Reggae": "reggae, guitar, bass, drums, 80 bpm, chill, soulful, positive, male vocals, smooth vocals, offbeat rhythm, island vibes",
140
+ "Classical": "classical, orchestral, strings, piano, 60 bpm, elegant, emotive, timeless, instrumental, dynamic range, sophisticated harmony",
141
+ "Jazz": "jazz, saxophone, piano, double bass, 110 bpm, smooth, improvisational, soulful, male vocals, crooning vocals, swing feel, sophisticated",
142
+ "Metal": "metal, electric guitar, double kick drum, bass, 160 bpm, aggressive, intense, heavy, male vocals, screamed vocals, distorted, powerful",
143
+ "R&B": "r&b, synth, bass, drums, 85 bpm, sultry, groovy, romantic, female vocals, silky vocals, smooth production, neo-soul"
144
+ }
145
+
146
+ # Quality preset system
147
+ QUALITY_PRESETS = {
148
+ "Draft (Fast)": {
149
+ "infer_step": 50,
150
+ "guidance_scale": 10.0,
151
+ "scheduler_type": "euler",
152
+ "omega_scale": 5.0,
153
+ "use_erg_diffusion": False,
154
+ "use_erg_tag": True,
155
+ "description": "λΉ λ₯Έ μ΄ˆμ•ˆ 생성 (1-2λΆ„)"
156
+ },
157
+ "Standard": {
158
+ "infer_step": 150,
159
+ "guidance_scale": 15.0,
160
+ "scheduler_type": "euler",
161
+ "omega_scale": 10.0,
162
+ "use_erg_diffusion": True,
163
+ "use_erg_tag": True,
164
+ "description": "ν‘œμ€€ ν’ˆμ§ˆ (3-5λΆ„)"
165
+ },
166
+ "High Quality": {
167
+ "infer_step": 200,
168
+ "guidance_scale": 18.0,
169
+ "scheduler_type": "heun",
170
+ "omega_scale": 15.0,
171
+ "use_erg_diffusion": True,
172
+ "use_erg_tag": True,
173
+ "description": "κ³ ν’ˆμ§ˆ 생성 (8-12λΆ„)"
174
+ },
175
+ "Ultra (Best)": {
176
+ "infer_step": 299,
177
+ "guidance_scale": 20.0,
178
+ "scheduler_type": "heun",
179
+ "omega_scale": 20.0,
180
+ "use_erg_diffusion": True,
181
+ "use_erg_tag": True,
182
+ "description": "졜고 ν’ˆμ§ˆ (15-20λΆ„)"
183
+ }
184
+ }
185
+
186
+ # Multi-seed generation options
187
+ MULTI_SEED_OPTIONS = {
188
+ "Single": 1,
189
+ "Best of 3": 3,
190
+ "Best of 5": 5,
191
+ "Best of 10": 10
192
+ }
193
+
194
+ class MusicGenerationCache:
195
+ """생성 κ²°κ³Ό 캐싱 μ‹œμŠ€ν…œ"""
196
+ def __init__(self):
197
+ self.cache = {}
198
+ self.max_cache_size = 50
199
+
200
+ def get_cache_key(self, params):
201
+ # Build the hash only from the parameters that matter
202
+ key_params = {k: v for k, v in params.items()
203
+ if k in ['prompt', 'lyrics', 'infer_step', 'guidance_scale', 'audio_duration']}
204
+ return hashlib.md5(str(sorted(key_params.items())).encode()).hexdigest()[:16]
205
+
206
+ def get_cached_result(self, params):
207
+ key = self.get_cache_key(params)
208
+ return self.cache.get(key)
209
+
210
+ def cache_result(self, params, result):
211
+ if len(self.cache) >= self.max_cache_size:
212
+ oldest_key = next(iter(self.cache))
213
+ del self.cache[oldest_key]
214
+
215
+ key = self.get_cache_key(params)
216
+ self.cache[key] = result
217
+
218
+ # Global cache instance
219
+ generation_cache = MusicGenerationCache()
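+ # Note: the cache key covers only prompt, lyrics, infer_step, guidance_scale and
+ # audio_duration, so runs that differ only in seed or scheduler reuse a cached result.
+ # Rough sketch of the round trip (hypothetical params dict):
+ #
+ #     params = {"prompt": "lofi piano", "lyrics": "[inst]", "infer_step": 150,
+ #               "guidance_scale": 15.0, "audio_duration": 60}
+ #     if generation_cache.get_cached_result(params) is None:
+ #         generation_cache.cache_result(params, new_result)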
220
+
221
+ def enhance_prompt_with_genre(base_prompt: str, genre: str) -> str:
222
+ """μž₯λ₯΄μ— λ”°λ₯Έ 슀마트 ν”„λ‘¬ν”„νŠΈ ν™•μž₯"""
223
+ if genre == "Custom" or not genre:
224
+ return base_prompt
225
+
226
+ # Additional enhancement tags per genre
227
+ genre_enhancements = {
228
+ "Modern Pop": ["polished production", "mainstream appeal", "hook-driven"],
229
+ "Rock": ["guitar-driven", "powerful drums", "energetic performance"],
230
+ "Hip Hop": ["rhythmic flow", "urban atmosphere", "bass-heavy"],
231
+ "Country": ["acoustic warmth", "storytelling melody", "authentic feel"],
232
+ "EDM": ["electronic atmosphere", "build-ups", "dance-friendly"],
233
+ "Reggae": ["laid-back groove", "tropical vibes", "rhythmic guitar"],
234
+ "Classical": ["orchestral depth", "musical sophistication", "timeless beauty"],
235
+ "Jazz": ["musical complexity", "improvisational spirit", "sophisticated harmony"],
236
+ "Metal": ["aggressive energy", "powerful sound", "intense atmosphere"],
237
+ "R&B": ["smooth groove", "soulful expression", "rhythmic sophistication"]
238
+ }
239
+
240
+ if genre in genre_enhancements:
241
+ additional_tags = ", ".join(genre_enhancements[genre])
242
+ return f"{base_prompt}, {additional_tags}"
243
+
244
+ return base_prompt
245
+
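+ # Example of the expansion above (illustrative values):
+ #
+ #     enhance_prompt_with_genre("pop, synth, 120 bpm", "Rock")
+ #     # -> "pop, synth, 120 bpm, guitar-driven, powerful drums, energetic performance"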
246
+ def calculate_quality_score(audio_path: str) -> float:
247
+ """κ°„λ‹¨ν•œ ν’ˆμ§ˆ 점수 계산 (μ‹€μ œ κ΅¬ν˜„μ—μ„œλŠ” 더 λ³΅μž‘ν•œ λ©”νŠΈλ¦­ μ‚¬μš©)"""
248
+ try:
249
+ y, sr = librosa.load(audio_path)
250
+
251
+ # Basic quality metrics
252
+ rms_energy = np.sqrt(np.mean(y**2))
253
+ spectral_centroid = np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))
254
+ zero_crossing_rate = np.mean(librosa.feature.zero_crossing_rate(y))
255
+
256
+ # μ •κ·œν™”λœ 점수 (0-100)
257
+ energy_score = min(rms_energy * 1000, 40) # 0-40 points
258
+ spectral_score = min(spectral_centroid / 100, 40) # 0-40 points
259
+ clarity_score = min((1 - zero_crossing_rate) * 20, 20) # 0-20 points
260
+
261
+ total_score = energy_score + spectral_score + clarity_score
262
+ return round(total_score, 1)
263
+ except Exception:
264
+ return 50.0 # Fallback default when audio analysis fails
265
+
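+ # The score above is a rough heuristic (energy + spectral centroid + clarity,
+ # capped at 100). The multi-seed path below uses it only to rank candidates, e.g.:
+ #
+ #     best = max(candidates, key=lambda c: c["quality_score"])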
266
+ def update_tags_from_preset(preset_name):
267
+ if preset_name == "Custom":
268
+ return ""
269
+ return GENRE_PRESETS.get(preset_name, "")
270
+
271
+ def update_quality_preset(preset_name):
272
+ """ν’ˆμ§ˆ 프리셋 적용"""
273
+ if preset_name not in QUALITY_PRESETS:
274
+ return (100, 15.0, "euler", 10.0, True, True)
275
+
276
+ preset = QUALITY_PRESETS[preset_name]
277
+ return (
278
+ preset.get("infer_step", 100),
279
+ preset.get("guidance_scale", 15.0),
280
+ preset.get("scheduler_type", "euler"),
281
+ preset.get("omega_scale", 10.0),
282
+ preset.get("use_erg_diffusion", True),
283
+ preset.get("use_erg_tag", True)
284
+ )
285
+
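+ # The tuple order above must match the outputs list wired to quality_preset.change()
+ # further down: (infer_step, guidance_scale, scheduler_type, omega_scale,
+ # use_erg_diffusion, use_erg_tag).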
286
+ def create_enhanced_process_func(original_func):
287
+ """κΈ°μ‘΄ ν•¨μˆ˜λ₯Ό ν–₯μƒλœ κΈ°λŠ₯으둜 λž˜ν•‘"""
288
+
289
+ def enhanced_func(
290
+ audio_duration, prompt, lyrics, infer_step, guidance_scale,
291
+ scheduler_type, cfg_type, omega_scale, manual_seeds,
292
+ guidance_interval, guidance_interval_decay, min_guidance_scale,
293
+ use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
294
+ guidance_scale_text, guidance_scale_lyric,
295
+ audio2audio_enable=False, ref_audio_strength=0.5, ref_audio_input=None,
296
+ lora_name_or_path="none", multi_seed_mode="Single",
297
+ enable_smart_enhancement=True, genre_preset="Custom", **kwargs
298
+ ):
299
+ # Smart prompt expansion
300
+ if enable_smart_enhancement and genre_preset != "Custom":
301
+ prompt = enhance_prompt_with_genre(prompt, genre_preset)
302
+
303
+ # μΊμ‹œ 확인
304
+ cache_params = {
305
+ 'prompt': prompt, 'lyrics': lyrics, 'audio_duration': audio_duration,
306
+ 'infer_step': infer_step, 'guidance_scale': guidance_scale
307
+ }
308
+
309
+ cached_result = generation_cache.get_cached_result(cache_params)
310
+ if cached_result:
311
+ return cached_result
312
+
313
+ # Multi-seed generation
314
+ num_candidates = MULTI_SEED_OPTIONS.get(multi_seed_mode, 1)
315
+
316
+ if num_candidates == 1:
317
+ # Single candidate: call the original function directly
318
+ result = original_func(
319
+ audio_duration, prompt, lyrics, infer_step, guidance_scale,
320
+ scheduler_type, cfg_type, omega_scale, manual_seeds,
321
+ guidance_interval, guidance_interval_decay, min_guidance_scale,
322
+ use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
323
+ guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
324
+ ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
325
+ )
326
+ else:
327
+ # Generate multiple candidates and pick the best one
328
+ candidates = []
329
+
330
+ for i in range(num_candidates):
331
+ seed = random.randint(1, 10000)
332
+
333
+ try:
334
+ result = original_func(
335
+ audio_duration, prompt, lyrics, infer_step, guidance_scale,
336
+ scheduler_type, cfg_type, omega_scale, str(seed),
337
+ guidance_interval, guidance_interval_decay, min_guidance_scale,
338
+ use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
339
+ guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
340
+ ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
341
+ )
342
+
343
+ if result and len(result) > 0:
344
+ audio_path = result[0] # The first element is the audio file path
345
+ if audio_path and os.path.exists(audio_path):
346
+ quality_score = calculate_quality_score(audio_path)
347
+ candidates.append({
348
+ "result": result,
349
+ "quality_score": quality_score,
350
+ "seed": seed
351
+ })
352
+ except Exception as e:
353
+ print(f"Generation {i+1} failed: {e}")
354
+ continue
355
+
356
+ if candidates:
357
+ # Select the highest-quality candidate
358
+ best_candidate = max(candidates, key=lambda x: x["quality_score"])
359
+ result = best_candidate["result"]
360
+
361
+ # Attach quality info
362
+ if len(result) > 1 and isinstance(result[1], dict):
363
+ result[1]["quality_score"] = best_candidate["quality_score"]
364
+ result[1]["selected_seed"] = best_candidate["seed"]
365
+ result[1]["candidates_count"] = len(candidates)
366
+ else:
367
+ # If every candidate failed, fall back to a single default generation
368
+ result = original_func(
369
+ audio_duration, prompt, lyrics, infer_step, guidance_scale,
370
+ scheduler_type, cfg_type, omega_scale, manual_seeds,
371
+ guidance_interval, guidance_interval_decay, min_guidance_scale,
372
+ use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
373
+ guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
374
+ ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
375
+ )
376
+
377
+ # Cache the result
378
+ generation_cache.cache_result(cache_params, result)
379
+ return result
380
+
381
+ return enhanced_func
382
+
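+ # The wrapper is applied once per UI build, e.g.:
+ #
+ #     enhanced = create_enhanced_process_func(text2music_process_func)
+ #     result = enhanced(..., multi_seed_mode="Best of 3", genre_preset="Rock")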
383
+ def create_output_ui(task_name="Text2Music"):
384
+ # For many consumer-grade GPU devices, only one batch can be run
385
+ output_audio1 = gr.Audio(type="filepath", label=f"{task_name} Generated Audio 1")
386
+
387
+ with gr.Accordion(f"{task_name} Parameters & Quality Info", open=False):
388
+ input_params_json = gr.JSON(label=f"{task_name} Parameters")
389
+
390
+ # Quality info display
391
+ with gr.Row():
392
+ quality_score = gr.Number(label="Quality Score (0-100)", value=0, interactive=False)
393
+ generation_info = gr.Textbox(
394
+ label="Generation Info",
395
+ value="",
396
+ interactive=False,
397
+ max_lines=2
398
+ )
399
+
400
+ outputs = [output_audio1]
401
+ return outputs, input_params_json
402
+
403
+ def dump_func(*args):
404
+ print(args)
405
+ return []
406
+
407
+ def create_text2music_ui(
408
+ gr,
409
+ text2music_process_func,
410
+ sample_data_func=None,
411
+ load_data_func=None,
412
+ ):
413
+ # ν–₯μƒλœ ν”„λ‘œμ„ΈμŠ€ ν•¨μˆ˜ 생성
414
+ enhanced_process_func = create_enhanced_process_func(text2music_process_func)
415
+
416
+ with gr.Row():
417
+ with gr.Column():
418
+ # Quality & performance settings section
419
+ with gr.Group():
420
+ gr.Markdown("### ⚑ ν’ˆμ§ˆ & μ„±λŠ₯ μ„€μ •")
421
+ with gr.Row():
422
+ quality_preset = gr.Dropdown(
423
+ choices=list(QUALITY_PRESETS.keys()),
424
+ value="Standard",
425
+ label="ν’ˆμ§ˆ 프리셋",
426
+ scale=2
427
+ )
428
+ multi_seed_mode = gr.Dropdown(
429
+ choices=list(MULTI_SEED_OPTIONS.keys()),
430
+ value="Single",
431
+ label="닀쀑 생성 λͺ¨λ“œ",
432
+ scale=2,
433
+ info="μ—¬λŸ¬ 번 μƒμ„±ν•˜μ—¬ 졜고 ν’ˆμ§ˆ 선택"
434
+ )
435
+
436
+ preset_description = gr.Textbox(
437
+ value=QUALITY_PRESETS["Standard"]["description"],
438
+ label="μ„€λͺ…",
439
+ interactive=False,
440
+ max_lines=1
441
+ )
442
+
443
+ with gr.Row(equal_height=True):
444
+ audio_duration = gr.Slider(
445
+ -1,
446
+ 240.0,
447
+ step=0.00001,
448
+ value=-1,
449
+ label="Audio Duration",
450
+ interactive=True,
451
+ info="-1 means random duration (30 ~ 240).",
452
+ scale=7,
453
+ )
454
+ sample_bnt = gr.Button("Sample", variant="secondary", scale=1)
455
+ preview_bnt = gr.Button("🎡 Preview", variant="secondary", scale=2)
456
+
457
+ # audio2audio
458
+ with gr.Row(equal_height=True):
459
+ audio2audio_enable = gr.Checkbox(
460
+ label="Enable Audio2Audio",
461
+ value=False,
462
+ info="Check to enable Audio-to-Audio generation using a reference audio.",
463
+ elem_id="audio2audio_checkbox"
464
+ )
465
+ lora_name_or_path = gr.Dropdown(
466
+ label="Lora Name or Path",
467
+ choices=["ACE-Step/ACE-Step-v1-chinese-rap-LoRA", "none"],
468
+ value="none",
469
+ allow_custom_value=True,
470
+ )
471
+
472
+ ref_audio_input = gr.Audio(
473
+ type="filepath",
474
+ label="Reference Audio (for Audio2Audio)",
475
+ visible=False,
476
+ elem_id="ref_audio_input",
477
+ show_download_button=True
478
+ )
479
+ ref_audio_strength = gr.Slider(
480
+ label="Refer audio strength",
481
+ minimum=0.0,
482
+ maximum=1.0,
483
+ step=0.01,
484
+ value=0.5,
485
+ elem_id="ref_audio_strength",
486
+ visible=False,
487
+ interactive=True,
488
+ )
489
+
490
+ def toggle_ref_audio_visibility(is_checked):
491
+ return (
492
+ gr.update(visible=is_checked, elem_id="ref_audio_input"),
493
+ gr.update(visible=is_checked, elem_id="ref_audio_strength"),
494
+ )
495
+
496
+ audio2audio_enable.change(
497
+ fn=toggle_ref_audio_visibility,
498
+ inputs=[audio2audio_enable],
499
+ outputs=[ref_audio_input, ref_audio_strength],
500
+ )
501
+
502
+ with gr.Column(scale=2):
503
+ with gr.Group():
504
+ gr.Markdown("""### 🎼 슀마트 ν”„λ‘¬ν”„νŠΈ μ‹œμŠ€ν…œ
505
+ <center>μž₯λ₯΄ 선택 μ‹œ μžλ™μœΌλ‘œ μ΅œμ ν™”λœ νƒœκ·Έκ°€ μΆ”κ°€λ©λ‹ˆλ‹€. 콀마둜 κ΅¬λΆ„ν•˜μ—¬ νƒœκ·Έλ₯Ό μž…λ ₯ν•˜μ„Έμš”.</center>""")
506
+
507
+ with gr.Row():
508
+ genre_preset = gr.Dropdown(
509
+ choices=["Custom"] + list(GENRE_PRESETS.keys()),
510
+ value="Custom",
511
+ label="μž₯λ₯΄ 프리셋",
512
+ scale=1,
513
+ )
514
+ enable_smart_enhancement = gr.Checkbox(
515
+ label="슀마트 ν–₯상",
516
+ value=True,
517
+ info="μžλ™ νƒœκ·Έ μ΅œμ ν™”",
518
+ scale=1
519
+ )
520
+
521
+ prompt = gr.Textbox(
522
+ lines=2,
523
+ label="Tags",
524
+ max_lines=4,
525
+ value=TAG_DEFAULT,
526
+ placeholder="콀마둜 κ΅¬λΆ„λœ νƒœκ·Έλ“€...",
527
+ )
528
+
529
+ with gr.Group():
530
+ gr.Markdown("""### πŸ“ 가사 μž…λ ₯
531
+ <center>ꡬ쑰 νƒœκ·Έ [verse], [chorus], [bridge] μ‚¬μš©μ„ ꢌμž₯ν•©λ‹ˆλ‹€.<br>[instrumental] λ˜λŠ” [inst]λ₯Ό μ‚¬μš©ν•˜λ©΄ 연주곑을 μƒμ„±ν•©λ‹ˆλ‹€.</center>""")
532
+
533
+ # --- New UI element: enter a topic, then auto-generate lyrics ---
534
+ with gr.Row():
535
+ topic_for_lyrics = gr.Textbox(
536
+ lines=1,
537
+ label="가사 주제",
538
+ placeholder="예) μ²«μ‚¬λž‘ 이별, 여름 λ°”λ‹€, 가을 λ‚™μ—½ λ“±..."
539
+ )
540
+ generate_lyrics_btn = gr.Button("가사 생성", variant="secondary")
541
+
542
+ # Textbox for manually entered lyrics
543
+ lyrics = gr.Textbox(
544
+ lines=9,
545
+ label="Lyrics",
546
+ max_lines=13,
547
+ value=LYRIC_DEFAULT,
548
+ placeholder="가사λ₯Ό μž…λ ₯ν•˜μ„Έμš”. [verse], [chorus] λ“±μ˜ ꡬ쑰 νƒœκ·Έ μ‚¬μš©μ„ ꢌμž₯ν•©λ‹ˆλ‹€."
549
+ )
550
+
551
+ # Helper that auto-generates lyrics via OpenAI
552
+ def generate_lyrics_ui(topic_text):
553
+ # Call OpenAI
554
+ generated = openai_generate_lyrics(topic_text)
555
+ return generated
556
+
557
+ # On click, write the generated lyrics into the lyrics box
558
+ generate_lyrics_btn.click(
559
+ fn=generate_lyrics_ui,
560
+ inputs=[topic_for_lyrics],
561
+ outputs=[lyrics]
562
+ )
563
+
564
+ with gr.Accordion("Basic Settings", open=False):
565
+ infer_step = gr.Slider(
566
+ minimum=1,
567
+ maximum=300,
568
+ step=1,
569
+ value=150,
570
+ label="Infer Steps",
571
+ interactive=True,
572
+ )
573
+ guidance_scale = gr.Slider(
574
+ minimum=0.0,
575
+ maximum=30.0,
576
+ step=0.1,
577
+ value=15.0,
578
+ label="Guidance Scale",
579
+ interactive=True,
580
+ info="When guidance_scale_lyric > 1 and guidance_scale_text > 1, the guidance scale will not be applied.",
581
+ )
582
+ guidance_scale_text = gr.Slider(
583
+ minimum=0.0,
584
+ maximum=10.0,
585
+ step=0.1,
586
+ value=0.0,
587
+ label="Guidance Scale Text",
588
+ interactive=True,
589
+ info="Guidance scale for text condition. It can only apply to cfg. set guidance_scale_text=5.0, guidance_scale_lyric=1.5 for start",
590
+ )
591
+ guidance_scale_lyric = gr.Slider(
592
+ minimum=0.0,
593
+ maximum=10.0,
594
+ step=0.1,
595
+ value=0.0,
596
+ label="Guidance Scale Lyric",
597
+ interactive=True,
598
+ )
599
+
600
+ manual_seeds = gr.Textbox(
601
+ label="manual seeds (default None)",
602
+ placeholder="1,2,3,4",
603
+ value=None,
604
+ info="Seed for the generation",
605
+ )
606
+
607
+ with gr.Accordion("Advanced Settings", open=False):
608
+ scheduler_type = gr.Radio(
609
+ ["euler", "heun"],
610
+ value="euler",
611
+ label="Scheduler Type",
612
+ elem_id="scheduler_type",
613
+ info="Scheduler type for the generation. euler is recommended. heun will take more time.",
614
+ )
615
+ cfg_type = gr.Radio(
616
+ ["cfg", "apg", "cfg_star"],
617
+ value="apg",
618
+ label="CFG Type",
619
+ elem_id="cfg_type",
620
+ info="CFG type for the generation. apg is recommended. cfg and cfg_star are almost the same.",
621
+ )
622
+ use_erg_tag = gr.Checkbox(
623
+ label="use ERG for tag",
624
+ value=True,
625
+ info="Use Entropy Rectifying Guidance for tag. It will multiple a temperature to the attention to make a weaker tag condition and make better diversity.",
626
+ )
627
+ use_erg_lyric = gr.Checkbox(
628
+ label="use ERG for lyric",
629
+ value=False,
630
+ info="The same but apply to lyric encoder's attention.",
631
+ )
632
+ use_erg_diffusion = gr.Checkbox(
633
+ label="use ERG for diffusion",
634
+ value=True,
635
+ info="The same but apply to diffusion model's attention.",
636
+ )
637
+
638
+ omega_scale = gr.Slider(
639
+ minimum=-100.0,
640
+ maximum=100.0,
641
+ step=0.1,
642
+ value=10.0,
643
+ label="Granularity Scale",
644
+ interactive=True,
645
+ info="Granularity scale for the generation. Higher values can reduce artifacts",
646
+ )
647
+
648
+ guidance_interval = gr.Slider(
649
+ minimum=0.0,
650
+ maximum=1.0,
651
+ step=0.01,
652
+ value=0.5,
653
+ label="Guidance Interval",
654
+ interactive=True,
655
+ info="Guidance interval for the generation. 0.5 means only apply guidance in the middle steps (0.25 * infer_steps to 0.75 * infer_steps)",
656
+ )
657
+ guidance_interval_decay = gr.Slider(
658
+ minimum=0.0,
659
+ maximum=1.0,
660
+ step=0.01,
661
+ value=0.0,
662
+ label="Guidance Interval Decay",
663
+ interactive=True,
664
+ info="Guidance interval decay for the generation. Guidance scale will decay from guidance_scale to min_guidance_scale in the interval. 0.0 means no decay.",
665
+ )
666
+ min_guidance_scale = gr.Slider(
667
+ minimum=0.0,
668
+ maximum=200.0,
669
+ step=0.1,
670
+ value=3.0,
671
+ label="Min Guidance Scale",
672
+ interactive=True,
673
+ info="Min guidance scale for guidance interval decay's end scale",
674
+ )
675
+ oss_steps = gr.Textbox(
676
+ label="OSS Steps",
677
+ placeholder="16, 29, 52, 96, 129, 158, 172, 183, 189, 200",
678
+ value=None,
679
+ info="Optimal Steps for the generation. But not test well",
680
+ )
681
+
682
+ text2music_bnt = gr.Button("🎡 Generate Music", variant="primary", size="lg")
683
+
684
+ # Wire up event handlers after all UI elements are defined
685
+ genre_preset.change(
686
+ fn=update_tags_from_preset,
687
+ inputs=[genre_preset],
688
+ outputs=[prompt]
689
+ )
690
+
691
+ quality_preset.change(
692
+ fn=lambda x: QUALITY_PRESETS.get(x, {}).get("description", ""),
693
+ inputs=[quality_preset],
694
+ outputs=[preset_description]
695
+ )
696
+
697
+ quality_preset.change(
698
+ fn=update_quality_preset,
699
+ inputs=[quality_preset],
700
+ outputs=[infer_step, guidance_scale, scheduler_type, omega_scale, use_erg_diffusion, use_erg_tag]
701
+ )
702
+
703
+ with gr.Column():
704
+ outputs, input_params_json = create_output_ui()
705
+
706
+ # Real-time preview feature
707
+ def generate_preview(prompt, lyrics, genre_preset):
708
+ """10초 프리뷰 생성"""
709
+ preview_params = {
710
+ "audio_duration": 10,
711
+ "infer_step": 50,
712
+ "guidance_scale": 12.0,
713
+ "scheduler_type": "euler",
714
+ "cfg_type": "apg",
715
+ "omega_scale": 5.0,
716
+ }
717
+
718
+ enhanced_prompt = enhance_prompt_with_genre(prompt, genre_preset) if genre_preset != "Custom" else prompt
719
+
720
+ try:
721
+ # Use fast generation settings for the preview
722
+ result = enhanced_process_func(
723
+ preview_params["audio_duration"],
724
+ enhanced_prompt,
725
+ lyrics[:200], # Use only the first part of the lyrics
726
+ preview_params["infer_step"],
727
+ preview_params["guidance_scale"],
728
+ preview_params["scheduler_type"],
729
+ preview_params["cfg_type"],
730
+ preview_params["omega_scale"],
731
+ None, # manual_seeds
732
+ 0.5, # guidance_interval
733
+ 0.0, # guidance_interval_decay
734
+ 3.0, # min_guidance_scale
735
+ True, # use_erg_tag
736
+ False, # use_erg_lyric
737
+ True, # use_erg_diffusion
738
+ None, # oss_steps
739
+ 0.0, # guidance_scale_text
740
+ 0.0, # guidance_scale_lyric
741
+ multi_seed_mode="Single"
742
+ )
743
+ return result[0] if result else None
744
+ except Exception as e:
745
+ return f"프리뷰 생성 μ‹€νŒ¨: {str(e)}"
746
+
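+ # The positional arguments above follow enhanced_func's signature exactly
+ # (audio_duration, prompt, lyrics, ..., guidance_scale_lyric); audio2audio and
+ # LoRA options are left at their defaults for the quick preview.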
747
+ preview_bnt.click(
748
+ fn=generate_preview,
749
+ inputs=[prompt, lyrics, genre_preset],
750
+ outputs=[outputs[0]]
751
+ )
752
+
753
+ with gr.Tab("retake"):
754
+ retake_variance = gr.Slider(
755
+ minimum=0.0, maximum=1.0, step=0.01, value=0.2, label="variance"
756
+ )
757
+ retake_seeds = gr.Textbox(
758
+ label="retake seeds (default None)", placeholder="", value=None
759
+ )
760
+ retake_bnt = gr.Button("Retake", variant="primary")
761
+ retake_outputs, retake_input_params_json = create_output_ui("Retake")
762
+
763
+ def retake_process_func(json_data, retake_variance, retake_seeds):
764
+ return enhanced_process_func(
765
+ json_data.get("audio_duration", 30),
766
+ json_data.get("prompt", ""),
767
+ json_data.get("lyrics", ""),
768
+ json_data.get("infer_step", 100),
769
+ json_data.get("guidance_scale", 15.0),
770
+ json_data.get("scheduler_type", "euler"),
771
+ json_data.get("cfg_type", "apg"),
772
+ json_data.get("omega_scale", 10.0),
773
+ retake_seeds,
774
+ json_data.get("guidance_interval", 0.5),
775
+ json_data.get("guidance_interval_decay", 0.0),
776
+ json_data.get("min_guidance_scale", 3.0),
777
+ json_data.get("use_erg_tag", True),
778
+ json_data.get("use_erg_lyric", False),
779
+ json_data.get("use_erg_diffusion", True),
780
+ json_data.get("oss_steps", None),
781
+ json_data.get("guidance_scale_text", 0.0),
782
+ json_data.get("guidance_scale_lyric", 0.0),
783
+ audio2audio_enable=json_data.get("audio2audio_enable", False),
784
+ ref_audio_strength=json_data.get("ref_audio_strength", 0.5),
785
+ ref_audio_input=json_data.get("ref_audio_input", None),
786
+ lora_name_or_path=json_data.get("lora_name_or_path", "none"),
787
+ multi_seed_mode="Best of 3", # Retake automatically uses multi-seed generation
788
+ retake_variance=retake_variance,
789
+ task="retake"
790
+ )
791
+
792
+ retake_bnt.click(
793
+ fn=retake_process_func,
794
+ inputs=[
795
+ input_params_json,
796
+ retake_variance,
797
+ retake_seeds,
798
+ ],
799
+ outputs=retake_outputs + [retake_input_params_json],
800
+ )
801
+
802
+ with gr.Tab("repainting"):
803
+ retake_variance = gr.Slider(
804
+ minimum=0.0, maximum=1.0, step=0.01, value=0.2, label="variance"
805
+ )
806
+ retake_seeds = gr.Textbox(
807
+ label="repaint seeds (default None)", placeholder="", value=None
808
+ )
809
+ repaint_start = gr.Slider(
810
+ minimum=0.0,
811
+ maximum=240.0,
812
+ step=0.01,
813
+ value=0.0,
814
+ label="Repaint Start Time",
815
+ interactive=True,
816
+ )
817
+ repaint_end = gr.Slider(
818
+ minimum=0.0,
819
+ maximum=240.0,
820
+ step=0.01,
821
+ value=30.0,
822
+ label="Repaint End Time",
823
+ interactive=True,
824
+ )
825
+ repaint_source = gr.Radio(
826
+ ["text2music", "last_repaint", "upload"],
827
+ value="text2music",
828
+ label="Repaint Source",
829
+ elem_id="repaint_source",
830
+ )
831
+
832
+ repaint_source_audio_upload = gr.Audio(
833
+ label="Upload Audio",
834
+ type="filepath",
835
+ visible=False,
836
+ elem_id="repaint_source_audio_upload",
837
+ show_download_button=True,
838
+ )
839
+ repaint_source.change(
840
+ fn=lambda x: gr.update(
841
+ visible=x == "upload", elem_id="repaint_source_audio_upload"
842
+ ),
843
+ inputs=[repaint_source],
844
+ outputs=[repaint_source_audio_upload],
845
+ )
846
+
847
+ repaint_bnt = gr.Button("Repaint", variant="primary")
848
+ repaint_outputs, repaint_input_params_json = create_output_ui("Repaint")
849
+
850
+ def repaint_process_func(
851
+ text2music_json_data,
852
+ repaint_json_data,
853
+ retake_variance,
854
+ retake_seeds,
855
+ repaint_start,
856
+ repaint_end,
857
+ repaint_source,
858
+ repaint_source_audio_upload,
859
+ prompt,
860
+ lyrics,
861
+ infer_step,
862
+ guidance_scale,
863
+ scheduler_type,
864
+ cfg_type,
865
+ omega_scale,
866
+ manual_seeds,
867
+ guidance_interval,
868
+ guidance_interval_decay,
869
+ min_guidance_scale,
870
+ use_erg_tag,
871
+ use_erg_lyric,
872
+ use_erg_diffusion,
873
+ oss_steps,
874
+ guidance_scale_text,
875
+ guidance_scale_lyric,
876
+ ):
877
+ if repaint_source == "upload":
878
+ src_audio_path = repaint_source_audio_upload
879
+ audio_duration = librosa.get_duration(filename=src_audio_path)
880
+ json_data = {"audio_duration": audio_duration}
881
+ elif repaint_source == "text2music":
882
+ json_data = text2music_json_data
883
+ src_audio_path = json_data["audio_path"]
884
+ elif repaint_source == "last_repaint":
885
+ json_data = repaint_json_data
886
+ src_audio_path = json_data["audio_path"]
887
+
888
+ return enhanced_process_func(
889
+ json_data["audio_duration"],
890
+ prompt,
891
+ lyrics,
892
+ infer_step,
893
+ guidance_scale,
894
+ scheduler_type,
895
+ cfg_type,
896
+ omega_scale,
897
+ manual_seeds,
898
+ guidance_interval,
899
+ guidance_interval_decay,
900
+ min_guidance_scale,
901
+ use_erg_tag,
902
+ use_erg_lyric,
903
+ use_erg_diffusion,
904
+ oss_steps,
905
+ guidance_scale_text,
906
+ guidance_scale_lyric,
907
+ retake_seeds=retake_seeds,
908
+ retake_variance=retake_variance,
909
+ task="repaint",
910
+ repaint_start=repaint_start,
911
+ repaint_end=repaint_end,
912
+ src_audio_path=src_audio_path,
913
+ lora_name_or_path="none"
914
+ )
915
+
916
+ repaint_bnt.click(
917
+ fn=repaint_process_func,
918
+ inputs=[
919
+ input_params_json,
920
+ repaint_input_params_json,
921
+ retake_variance,
922
+ retake_seeds,
923
+ repaint_start,
924
+ repaint_end,
925
+ repaint_source,
926
+ repaint_source_audio_upload,
927
+ prompt,
928
+ lyrics,
929
+ infer_step,
930
+ guidance_scale,
931
+ scheduler_type,
932
+ cfg_type,
933
+ omega_scale,
934
+ manual_seeds,
935
+ guidance_interval,
936
+ guidance_interval_decay,
937
+ min_guidance_scale,
938
+ use_erg_tag,
939
+ use_erg_lyric,
940
+ use_erg_diffusion,
941
+ oss_steps,
942
+ guidance_scale_text,
943
+ guidance_scale_lyric,
944
+ ],
945
+ outputs=repaint_outputs + [repaint_input_params_json],
946
+ )
947
+
948
+ with gr.Tab("edit"):
949
+ edit_prompt = gr.Textbox(lines=2, label="Edit Tags", max_lines=4)
950
+ edit_lyrics = gr.Textbox(lines=9, label="Edit Lyrics", max_lines=13)
951
+ retake_seeds = gr.Textbox(
952
+ label="edit seeds (default None)", placeholder="", value=None
953
+ )
954
+
955
+ edit_type = gr.Radio(
956
+ ["only_lyrics", "remix"],
957
+ value="only_lyrics",
958
+ label="Edit Type",
959
+ elem_id="edit_type",
960
+ info="`only_lyrics` will keep the whole song the same except lyrics difference. Make your diffrence smaller, e.g. one lyrc line change.\nremix can change the song melody and genre",
961
+ )
962
+ edit_n_min = gr.Slider(
963
+ minimum=0.0,
964
+ maximum=1.0,
965
+ step=0.01,
966
+ value=0.6,
967
+ label="edit_n_min",
968
+ interactive=True,
969
+ )
970
+ edit_n_max = gr.Slider(
971
+ minimum=0.0,
972
+ maximum=1.0,
973
+ step=0.01,
974
+ value=1.0,
975
+ label="edit_n_max",
976
+ interactive=True,
977
+ )
978
+
979
+ def edit_type_change_func(edit_type):
980
+ if edit_type == "only_lyrics":
981
+ n_min = 0.6
982
+ n_max = 1.0
983
+ elif edit_type == "remix":
984
+ n_min = 0.2
985
+ n_max = 0.4
986
+ return n_min, n_max
987
+
988
+ edit_type.change(
989
+ edit_type_change_func,
990
+ inputs=[edit_type],
991
+ outputs=[edit_n_min, edit_n_max],
992
+ )
993
+
994
+ edit_source = gr.Radio(
995
+ ["text2music", "last_edit", "upload"],
996
+ value="text2music",
997
+ label="Edit Source",
998
+ elem_id="edit_source",
999
+ )
1000
+ edit_source_audio_upload = gr.Audio(
1001
+ label="Upload Audio",
1002
+ type="filepath",
1003
+ visible=False,
1004
+ elem_id="edit_source_audio_upload",
1005
+ show_download_button=True,
1006
+ )
1007
+ edit_source.change(
1008
+ fn=lambda x: gr.update(
1009
+ visible=x == "upload", elem_id="edit_source_audio_upload"
1010
+ ),
1011
+ inputs=[edit_source],
1012
+ outputs=[edit_source_audio_upload],
1013
+ )
1014
+
1015
+ edit_bnt = gr.Button("Edit", variant="primary")
1016
+ edit_outputs, edit_input_params_json = create_output_ui("Edit")
1017
+
1018
+ def edit_process_func(
1019
+ text2music_json_data,
1020
+ edit_input_params_json,
1021
+ edit_source,
1022
+ edit_source_audio_upload,
1023
+ prompt,
1024
+ lyrics,
1025
+ edit_prompt,
1026
+ edit_lyrics,
1027
+ edit_n_min,
1028
+ edit_n_max,
1029
+ infer_step,
1030
+ guidance_scale,
1031
+ scheduler_type,
1032
+ cfg_type,
1033
+ omega_scale,
1034
+ manual_seeds,
1035
+ guidance_interval,
1036
+ guidance_interval_decay,
1037
+ min_guidance_scale,
1038
+ use_erg_tag,
1039
+ use_erg_lyric,
1040
+ use_erg_diffusion,
1041
+ oss_steps,
1042
+ guidance_scale_text,
1043
+ guidance_scale_lyric,
1044
+ retake_seeds,
1045
+ ):
1046
+ if edit_source == "upload":
1047
+ src_audio_path = edit_source_audio_upload
1048
+ audio_duration = librosa.get_duration(filename=src_audio_path)
1049
+ json_data = {"audio_duration": audio_duration}
1050
+ elif edit_source == "text2music":
1051
+ json_data = text2music_json_data
1052
+ src_audio_path = json_data["audio_path"]
1053
+ elif edit_source == "last_edit":
1054
+ json_data = edit_input_params_json
1055
+ src_audio_path = json_data["audio_path"]
1056
+
1057
+ if not edit_prompt:
1058
+ edit_prompt = prompt
1059
+ if not edit_lyrics:
1060
+ edit_lyrics = lyrics
1061
+
1062
+ return enhanced_process_func(
1063
+ json_data["audio_duration"],
1064
+ prompt,
1065
+ lyrics,
1066
+ infer_step,
1067
+ guidance_scale,
1068
+ scheduler_type,
1069
+ cfg_type,
1070
+ omega_scale,
1071
+ manual_seeds,
1072
+ guidance_interval,
1073
+ guidance_interval_decay,
1074
+ min_guidance_scale,
1075
+ use_erg_tag,
1076
+ use_erg_lyric,
1077
+ use_erg_diffusion,
1078
+ oss_steps,
1079
+ guidance_scale_text,
1080
+ guidance_scale_lyric,
1081
+ task="edit",
1082
+ src_audio_path=src_audio_path,
1083
+ edit_target_prompt=edit_prompt,
1084
+ edit_target_lyrics=edit_lyrics,
1085
+ edit_n_min=edit_n_min,
1086
+ edit_n_max=edit_n_max,
1087
+ retake_seeds=retake_seeds,
1088
+ lora_name_or_path="none"
1089
+ )
1090
+
1091
+ edit_bnt.click(
1092
+ fn=edit_process_func,
1093
+ inputs=[
1094
+ input_params_json,
1095
+ edit_input_params_json,
1096
+ edit_source,
1097
+ edit_source_audio_upload,
1098
+ prompt,
1099
+ lyrics,
1100
+ edit_prompt,
1101
+ edit_lyrics,
1102
+ edit_n_min,
1103
+ edit_n_max,
1104
+ infer_step,
1105
+ guidance_scale,
1106
+ scheduler_type,
1107
+ cfg_type,
1108
+ omega_scale,
1109
+ manual_seeds,
1110
+ guidance_interval,
1111
+ guidance_interval_decay,
1112
+ min_guidance_scale,
1113
+ use_erg_tag,
1114
+ use_erg_lyric,
1115
+ use_erg_diffusion,
1116
+ oss_steps,
1117
+ guidance_scale_text,
1118
+ guidance_scale_lyric,
1119
+ retake_seeds,
1120
+ ],
1121
+ outputs=edit_outputs + [edit_input_params_json],
1122
+ )
1123
+
1124
+ with gr.Tab("extend"):
1125
+ extend_seeds = gr.Textbox(
1126
+ label="extend seeds (default None)", placeholder="", value=None
1127
+ )
1128
+ left_extend_length = gr.Slider(
1129
+ minimum=0.0,
1130
+ maximum=240.0,
1131
+ step=0.01,
1132
+ value=0.0,
1133
+ label="Left Extend Length",
1134
+ interactive=True,
1135
+ )
1136
+ right_extend_length = gr.Slider(
1137
+ minimum=0.0,
1138
+ maximum=240.0,
1139
+ step=0.01,
1140
+ value=30.0,
1141
+ label="Right Extend Length",
1142
+ interactive=True,
1143
+ )
1144
+ extend_source = gr.Radio(
1145
+ ["text2music", "last_extend", "upload"],
1146
+ value="text2music",
1147
+ label="Extend Source",
1148
+ elem_id="extend_source",
1149
+ )
1150
+
1151
+ extend_source_audio_upload = gr.Audio(
1152
+ label="Upload Audio",
1153
+ type="filepath",
1154
+ visible=False,
1155
+ elem_id="extend_source_audio_upload",
1156
+ show_download_button=True,
1157
+ )
1158
+ extend_source.change(
1159
+ fn=lambda x: gr.update(
1160
+ visible=x == "upload", elem_id="extend_source_audio_upload"
1161
+ ),
1162
+ inputs=[extend_source],
1163
+ outputs=[extend_source_audio_upload],
1164
+ )
1165
+
1166
+ extend_bnt = gr.Button("Extend", variant="primary")
1167
+ extend_outputs, extend_input_params_json = create_output_ui("Extend")
1168
+
1169
+ def extend_process_func(
1170
+ text2music_json_data,
1171
+ extend_input_params_json,
1172
+ extend_seeds,
1173
+ left_extend_length,
1174
+ right_extend_length,
1175
+ extend_source,
1176
+ extend_source_audio_upload,
1177
+ prompt,
1178
+ lyrics,
1179
+ infer_step,
1180
+ guidance_scale,
1181
+ scheduler_type,
1182
+ cfg_type,
1183
+ omega_scale,
1184
+ manual_seeds,
1185
+ guidance_interval,
1186
+ guidance_interval_decay,
1187
+ min_guidance_scale,
1188
+ use_erg_tag,
1189
+ use_erg_lyric,
1190
+ use_erg_diffusion,
1191
+ oss_steps,
1192
+ guidance_scale_text,
1193
+ guidance_scale_lyric,
1194
+ ):
1195
+ if extend_source == "upload":
1196
+ src_audio_path = extend_source_audio_upload
1197
+ # get audio duration
1198
+ audio_duration = librosa.get_duration(filename=src_audio_path)
1199
+ json_data = {"audio_duration": audio_duration}
1200
+ elif extend_source == "text2music":
1201
+ json_data = text2music_json_data
1202
+ src_audio_path = json_data["audio_path"]
1203
+ elif extend_source == "last_extend":
1204
+ json_data = extend_input_params_json
1205
+ src_audio_path = json_data["audio_path"]
1206
+
1207
+ repaint_start = -left_extend_length
1208
+ repaint_end = json_data["audio_duration"] + right_extend_length
1209
+ return enhanced_process_func(
1210
+ json_data["audio_duration"],
1211
+ prompt,
1212
+ lyrics,
1213
+ infer_step,
1214
+ guidance_scale,
1215
+ scheduler_type,
1216
+ cfg_type,
1217
+ omega_scale,
1218
+ manual_seeds,
1219
+ guidance_interval,
1220
+ guidance_interval_decay,
1221
+ min_guidance_scale,
1222
+ use_erg_tag,
1223
+ use_erg_lyric,
1224
+ use_erg_diffusion,
1225
+ oss_steps,
1226
+ guidance_scale_text,
1227
+ guidance_scale_lyric,
1228
+ retake_seeds=extend_seeds,
1229
+ retake_variance=1.0,
1230
+ task="extend",
1231
+ repaint_start=repaint_start,
1232
+ repaint_end=repaint_end,
1233
+ src_audio_path=src_audio_path,
1234
+ lora_name_or_path="none"
1235
+ )
1236
+
1237
+ extend_bnt.click(
1238
+ fn=extend_process_func,
1239
+ inputs=[
1240
+ input_params_json,
1241
+ extend_input_params_json,
1242
+ extend_seeds,
1243
+ left_extend_length,
1244
+ right_extend_length,
1245
+ extend_source,
1246
+ extend_source_audio_upload,
1247
+ prompt,
1248
+ lyrics,
1249
+ infer_step,
1250
+ guidance_scale,
1251
+ scheduler_type,
1252
+ cfg_type,
1253
+ omega_scale,
1254
+ manual_seeds,
1255
+ guidance_interval,
1256
+ guidance_interval_decay,
1257
+ min_guidance_scale,
1258
+ use_erg_tag,
1259
+ use_erg_lyric,
1260
+ use_erg_diffusion,
1261
+ oss_steps,
1262
+ guidance_scale_text,
1263
+ guidance_scale_lyric,
1264
+ ],
1265
+ outputs=extend_outputs + [extend_input_params_json],
1266
+ )
1267
+
1268
+ def json2output(json_data):
1269
+ return (
1270
+ json_data["audio_duration"],
1271
+ json_data["prompt"],
1272
+ json_data["lyrics"],
1273
+ json_data["infer_step"],
1274
+ json_data["guidance_scale"],
1275
+ json_data["scheduler_type"],
1276
+ json_data["cfg_type"],
1277
+ json_data["omega_scale"],
1278
+ ", ".join(map(str, json_data["actual_seeds"])),
1279
+ json_data["guidance_interval"],
1280
+ json_data["guidance_interval_decay"],
1281
+ json_data["min_guidance_scale"],
1282
+ json_data["use_erg_tag"],
1283
+ json_data["use_erg_lyric"],
1284
+ json_data["use_erg_diffusion"],
1285
+ ", ".join(map(str, json_data["oss_steps"])),
1286
+ (
1287
+ json_data["guidance_scale_text"]
1288
+ if "guidance_scale_text" in json_data
1289
+ else 0.0
1290
+ ),
1291
+ (
1292
+ json_data["guidance_scale_lyric"]
1293
+ if "guidance_scale_lyric" in json_data
1294
+ else 0.0
1295
+ ),
1296
+ (
1297
+ json_data["audio2audio_enable"]
1298
+ if "audio2audio_enable" in json_data
1299
+ else False
1300
+ ),
1301
+ (
1302
+ json_data["ref_audio_strength"]
1303
+ if "ref_audio_strength" in json_data
1304
+ else 0.5
1305
+ ),
1306
+ (
1307
+ json_data["ref_audio_input"]
1308
+ if "ref_audio_input" in json_data
1309
+ else None
1310
+ ),
1311
+ )
1312
+
1313
+ def sample_data(lora_name_or_path_):
1314
+ if sample_data_func:
1315
+ json_data = sample_data_func(lora_name_or_path_)
1316
+ return json2output(json_data)
1317
+ return {}
1318
+
1319
+ sample_bnt.click(
1320
+ sample_data,
1321
+ inputs=[lora_name_or_path],
1322
+ outputs=[
1323
+ audio_duration,
1324
+ prompt,
1325
+ lyrics,
1326
+ infer_step,
1327
+ guidance_scale,
1328
+ scheduler_type,
1329
+ cfg_type,
1330
+ omega_scale,
1331
+ manual_seeds,
1332
+ guidance_interval,
1333
+ guidance_interval_decay,
1334
+ min_guidance_scale,
1335
+ use_erg_tag,
1336
+ use_erg_lyric,
1337
+ use_erg_diffusion,
1338
+ oss_steps,
1339
+ guidance_scale_text,
1340
+ guidance_scale_lyric,
1341
+ audio2audio_enable,
1342
+ ref_audio_strength,
1343
+ ref_audio_input,
1344
+ ],
1345
+ )
1346
+
1347
+ # Main generate button event (uses the enhanced function)
1348
+ text2music_bnt.click(
1349
+ fn=enhanced_process_func,
1350
+ inputs=[
1351
+ audio_duration,
1352
+ prompt,
1353
+ lyrics,
1354
+ infer_step,
1355
+ guidance_scale,
1356
+ scheduler_type,
1357
+ cfg_type,
1358
+ omega_scale,
1359
+ manual_seeds,
1360
+ guidance_interval,
1361
+ guidance_interval_decay,
1362
+ min_guidance_scale,
1363
+ use_erg_tag,
1364
+ use_erg_lyric,
1365
+ use_erg_diffusion,
1366
+ oss_steps,
1367
+ guidance_scale_text,
1368
+ guidance_scale_lyric,
1369
+ audio2audio_enable,
1370
+ ref_audio_strength,
1371
+ ref_audio_input,
1372
+ lora_name_or_path,
1373
+ multi_seed_mode,
1374
+ enable_smart_enhancement,
1375
+ genre_preset
1376
+ ],
1377
+ outputs=outputs + [input_params_json],
1378
+ )
1379
+
1380
+
1381
+ def create_main_demo_ui(
1382
+ text2music_process_func=dump_func,
1383
+ sample_data_func=dump_func,
1384
+ load_data_func=dump_func,
1385
+ ):
1386
+ with gr.Blocks(
1387
+ title="ACE-Step Model 1.0 DEMO - Enhanced",
1388
+ theme=gr.themes.Soft(),
1389
+ css="""
1390
+ .gradio-container {
1391
+ max-width: 1200px !important;
1392
+ }
1393
+ .quality-info {
1394
+ background: linear-gradient(45deg, #f0f8ff, #e6f3ff);
1395
+ padding: 10px;
1396
+ border-radius: 8px;
1397
+ margin: 5px 0;
1398
+ }
1399
+ """
1400
+ ) as demo:
1401
+ gr.Markdown(
1402
+ """
1403
+ <h1 style="text-align: center;">🎡 ACE-Step PRO</h1>
1404
+ <div style="text-align: center; margin: 20px;">
1405
+ <p><strong>πŸš€ μƒˆλ‘œμš΄ κΈ°λŠ₯:</strong> ν’ˆμ§ˆ 프리셋 | 닀쀑 생성 | 슀마트 ν”„λ‘¬ν”„νŠΈ | μ‹€μ‹œκ°„ 프리뷰 | ν’ˆμ§ˆ 점수</p>
1406
+ <p>
1407
+ <a href="https://ace-step.github.io/" target='_blank'>Project</a> |
1408
+ <a href="https://huggingface.co/ACE-Step/ACE-Step-v1-3.5B">Checkpoints</a> |
1409
+ <a href="https://discord.gg/rjAZz2xBdG" target='_blank'>Discord</a>
1410
+ </p>
1411
+ </div>
1412
+ """
1413
+ )
1414
+
1415
+ # Usage guide
1416
+ with gr.Accordion("πŸ“– μ‚¬μš©λ²• κ°€μ΄λ“œ", open=False):
1417
+ gr.Markdown("""
1418
+ ### 🎯 λΉ λ₯Έ μ‹œμž‘
1419
+ 1. **μž₯λ₯΄ 선택**: μ›ν•˜λŠ” μŒμ•… μž₯λ₯΄λ₯Ό μ„ νƒν•˜λ©΄ μžλ™μœΌλ‘œ μ΅œμ ν™”λœ νƒœκ·Έκ°€ μ μš©λ©λ‹ˆλ‹€
1420
+ 2. **ν’ˆμ§ˆ μ„€μ •**: Draft(빠름) β†’ Standard(ꢌμž₯) β†’ High Quality β†’ Ultra 쀑 선택
1421
+ 3. **닀쀑 생성**: "Best of 3/5/10" μ„ νƒν•˜λ©΄ μ—¬λŸ¬ 번 μƒμ„±ν•˜μ—¬ 졜고 ν’ˆμ§ˆμ„ μžλ™ μ„ νƒν•©λ‹ˆλ‹€
1422
+ 4. **프리뷰**: 전체 생성 μ „ 10초 ν”„λ¦¬λ·°λ‘œ λΉ λ₯΄κ²Œ 확인할 수 μžˆμŠ΅λ‹ˆλ‹€
1423
+
1424
+ ### πŸ’‘ ν’ˆμ§ˆ ν–₯상 팁
1425
+ - **κ³ ν’ˆμ§ˆ 생성**: "High Quality" + "Best of 5" μ‘°ν•© μΆ”μ²œ
1426
+ - **λΉ λ₯Έ ν…ŒμŠ€νŠΈ**: "Draft" + "프리뷰" κΈ°λŠ₯ ν™œμš©
1427
+ - **μž₯λ₯΄ νŠΉν™”**: μž₯λ₯΄ 프리셋 선택 ν›„ "슀마트 ν–₯상" 체크
1428
+ - **가사 ꡬ쑰**: [verse], [chorus], [bridge] νƒœκ·Έ 적극 ν™œμš©
1429
+ """)
1430
+
1431
+ with gr.Tab("🎡 Enhanced Text2Music"):
1432
+ create_text2music_ui(
1433
+ gr=gr,
1434
+ text2music_process_func=text2music_process_func,
1435
+ sample_data_func=sample_data_func,
1436
+ load_data_func=load_data_func,
1437
+ )
1438
+ return demo
1439
+
1440
+
1441
+ if __name__ == "__main__":
1442
+ demo = create_main_demo_ui()
1443
+ demo.launch(
1444
+ server_name="0.0.0.0",
1445
+ server_port=7860,
1446
+ share=True # Create a public share link
1447
+ )
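+ # Launch sketch with a real generation backend (hypothetical callable name,
+ # expected to match the positional signature used by enhanced_process_func):
+ #
+ #     demo = create_main_demo_ui(text2music_process_func=my_pipeline_callable)
+ #     demo.launch(server_name="0.0.0.0", server_port=7860)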