import random import os import uuid from datetime import datetime import gradio as gr import numpy as np import spaces import torch from diffusers import DiffusionPipeline from PIL import Image # ----------------------------- # Gemini API & Text Rendering 관련 추가 모듈 # ----------------------------- import re import tempfile import io import logging import base64 import string import requests from google import genai from google.genai import types import numpy as np logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s') def maybe_translate_to_english(text: str) -> str: """ 텍스트에 한글이 포함되어 있으면 간단한 규칙에 따라 영어로 변환. """ if not text or not re.search("[가-힣]", text): return text try: translations = { "안녕하세요": "Hello", "환영합니다": "Welcome", "아름다운 당신": "Beautiful You", "안녕": "Hello", "고양이": "Cat", "배너": "Banner", "썬글라스": "Sunglasses", "착용한": "wearing", "흰색": "white" } for kr, en in translations.items(): if kr in text: text = text.replace(kr, en) print(f"[TRANSLATE] Translated Korean text: '{text}'") return text except Exception as e: print(f"[WARNING] Translation failed: {e}") return text def save_binary_file(file_name, data): with open(file_name, "wb") as f: f.write(data) def generate_by_google_genai(text, file_name, model="gemini-2.0-flash-exp"): """ Gemini API를 호출하여 텍스트 기반 이미지 편집/생성을 수행. """ api_key = os.getenv("GAPI_TOKEN", None) if not api_key: raise ValueError("GAPI_TOKEN is missing. Please set an API key.") client = genai.Client(api_key=api_key) files = [client.files.upload(file=file_name)] contents = [ types.Content( role="user", parts=[ types.Part.from_uri( file_uri=files[0].uri, mime_type=files[0].mime_type, ), types.Part.from_text(text=text), ], ), ] generate_content_config = types.GenerateContentConfig( temperature=1, top_p=0.95, top_k=40, max_output_tokens=8192, response_modalities=["image", "text"], response_mime_type="text/plain", ) text_response = "" image_path = None with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp: temp_path = tmp.name for chunk in client.models.generate_content_stream( model=model, contents=contents, config=generate_content_config, ): if not chunk.candidates or not chunk.candidates[0].content or not chunk.candidates[0].content.parts: continue candidate = chunk.candidates[0].content.parts[0] if candidate.inline_data: save_binary_file(temp_path, candidate.inline_data.data) print(f"File of mime type {candidate.inline_data.mime_type} saved to: {temp_path}") image_path = temp_path break else: text_response += chunk.text + "\n" del files return image_path, text_response def change_text_in_image_two_times(original_image, instruction): # 만약 이미지가 numpy.ndarray 타입이면 PIL Image로 변환 if isinstance(original_image, np.ndarray): original_image = Image.fromarray(original_image) results = [] for version_tag in ["(A)", "(B)"]: mod_instruction = f"{instruction} {version_tag}" try: with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp: original_path = tmp.name if isinstance(original_image, Image.Image): original_image.save(original_path, format="PNG") print(f"[DEBUG] Saved image to temporary file: {original_path}") else: raise gr.Error(f"예상된 PIL Image가 아닌 {type(original_image)} 타입이 제공되었습니다.") # 이후 Gemini API 호출 로직 유지 image_path, text_response = generate_by_google_genai( text=mod_instruction, file_name=original_path ) if image_path: try: with open(image_path, "rb") as f: image_data = f.read() new_img = Image.open(io.BytesIO(image_data)) results.append(new_img) except Exception as img_err: print(f"[ERROR] Failed to process Gemini image: {img_err}") results.append(original_image) else: print(f"[WARNING] 이미지가 반환되지 않았습니다. 텍스트 응답: {text_response}") results.append(original_image) except Exception as e: logging.exception(f"Text modification error: {e}") results.append(original_image) return results def gemini_text_rendering(image, rendering_text): """ 주어진 이미지에 대해 Gemini API를 사용해 텍스트 렌더링을 적용. """ rendering_text_en = maybe_translate_to_english(rendering_text) instruction = f"Render the following text on the image in a clear, visually appealing manner: {rendering_text_en}." rendered_images = change_text_in_image_two_times(image, instruction) if rendered_images and len(rendered_images) > 0: return rendered_images[0] return image def apply_text_rendering(image, rendering_text): """ 입력된 텍스트가 있으면 Gemini API를 통해 이미지에 텍스트 렌더링을 적용하고, 없으면 원본 이미지를 그대로 반환. """ if rendering_text and rendering_text.strip(): return gemini_text_rendering(image, rendering_text) return image # ----------------------------- # 기존 Diffusion Pipeline 관련 코드 # ----------------------------- import gradio_client.utils import types original_json_schema = gradio_client.utils._json_schema_to_python_type def patched_json_schema(schema, defs=None): if isinstance(schema, bool): return "bool" try: if "additionalProperties" in schema and isinstance(schema["additionalProperties"], bool): schema["additionalProperties"] = {"type": "any"} except (TypeError, KeyError): pass try: return original_json_schema(schema, defs) except Exception as e: return "any" gradio_client.utils._json_schema_to_python_type = patched_json_schema SAVE_DIR = "saved_images" if not os.path.exists(SAVE_DIR): os.makedirs(SAVE_DIR, exist_ok=True) device = "cuda" if torch.cuda.is_available() else "cpu" repo_id = "black-forest-labs/FLUX.1-dev" adapter_id = "openfree/flux-chatgpt-ghibli-lora" def load_model_with_retry(max_retries=5): for attempt in range(max_retries): try: print(f"Loading model attempt {attempt+1}/{max_retries}...") pipeline = DiffusionPipeline.from_pretrained( repo_id, torch_dtype=torch.bfloat16, use_safetensors=True, resume_download=True ) print("Model loaded successfully, loading LoRA weights...") pipeline.load_lora_weights(adapter_id) pipeline = pipeline.to(device) print("Pipeline ready!") return pipeline except Exception as e: if attempt < max_retries - 1: wait_time = 10 * (attempt + 1) print(f"Error loading model: {e}. Retrying in {wait_time} seconds...") import time time.sleep(wait_time) else: raise Exception(f"Failed to load model after {max_retries} attempts: {e}") pipeline = load_model_with_retry() MAX_SEED = np.iinfo(np.int32).max MAX_IMAGE_SIZE = 1024 def save_generated_image(image, prompt): timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") unique_id = str(uuid.uuid4())[:8] filename = f"{timestamp}_{unique_id}.png" filepath = os.path.join(SAVE_DIR, filename) image.save(filepath) metadata_file = os.path.join(SAVE_DIR, "metadata.txt") with open(metadata_file, "a", encoding="utf-8") as f: f.write(f"{filename}|{prompt}|{timestamp}\n") return filepath def load_generated_images(): if not os.path.exists(SAVE_DIR): return [] image_files = [os.path.join(SAVE_DIR, f) for f in os.listdir(SAVE_DIR) if f.endswith(('.png', '.jpg', '.jpeg', '.webp'))] image_files.sort(key=lambda x: os.path.getctime(x), reverse=True) return image_files @spaces.GPU(duration=120) def inference( prompt: str, seed: int, randomize_seed: bool, width: int, height: int, guidance_scale: float, num_inference_steps: int, lora_scale: float, progress: gr.Progress = gr.Progress(track_tqdm=True), ): if randomize_seed: seed = random.randint(0, MAX_SEED) generator = torch.Generator(device=device).manual_seed(seed) try: image = pipeline( prompt=prompt, guidance_scale=guidance_scale, num_inference_steps=num_inference_steps, width=width, height=height, generator=generator, joint_attention_kwargs={"scale": lora_scale}, ).images[0] filepath = save_generated_image(image, prompt) return image, seed, load_generated_images() except Exception as e: print(f"Error during inference: {e}") error_img = Image.new('RGB', (width, height), color='red') return error_img, seed, load_generated_images() # ----------------------------- # Gradio UI (입력 프롬프트 아래에 Text Rendering 입력란 추가) # ----------------------------- examples = [ "Ghibli style futuristic stormtrooper with glossy white armor and a sleek helmet, standing heroically on a lush alien planet, vibrant flowers blooming around, soft sunlight illuminating the scene, a gentle breeze rustling the leaves. The armor reflects the pink and purple hues of the alien sunset, creating an ethereal glow around the figure. [trigger]", "Ghibli style young mechanic girl in a floating workshop, surrounded by hovering tools and glowing mechanical parts, her blue overalls covered in oil stains, tinkering with a semi-transparent robot companion. Magical sparks fly as she works, while floating islands with waterfalls drift past her open workshop window. [trigger]", "Ghibli style ancient forest guardian robot, covered in moss and flowering vines, sitting peacefully in a crystal-clear lake. Its gentle eyes glow with soft blue light, while bioluminescent dragonflies dance around its weathered metal frame. Ancient tech symbols on its surface pulse with a gentle rhythm. [trigger]", "Ghibli style sky whale transport ship, its metallic skin adorned with traditional Japanese patterns, gliding through cotton candy clouds at sunrise. Small floating gardens hang from its sides, where workers in futuristic kimonos tend to glowing plants. Rainbow auroras shimmer in the background. [trigger]", "Ghibli style cyber-shrine maiden with flowing holographic robes, performing a ritual dance among floating lanterns and digital cherry blossoms. Her traditional headdress emits soft light patterns, while spirit-like AI constructs swirl around her in elegant patterns. The scene is set in a modern shrine with both ancient wood and sleek chrome elements. [trigger]", "Ghibli style robot farmer tending to floating rice paddies in the sky, wearing a traditional straw hat with advanced sensors. Its gentle movements create ripples in the water as it plants glowing rice seedlings. Flying fish leap between the terraced fields, leaving trails of sparkles in their wake, while future Tokyo's spires gleam in the distance. [trigger]" ] css = """ :root { --primary-color: #6a92cc; --primary-hover: #557ab8; --secondary-color: #f4c062; --background-color: #f7f9fc; --panel-background: #ffffff; --text-color: #333333; --border-radius: 12px; --shadow: 0 4px 12px rgba(0,0,0,0.08); --font-main: 'Poppins', -apple-system, BlinkMacSystemFont, sans-serif; } body { background-color: var(--background-color); font-family: var(--font-main); } .gradio-container { margin: 0 auto; max-width: 1200px !important; } .main-header { text-align: center; padding: 2rem 1rem 1rem; background: linear-gradient(90deg, #6a92cc 0%, #8f7fc8 100%); color: white; margin-bottom: 2rem; border-radius: var(--border-radius); box-shadow: var(--shadow); } .main-header h1 { font-size: 2.5rem; margin-bottom: 0.5rem; font-weight: 700; text-shadow: 0 2px 4px rgba(0,0,0,0.2); } .main-header p { font-size: 1rem; margin-bottom: 0.5rem; opacity: 0.9; } .main-header a { color: var(--secondary-color); text-decoration: none; font-weight: 600; transition: all 0.2s ease; } .main-header a:hover { text-decoration: underline; opacity: 0.9; } .container { background-color: var(--panel-background); padding: 1.5rem; border-radius: var(--border-radius); box-shadow: var(--shadow); margin-bottom: 1.5rem; } button.primary { background: var(--primary-color) !important; border: none !important; color: white !important; padding: 10px 20px !important; border-radius: 8px !important; font-weight: 600 !important; box-shadow: 0 2px 5px rgba(0,0,0,0.1) !important; transition: all 0.2s ease !important; } button.primary:hover { background: var(--primary-hover) !important; transform: translateY(-2px) !important; box-shadow: 0 4px 8px rgba(0,0,0,0.15) !important; } button.secondary { background: white !important; border: 1px solid #ddd !important; color: var(--text-color) !important; padding: 10px 20px !important; border-radius: 8px !important; font-weight: 500 !important; box-shadow: 0 2px 5px rgba(0,0,0,0.05) !important; transition: all 0.2s ease !important; } button.secondary:hover { background: #f5f5f5 !important; transform: translateY(-2px) !important; } .gr-box { border-radius: var(--border-radius) !important; border: 1px solid #e0e0e0 !important; } .gr-panel { border-radius: var(--border-radius) !important; } .gr-input { border-radius: 8px !important; border: 1px solid #ddd !important; padding: 12px !important; } .gr-form { border-radius: var(--border-radius) !important; background-color: var(--panel-background) !important; } .gr-accordion { border-radius: var(--border-radius) !important; overflow: hidden !important; } .gr-button { border-radius: 8px !important; } .gallery-item { border-radius: var(--border-radius) !important; transition: all 0.3s ease !important; } .gallery-item:hover { transform: scale(1.02) !important; box-shadow: 0 6px 15px rgba(0,0,0,0.1) !important; } .tabs { border-radius: var(--border-radius) !important; overflow: hidden !important; } footer { display: none !important; } .settings-accordion legend span { font-weight: 600 !important; } .example-prompt { font-size: 0.9rem; color: #555; padding: 8px; background: #f5f7fa; border-radius: 6px; border-left: 3px solid var(--primary-color); margin-bottom: 8px; cursor: pointer; transition: all 0.2s; } .example-prompt:hover { background: #eef2f8; } .status-generating { color: #ffa200; font-weight: 500; display: flex; align-items: center; gap: 8px; } .status-generating::before { content: ""; display: inline-block; width: 12px; height: 12px; border-radius: 50%; background-color: #ffa200; animation: pulse 1.5s infinite; } .status-complete { color: #00c853; font-weight: 500; display: flex; align-items: center; gap: 8px; } .status-complete::before { content: ""; display: inline-block; width: 12px; height: 12px; border-radius: 50%; background-color: #00c853; } @keyframes pulse { 0% { opacity: 0.6; } 50% { opacity: 1; } 100% { opacity: 0.6; } } .gr-accordion-title { font-weight: 600 !important; color: var(--text-color) !important; } .tabs button { font-weight: 500 !important; padding: 10px 16px !important; } .tabs button.selected { font-weight: 600 !important; color: var(--primary-color) !important; background: rgba(106, 146, 204, 0.1) !important; } .gr-slider-container { padding: 10px 0 !important; } .gr-prose h3 { font-weight: 600 !important; color: var(--primary-color) !important; margin-bottom: 1rem !important; } """ with gr.Blocks(css=css, analytics_enabled=False, theme="soft") as demo: with gr.Column(): gr.HTML('''

✨ FLUX Ghibli LoRA Generator ✨

Community: https://discord.gg/openfreeai

''') with gr.Row(): with gr.Column(scale=3): with gr.Group(elem_classes="container"): prompt = gr.Textbox( label="Enter your imagination", placeholder="Describe your Ghibli-style image here...", lines=3 ) # ★ 새롭게 추가된 Text Rendering 입력란 text_rendering = gr.Textbox( label="Text Rendering (Multilingual: English, Korean...)", placeholder="Man saying '안녕' in 'speech bubble'", lines=1 ) with gr.Row(): run_button = gr.Button("✨ Generate Image", elem_classes="primary") clear_button = gr.Button("Clear", elem_classes="secondary") with gr.Accordion("Advanced Settings", open=False, elem_classes="settings-accordion"): with gr.Row(): seed = gr.Slider( label="Seed", minimum=0, maximum=MAX_SEED, step=1, value=42, ) randomize_seed = gr.Checkbox(label="Randomize seed", value=True) with gr.Row(): width = gr.Slider( label="Width", minimum=256, maximum=MAX_IMAGE_SIZE, step=32, value=1024, ) height = gr.Slider( label="Height", minimum=256, maximum=MAX_IMAGE_SIZE, step=32, value=768, ) with gr.Row(): guidance_scale = gr.Slider( label="Guidance scale", minimum=0.0, maximum=10.0, step=0.1, value=3.5, ) with gr.Row(): num_inference_steps = gr.Slider( label="Steps", minimum=1, maximum=50, step=1, value=30, ) lora_scale = gr.Slider( label="LoRA scale", minimum=0.0, maximum=1.0, step=0.1, value=1.0, ) with gr.Group(elem_classes="container"): gr.Markdown("### ✨ Example Prompts") examples_html = '\n'.join([f'
{example}
' for example in examples]) example_container = gr.HTML(examples_html) with gr.Column(scale=4): with gr.Group(elem_classes="container"): with gr.Group(): generation_status = gr.HTML('
Ready to generate
') result = gr.Image(label="Generated Image", elem_id="result-image") seed_text = gr.Number(label="Used Seed", value=42) with gr.Tabs(elem_classes="tabs") as tabs: with gr.TabItem("Gallery"): with gr.Group(elem_classes="container"): gallery_header = gr.Markdown("### 🖼️ Your Generated Masterpieces") with gr.Row(): refresh_btn = gr.Button("🔄 Refresh Gallery", elem_classes="secondary") generated_gallery = gr.Gallery( label="Generated Images", columns=3, value=load_generated_images(), height="500px", elem_classes="gallery-item" ) def refresh_gallery(): return load_generated_images() def clear_output(): return "", gr.update(value=None), seed, '
Ready to generate
' def before_generate(): return '
Generating image...
' def after_generate(image, seed, gallery): return image, seed, gallery, '
Generation complete!
' refresh_btn.click( fn=refresh_gallery, inputs=None, outputs=generated_gallery, ) clear_button.click( fn=clear_output, inputs=None, outputs=[prompt, result, seed_text, generation_status] ) # 체인에 마지막에 텍스트 렌더링 적용 (text_rendering 입력값이 있으면) run_button.click( fn=before_generate, inputs=None, outputs=generation_status, ).then( fn=inference, inputs=[ prompt, seed, randomize_seed, width, height, guidance_scale, num_inference_steps, lora_scale, ], outputs=[result, seed_text, generated_gallery], ).then( fn=after_generate, inputs=[result, seed_text, generated_gallery], outputs=[result, seed_text, generated_gallery, generation_status], ).then( fn=apply_text_rendering, inputs=[result, text_rendering], outputs=result, ) prompt.submit( fn=before_generate, inputs=None, outputs=generation_status, ).then( fn=inference, inputs=[ prompt, seed, randomize_seed, width, height, guidance_scale, num_inference_steps, lora_scale, ], outputs=[result, seed_text, generated_gallery], ).then( fn=after_generate, inputs=[result, seed_text, generated_gallery], outputs=[result, seed_text, generated_gallery, generation_status], ).then( fn=apply_text_rendering, inputs=[result, text_rendering], outputs=result, ) gr.HTML(""" """) try: demo.queue(concurrency_count=1, max_size=20) demo.launch(debug=True, show_api=False) except Exception as e: print(f"Error during launch: {e}") print("Trying alternative launch configuration...") demo.launch(debug=True, show_api=False, share=False)