---
library_name: transformers
license: mit
datasets:
- X-ART/LeX-R1-60K
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
pipeline_tag: text-generation
tags:
- art
- text-rendering
---

# 🎨 LeX-Enhancer: Visual Prompt Generator

**LeX-Enhancer** is a lightweight **prompt enhancement model** distilled from [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B).
We collected **60,856 caption pairs** (before and after DeepSeek-R1 enhancement) and fine-tuned a `DeepSeek-R1-Distill-Qwen-14B` model with **LoRA** to replicate R1's detailed prompting capabilities.

This enables **efficient, large-scale generation of visually grounded prompts**, well suited to high-fidelity text-to-image generation.

---

## ✍️ Example: From Simple to Enhanced Caption

> **🧾 Input (Simple Caption):**
> A thank you card with the words very much, with the text on it: "VERY" in black, "MUCH" in yellow.

> **🪄 Output (Enhanced Caption):**
> The image features a rectangular thank you card with a muted, warm-toned background in a soft gradient of creamy beige and blush pink, providing a subtle textured finish that adds depth without overwhelming the central elements. Centered near the top of the card, the word "VERY" is rendered in thick, uppercase black lettering with a clean sans-serif font, slightly curved to follow the natural flow of the composition. Directly beneath it, the word "MUCH" appears in a vibrant, sunny-yellow hue, matching the same bold font style but slightly larger in size, positioned to align neatly with the lower edge of "VERY," creating a balanced hierarchy. Both texts are outlined with faint drop shadows, enhancing readability against the softly lit backdrop. The cards’ edges are crisp, with minimalistic borders in a lighter shade of the background tones, and the overall layout is symmetrical, exuding a polished, contemporary aesthetic. Gentle ambient lighting casts soft, diffused shadows around the card’s corners, suggesting a lightly textured surface underneath, while the absence of decorative embellishments keeps the focus on the typography. The color palette harmonizes warmth and neutrality, ensuring the text remains the focal point while maintaining a serene, approachable ambiance.

---

## 🚀 Usage (Python Code)

```python
import torch, os
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# System instruction for reasoning + answering
SYSTEM_TEMPLATE = (
    "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. "
    "The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. "
    "The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>."
)

model_path = 'X-ART/LeX-Enhancer'

# Your simple caption goes here
simple_caption = "A thank you card with the words very much, with the text on it: \"VERY\" in black, \"MUCH\" in yellow."

def create_chat_template(user_prompt):
    # NOTE: the original body is abridged in the source; this is an assumed,
    # standard chat-format reconstruction.
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE},
        {"role": "user", "content": user_prompt},
    ]

def create_direct_template(user_prompt):
    return user_prompt + "<think>"

def create_user_prompt(simple_caption):
    return (
        # ... (rules 1-5 of the instruction preamble are elided in the source) ...
        "6. Avoid using vague expressions such as \"may be\" or \"might be\"; the generated caption must be in a definitive, narrative tone. "
        "7. Do not use negative sentence structures, such as \"there is nothing in the image,\" etc. The entire caption should directly describe the content of the image. "
        "8. The entire output should be limited to 200 words.\n\n"
        f"SIMPLE CAPTION: {simple_caption}"
    )

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)

# Prepare input prompt
messages = create_direct_template(create_user_prompt(simple_caption))
input_ids = tokenizer.encode(messages, return_tensors="pt").to(model.device)

# Stream output
streamer = TextStreamer(tokenizer, skip_special_tokens=True, clean_up_tokenization_spaces=True)
output = model.generate(
    input_ids,
    max_length=2048,
    num_return_sequences=1,
    do_sample=True,
    # ... (additional sampling arguments elided in the source) ...
    streamer=streamer
)
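
# NOTE: added for illustration -- decode the completion for post-processing
# (the streamer has already printed it). `output` includes the prompt tokens,
# so slice them off before decoding.
generated = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)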

print("*" * 80)
# Output will stream via TextStreamer
```
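
The completion interleaves the model's reasoning with the final caption. Below is a minimal post-processing sketch using the `generated` string decoded above; it assumes the `<answer> </answer>` tags described in `SYSTEM_TEMPLATE` survive decoding and falls back to splitting on `</think>` otherwise. `extract_enhanced_caption` is a hypothetical helper, not part of the released code:

```python
import re

def extract_enhanced_caption(completion: str) -> str:
    # Prefer the explicit <answer> ... </answer> span when the model emits it.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match:
        return match.group(1).strip()
    # Otherwise treat everything after the reasoning block as the caption.
    return completion.split("</think>")[-1].strip()

enhanced_caption = extract_enhanced_caption(generated)
print(enhanced_caption)
```

The extracted caption can then be passed directly as the prompt to a text-to-image model.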
|