---
library_name: transformers
license: mit
datasets:
- X-ART/LeX-R1-60K
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
pipeline_tag: text-generation
tags:
- art
- text-rendering
---

# 🎨 LeX-Enhancer: Visual Prompt Generator

**LeX-Enhancer** is a lightweight **prompt enhancement model** distilled from [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B).
We collected **60,856 caption pairs** (before and after DeepSeek-R1 enhancement) and fine-tuned a `DeepSeek-R1-Distill-Qwen-14B` model with **LoRA** to replicate R1's detailed prompting capabilities.

This enables **efficient, large-scale generation of visually grounded prompts**, well suited to high-fidelity text-to-image generation.

---

## ✍️ Example: From Simple to Enhanced Caption

> **🧾 Input (Simple Caption):**
> A thank you card with the words very much, with the text on it: "VERY" in black, "MUCH" in yellow.

> **🪄 Output (Enhanced Caption):**
> The image features a rectangular thank you card with a muted, warm-toned background in a soft gradient of creamy beige and blush pink, providing a subtle textured finish that adds depth without overwhelming the central elements. Centered near the top of the card, the word "VERY" is rendered in thick, uppercase black lettering with a clean sans-serif font, slightly curved to follow the natural flow of the composition. Directly beneath it, the word "MUCH" appears in a vibrant, sunny-yellow hue, matching the same bold font style but slightly larger in size, positioned to align neatly with the lower edge of "VERY," creating a balanced hierarchy. Both texts are outlined with faint drop shadows, enhancing readability against the softly lit backdrop. The cards’ edges are crisp, with minimalistic borders in a lighter shade of the background tones, and the overall layout is symmetrical, exuding a polished, contemporary aesthetic. Gentle ambient lighting casts soft, diffused shadows around the card’s corners, suggesting a lightly textured surface underneath, while the absence of decorative embellishments keeps the focus on the typography. The color palette harmonizes warmth and neutrality, ensuring the text remains the focal point while maintaining a serene, approachable ambiance.

---

## 🚀 Usage (Python Code)

```python
import torch, os
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# System instruction for reasoning + answering
SYSTEM_TEMPLATE = (
    "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. "
    "The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. "
    "The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>."
)

model_path = 'X-ART/LeX-Enhancer'

# Your simple caption goes here
simple_caption = "A thank you card with the words very much, with the text on it: \"VERY\" in black, \"MUCH\" in yellow."

def create_chat_template(user_prompt):
    # NOTE: the original body is abridged in the source; this is an assumed,
    # standard chat-format reconstruction.
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE},
        {"role": "user", "content": user_prompt},
    ]

def create_direct_template(user_prompt):
    return user_prompt + "<think>"

def create_user_prompt(simple_caption):
    return (
        # ... (rules 1-5 of the instruction preamble are elided in the source) ...
        "6. Avoid using vague expressions such as \"may be\" or \"might be\"; the generated caption must be in a definitive, narrative tone. "
        "7. Do not use negative sentence structures, such as \"there is nothing in the image,\" etc. The entire caption should directly describe the content of the image. "
        "8. The entire output should be limited to 200 words.\n\n"
        f"SIMPLE CAPTION: {simple_caption}"
    )

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)

# Prepare input prompt
messages = create_direct_template(create_user_prompt(simple_caption))
input_ids = tokenizer.encode(messages, return_tensors="pt").to(model.device)

# Stream output
streamer = TextStreamer(tokenizer, skip_special_tokens=True, clean_up_tokenization_spaces=True)
output = model.generate(
    input_ids,
    max_length=2048,
    num_return_sequences=1,
    do_sample=True,
    # ... (additional sampling arguments elided in the source) ...
    streamer=streamer
)
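
# NOTE: added for illustration -- decode the completion for post-processing
# (the streamer has already printed it). `output` includes the prompt tokens,
# so slice them off before decoding.
generated = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)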

print("*" * 80)
# Output will stream via TextStreamer
```
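
The completion interleaves the model's reasoning with the final caption. Below is a minimal post-processing sketch using the `generated` string decoded above; it assumes the `<answer> </answer>` tags described in `SYSTEM_TEMPLATE` survive decoding and falls back to splitting on `</think>` otherwise. `extract_enhanced_caption` is a hypothetical helper, not part of the released code:

```python
import re

def extract_enhanced_caption(completion: str) -> str:
    # Prefer the explicit <answer> ... </answer> span when the model emits it.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match:
        return match.group(1).strip()
    # Otherwise treat everything after the reasoning block as the caption.
    return completion.split("</think>")[-1].strip()

enhanced_caption = extract_enhanced_caption(generated)
print(enhanced_caption)
```

The extracted caption can then be passed directly as the prompt to a text-to-image model.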
|