ndc8 committed · Commit 8208c22 · Parent(s): 1ba257c
change model

Files changed:
- MODEL_CONFIG.md +190 -0
- backend_service.py +6 -5
MODEL_CONFIG.md
ADDED
@@ -0,0 +1,190 @@
# 🔧 Model Configuration Guide

The backend now supports **configurable models via environment variables**, making it easy to switch between different AI models without code changes.

## 🌍 Environment Variables

### **Primary Configuration**

```bash
# Main AI model for text generation (required)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"

# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
```
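
The backend reads these variables once at startup. A minimal sketch of that lookup, mirroring the `os.environ.get` defaults added to `backend_service.py` in this commit (the `HF_TOKEN` handling is not shown in the diff below, so treat that part as an assumption):

```python
import os

# Falls back to the DeepSeek-R1 default when AI_MODEL is unset
current_model = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")

# Falls back to the BLIP captioning model when VISION_MODEL is unset
vision_model = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")

# Optional: only needed for gated/private models (assumption - not part of this diff)
hf_token = os.environ.get("HF_TOKEN")
```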

---

## 🚀 Usage Examples

### **1. Use DeepSeek-R1 (Default)**

```bash
# Uses your originally requested model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py
```

### **2. Use DialoGPT (Faster, smaller)**

```bash
# Switch to a lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py
```

### **3. Use Other Popular Models**

```bash
# Use the Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py

# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py

# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py
```

### **4. Use a Different Vision Model**

```bash
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
```
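
`VISION_MODEL` feeds the image pipeline the backend loads for multimodal support. The exact call is outside the visible diff, but with the `transformers` pipeline API it would look roughly like this (a sketch, not the committed code):

```python
import os
from transformers import pipeline

vision_model = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")

# Both the BLIP and vit-gpt2 captioning checkpoints work with the image-to-text task
image_text_pipeline = pipeline("image-to-text", model=vision_model)

# Example: caption a local image file
print(image_text_pipeline("example.jpg"))
```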

---

## 📋 Startup Script Examples

### **Development Mode (Fast startup)**

```bash
#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

### **Production Mode (Your preferred model)**

```bash
#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py
```

### **Testing Mode (Lightweight)**

```bash
#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

---

## 🔍 Model Verification

After starting the backend, check which model is loaded:

```bash
curl http://localhost:8000/health
```

The response will show:

```json
{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
```
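
The same check from Python, for example as a smoke test in a script (this sketch assumes the `requests` package is installed; the endpoint and fields are the ones documented above):

```python
import requests

# Query the backend's health endpoint and report the active model
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
health = resp.json()

print(f"status : {health['status']}")
print(f"model  : {health['model']}")
print(f"version: {health['version']}")
```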

---

## 📊 Model Comparison

| Model                                   | Size   | Speed   | Quality      | Use Case            |
| --------------------------------------- | ------ | ------- | ------------ | ------------------- |
| `microsoft/DialoGPT-medium`             | ~355MB | ⚡ Fast  | Good         | Development/Testing |
| `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` | ~16GB  | 🐌 Slow | ✅ Excellent | Production          |
| `HuggingFaceH4/zephyr-7b-beta`          | ~14GB  | 🐌 Slow | ✅ Excellent | Chat/Conversation   |
| `codellama/CodeLlama-7b-Instruct-hf`    | ~13GB  | 🐌 Slow | ✅ Good      | Code Generation     |

---

## 🛠️ Troubleshooting

### **Model Not Found**

```bash
# Verify the model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'✅ Model exists: {info.id}')
except Exception:
    print('❌ Model not found')
"
```

### **Memory Issues**

```bash
# Use a smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium"  # ~355MB
# or
export AI_MODEL="distilgpt2"                 # ~82MB
```
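
If you want to keep a larger model but still cut memory use, loading it in reduced precision is another option. Note this is not what the committed backend does (it calls `from_pretrained` with defaults), so treat it as a possible tweak; `device_map="auto"` additionally requires the `accelerate` package:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# float16 weights roughly halve memory versus float32;
# device_map="auto" spreads layers across available GPUs/CPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
```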

### **Authentication Issues**

```bash
# Set a HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
```
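
Recent versions of `huggingface_hub` pick up the `HF_TOKEN` environment variable automatically. If your version does not, you can pass the token to `from_pretrained` explicitly; this is a sketch of that workaround, not part of the committed backend (older `transformers` releases use `use_auth_token` instead of `token`):

```python
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
hf_token = os.environ.get("HF_TOKEN")  # None is fine for public models

# Passing the token explicitly works even if the env var is not auto-detected
tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(model_name, token=hf_token)
```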

---

## 🎯 Quick Switch Commands

```bash
# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py

# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py

# Quick switch with a custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py
```

---

## ✅ Summary

- **Environment Variable**: `AI_MODEL` controls the main text generation model
- **Default**: `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (your original preference)
- **Alternative**: `microsoft/DialoGPT-medium` (faster for development)
- **Vision Model**: `VISION_MODEL` controls the image processing model
- **No Code Changes**: switch models by changing environment variables only

**Your original DeepSeek-R1 model is still the default** - I simply made it configurable so you can easily switch when needed.

backend_service.py CHANGED

@@ -76,7 +76,7 @@ class ChatMessage(BaseModel):
         return v
 
 class ChatCompletionRequest(BaseModel):
-    model: str = Field(
+    model: str = Field(default_factory=lambda: os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"), description="The model to use for completion")
     messages: List[ChatMessage] = Field(..., description="List of messages in the conversation")
     max_tokens: Optional[int] = Field(default=512, ge=1, le=2048, description="Maximum tokens to generate")
     temperature: Optional[float] = Field(default=0.7, ge=0.0, le=2.0, description="Sampling temperature")
@@ -124,8 +124,9 @@ class CompletionRequest(BaseModel):
 
 
 # Global variables for model management
-
-
+# Model can be configured via environment variable - defaults to DeepSeek-R1
+current_model = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
+vision_model = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
 tokenizer = None
 model = None
 image_text_pipeline = None  # type: ignore
@@ -176,14 +177,14 @@ async def lifespan(app: FastAPI):
     global tokenizer, model, image_text_pipeline
     logger.info("🚀 Starting AI Backend Service...")
     try:
-        # Load tokenizer and model directly from HuggingFace repo (
+        # Load tokenizer and model directly from HuggingFace repo (standard transformers format)
         logger.info(f"📥 Loading tokenizer from {current_model}...")
         tokenizer = AutoTokenizer.from_pretrained(current_model)
 
         logger.info(f"📥 Loading model from {current_model}...")
         model = AutoModelForCausalLM.from_pretrained(current_model)
 
-        logger.info(f"✅ Successfully loaded
+        logger.info(f"✅ Successfully loaded model and tokenizer: {current_model}")
 
         # Load image pipeline for multimodal support
         try:
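
In practice the `default_factory` change means a request that omits "model" is filled from `AI_MODEL` at request time. A simplified, self-contained sketch of that behavior (the real `ChatMessage` has validators and more fields than shown here):

```python
import os
from typing import List, Optional
from pydantic import BaseModel, Field

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    # Same pattern as the diff: the default is read from AI_MODEL when the field is omitted
    model: str = Field(
        default_factory=lambda: os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"),
        description="The model to use for completion",
    )
    messages: List[ChatMessage] = Field(..., description="List of messages in the conversation")
    max_tokens: Optional[int] = Field(default=512, ge=1, le=2048)

# A request without "model" picks up the environment default
req = ChatCompletionRequest(messages=[ChatMessage(role="user", content="Hello")])
print(req.model)  # -> value of AI_MODEL, or the DeepSeek-R1 default
```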