ndc8 committed
Commit 8208c22 · 1 Parent(s): 1ba257c

change model

Files changed (2)
  1. MODEL_CONFIG.md +190 -0
  2. backend_service.py +6 -5
MODEL_CONFIG.md ADDED
@@ -0,0 +1,190 @@
# 🔧 Model Configuration Guide

The backend now supports **configurable models via environment variables**, making it easy to switch between different AI models without code changes.

## 📋 Environment Variables

### **Primary Configuration**

```bash
# Main AI model for text generation (required)
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

# Vision model for image processing (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"

# HuggingFace token for private models (optional)
export HF_TOKEN="your_huggingface_token_here"
```
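
For reference, the backend resolves these variables with plain `os.environ.get` lookups. Below is a minimal sketch of that pattern; the `resolve_models` helper is only for illustration and is not a function in `backend_service.py`:

```python
import os

# Defaults mirror the documented fallbacks used by backend_service.py.
DEFAULT_AI_MODEL = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
DEFAULT_VISION_MODEL = "Salesforce/blip-image-captioning-base"


def resolve_models() -> dict:
    """Read model configuration from the environment, falling back to defaults."""
    return {
        "ai_model": os.environ.get("AI_MODEL", DEFAULT_AI_MODEL),
        "vision_model": os.environ.get("VISION_MODEL", DEFAULT_VISION_MODEL),
        # HF_TOKEN is optional; None means only public models are accessible.
        "hf_token": os.environ.get("HF_TOKEN"),
    }


if __name__ == "__main__":
    print(resolve_models())
```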

---

## 🚀 Usage Examples

### **1. Use DeepSeek-R1 (Default)**

```bash
# Uses your originally requested model
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
./gradio_env/bin/python backend_service.py
```

### **2. Use DialoGPT (Faster, smaller)**

```bash
# Switch to lighter model for development/testing
export AI_MODEL="microsoft/DialoGPT-medium"
./gradio_env/bin/python backend_service.py
```

### **3. Use Other Popular Models**

```bash
# Use Zephyr chat model
export AI_MODEL="HuggingFaceH4/zephyr-7b-beta"
./gradio_env/bin/python backend_service.py

# Use CodeLlama for code generation
export AI_MODEL="codellama/CodeLlama-7b-Instruct-hf"
./gradio_env/bin/python backend_service.py

# Use Mistral
export AI_MODEL="mistralai/Mistral-7B-Instruct-v0.2"
./gradio_env/bin/python backend_service.py
```

### **4. Use Different Vision Model**

```bash
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="nlpconnect/vit-gpt2-image-captioning"
./gradio_env/bin/python backend_service.py
```

---

## 📝 Startup Script Examples

### **Development Mode (Fast startup)**

```bash
#!/bin/bash
# dev_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

### **Production Mode (Your preferred model)**

```bash
#!/bin/bash
# production_mode.sh
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="$YOUR_HF_TOKEN"
./gradio_env/bin/python backend_service.py
```

### **Testing Mode (Lightweight)**

```bash
#!/bin/bash
# test_mode.sh
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
./gradio_env/bin/python backend_service.py
```

---

## 🔍 Model Verification

After starting the backend, check which model is loaded:

```bash
curl http://localhost:8000/health
```

The response will show:

```json
{
  "status": "healthy",
  "model": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
  "version": "1.0.0"
}
```
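
The same check from Python, using only the standard library (this assumes the backend is running locally on port 8000 and returns the JSON shown above):

```python
import json
import urllib.request

# Query the backend's health endpoint and report the active model.
with urllib.request.urlopen("http://localhost:8000/health", timeout=10) as resp:
    health = json.load(resp)

print(f"status: {health.get('status')}")
print(f"model loaded: {health.get('model')}")
```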

---

## 📊 Model Comparison

| Model                                   | Size   | Speed   | Quality      | Use Case            |
| --------------------------------------- | ------ | ------- | ------------ | ------------------- |
| `microsoft/DialoGPT-medium`             | ~355MB | ⚡ Fast  | Good         | Development/Testing |
| `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` | ~16GB  | 🐌 Slow | ⭐ Excellent | Production          |
| `HuggingFaceH4/zephyr-7b-beta`          | ~14GB  | 🐌 Slow | ⭐ Excellent | Chat/Conversation   |
| `codellama/CodeLlama-7b-Instruct-hf`    | ~13GB  | 🐌 Slow | ⭐ Good      | Code Generation     |

---

## 🛠️ Troubleshooting

### **Model Not Found**

```bash
# Verify the model exists on HuggingFace
./gradio_env/bin/python -c "
from huggingface_hub import HfApi
api = HfApi()
try:
    info = api.model_info('your-model-name')
    print(f'✅ Model exists: {info.id}')
except Exception:
    print('❌ Model not found')
"
```

### **Memory Issues**

```bash
# Use a smaller model for limited RAM
export AI_MODEL="microsoft/DialoGPT-medium"  # ~355MB
# or
export AI_MODEL="distilgpt2"                 # ~82MB
```
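
If a smaller model is not an option, half-precision loading can roughly halve memory use. The sketch below shows standard `transformers` options, **not** something `backend_service.py` currently enables out of the box; `device_map="auto"` additionally requires the `accelerate` package:

```python
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = os.environ.get("AI_MODEL", "microsoft/DialoGPT-medium")

tokenizer = AutoTokenizer.from_pretrained(model_name)
# float16 halves memory versus float32; device_map="auto" spreads layers
# across available GPUs/CPU (requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
```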

### **Authentication Issues**

```bash
# Set HuggingFace token for private models
export HF_TOKEN="hf_your_token_here"
```
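
To verify the token actually grants access before starting the backend, you can probe the repo from Python. `login` and `model_info` are standard `huggingface_hub` calls; the repo name below is a placeholder:

```python
import os

from huggingface_hub import HfApi, login

token = os.environ.get("HF_TOKEN")
if not token:
    raise SystemExit("HF_TOKEN is not set")

# Register the token for this session, then check that the repo is visible.
login(token=token)
info = HfApi().model_info("your-private-model-name", token=token)
print(f"Token can access: {info.id}")
```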

---

## 🎯 Quick Switch Commands

```bash
# Quick switch to development mode
export AI_MODEL="microsoft/DialoGPT-medium" && ./gradio_env/bin/python backend_service.py

# Quick switch to production mode
export AI_MODEL="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B" && ./gradio_env/bin/python backend_service.py

# Quick switch with custom vision model
export AI_MODEL="microsoft/DialoGPT-medium" VISION_MODEL="nlpconnect/vit-gpt2-image-captioning" && ./gradio_env/bin/python backend_service.py
```

---

## ✅ Summary

- **Environment Variable**: `AI_MODEL` controls the main text generation model
- **Default**: `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (your original preference)
- **Alternative**: `microsoft/DialoGPT-medium` (faster for development)
- **Vision Model**: `VISION_MODEL` controls the image processing model
- **No Code Changes**: Switch models by changing environment variables only

**Your original DeepSeek-R1 model is still the default** - I simply made it configurable so you can easily switch when needed!
backend_service.py CHANGED

@@ -76,7 +76,7 @@ class ChatMessage(BaseModel):
         return v
 
 class ChatCompletionRequest(BaseModel):
-    model: str = Field(default="unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF", description="The model to use for completion")
+    model: str = Field(default_factory=lambda: os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"), description="The model to use for completion")
     messages: List[ChatMessage] = Field(..., description="List of messages in the conversation")
     max_tokens: Optional[int] = Field(default=512, ge=1, le=2048, description="Maximum tokens to generate")
     temperature: Optional[float] = Field(default=0.7, ge=0.0, le=2.0, description="Sampling temperature")
@@ -124,8 +124,9 @@ class CompletionRequest(BaseModel):
 
 
 # Global variables for model management
-current_model = "microsoft/DialoGPT-medium"
-vision_model = "Salesforce/blip-image-captioning-base"  # Working model for image captioning
+# Model can be configured via environment variable - defaults to DeepSeek-R1
+current_model = os.environ.get("AI_MODEL", "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")
+vision_model = os.environ.get("VISION_MODEL", "Salesforce/blip-image-captioning-base")
 tokenizer = None
 model = None
 image_text_pipeline = None  # type: ignore
@@ -176,14 +177,14 @@ async def lifespan(app: FastAPI):
     global tokenizer, model, image_text_pipeline
     logger.info("🚀 Starting AI Backend Service...")
     try:
-        # Load tokenizer and model directly from HuggingFace repo (GGUF format supported)
+        # Load tokenizer and model directly from HuggingFace repo (standard transformers format)
         logger.info(f"📥 Loading tokenizer from {current_model}...")
         tokenizer = AutoTokenizer.from_pretrained(current_model)
 
         logger.info(f"📥 Loading model from {current_model}...")
         model = AutoModelForCausalLM.from_pretrained(current_model)
 
-        logger.info(f"✅ Successfully loaded GGUF model and tokenizer: {current_model}")
+        logger.info(f"✅ Successfully loaded model and tokenizer: {current_model}")
 
         # Load image pipeline for multimodal support
         try: