---
title: Multimodal AI Backend Service
emoji: πŸš€
colorFrom: yellow
colorTo: purple
sdk: docker
app_port: 8000
pinned: false
---

# firstAI - Multimodal AI Backend πŸš€

A powerful AI backend service with **multimodal capabilities** and **advanced deployment support**, handling both text generation and image analysis through Transformers pipelines.

## πŸŽ‰ Features

### πŸ€– Configurable AI Models

- **Default Text Model**: Microsoft DialoGPT-medium (deployment-friendly)
- **Advanced Models**: Support for quantized models (Unsloth, 4-bit, GGUF)
- **Environment Configuration**: Runtime model selection via environment variables
- **Quantization Support**: Automatic 4-bit quantization with fallback mechanisms, sketched below
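
A minimal sketch of what this configuration-plus-fallback behavior could look like (the function and structure here are illustrative, not the actual `backend_service.py` internals):

```python
import os

from transformers import pipeline

MODEL_ID = os.getenv("AI_MODEL", "microsoft/DialoGPT-medium")

def load_text_pipeline():
    """Try 4-bit quantized loading first, then fall back to full precision."""
    try:
        # Requires the optional bitsandbytes package and compatible hardware.
        from transformers import BitsAndBytesConfig

        quant = BitsAndBytesConfig(load_in_4bit=True)
        return pipeline(
            "text-generation",
            model=MODEL_ID,
            model_kwargs={"quantization_config": quant},
        )
    except Exception:
        # Quantization unavailable: degrade gracefully to the plain model.
        return pipeline("text-generation", model=MODEL_ID)
```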

### πŸ–ΌοΈ Multimodal Support

- Process text-only messages
- Analyze images from URLs
- Combined image + text conversations
- OpenAI Vision API compatible format

### πŸ—οΈ Production Ready

- **Enhanced Deployment**: Multi-level fallback for quantized models
- **Environment Flexibility**: Works in constrained deployment environments
- **Error Resilience**: Comprehensive error handling with graceful degradation
- FastAPI backend with automatic docs
- Health checks and monitoring
- PyTorch with MPS acceleration (Apple Silicon)

### πŸ”§ Model Configuration

Configure models via environment variables:

```bash
# Set custom text model (optional)
export AI_MODEL="microsoft/DialoGPT-medium"

# Set custom vision model (optional)
export VISION_MODEL="Salesforce/blip-image-captioning-base"

# For private models (optional)
export HF_TOKEN="your_huggingface_token"
```
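
As a rough sketch, the service could consume these variables at startup like this (the pipeline task and `token` parameter are standard Transformers API, but the snippet is illustrative rather than the actual service code):

```python
import os

from transformers import pipeline

# HF_TOKEN is only needed for private checkpoints; None is fine otherwise.
token = os.getenv("HF_TOKEN")

vision = pipeline(
    "image-to-text",
    model=os.getenv("VISION_MODEL", "Salesforce/blip-image-captioning-base"),
    token=token,
)
```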

**Supported Model Types:**

- Standard models: `microsoft/DialoGPT-medium`, `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`
- Quantized models: `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit`
- GGUF models: `unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF`
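
One plausible way to route these model families is by naming convention; this heuristic is purely illustrative and may not match the service's actual detection logic:

```python
def detect_model_kind(model_id: str) -> str:
    """Classify a model ID so the right loading strategy can be chosen."""
    lowered = model_id.lower()
    if "gguf" in lowered:
        return "gguf"       # e.g. unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
    if "4bit" in lowered:
        return "quantized"  # e.g. unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
    return "standard"       # e.g. microsoft/DialoGPT-medium
```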

## πŸš€ Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Service

```bash
python backend_service.py
```

### 3. Test Multimodal Capabilities

```bash
python test_final.py
```

The service will start on **http://localhost:8001** with both text and vision models loaded.

## πŸ’‘ Usage Examples

### Text-Only Chat

```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/DialoGPT-medium",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

### Image Analysis

```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Salesforce/blip-image-captioning-base",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "url": "https://example.com/image.jpg"
          }
        ]
      }
    ]
  }'
```

### Multimodal (Image + Text)

```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Salesforce/blip-image-captioning-base",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "url": "https://example.com/image.jpg"
          },
          {
            "type": "text",
            "text": "What do you see in this image?"
          }
        ]
      }
    ]
  }'
```
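
Because the endpoint follows the OpenAI request shape, the official `openai` Python client can also target it; the `base_url` and placeholder `api_key` below are assumptions for local use:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Salesforce/blip-image-captioning-base",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/image.jpg"},
            {"type": "text", "text": "What do you see in this image?"},
        ],
    }],
)
print(response.choices[0].message.content)
```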

## πŸ”§ Technical Details

### Architecture

- **FastAPI** web framework
- **Transformers** pipeline for AI models
- **PyTorch** backend with GPU/MPS support
- **Pydantic** for request/response validation
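
A condensed, illustrative skeleton of how these pieces fit together (the real `backend_service.py` adds streaming, multimodal routing, and error handling):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI(title="firstAI")
generator = pipeline("text-generation", model="microsoft/DialoGPT-medium")

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]

@app.post("/v1/chat/completions")
def chat(req: ChatRequest):
    # Use the last user message as the prompt (simplified).
    prompt = req.messages[-1].get("content", "")
    text = generator(prompt, max_new_tokens=64)[0]["generated_text"]
    return {"choices": [{"message": {"role": "assistant", "content": text}}]}
```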

### Models

- **Text**: microsoft/DialoGPT-medium
- **Vision**: Salesforce/blip-image-captioning-base

### API Endpoints

- `GET /` - Service information
- `GET /health` - Health check
- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completions (text/multimodal)
- `GET /docs` - Interactive API documentation
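
A quick smoke test of the read-only endpoints (port 8001 assumed, matching the examples above):

```python
import requests

base = "http://localhost:8001"
print(requests.get(f"{base}/health").json())     # service liveness
print(requests.get(f"{base}/v1/models").json())  # available models
```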

## πŸš€ Deployment

### Environment Variables

```bash
# Optional: Custom models
export AI_MODEL="microsoft/DialoGPT-medium"
export VISION_MODEL="Salesforce/blip-image-captioning-base"
export HF_TOKEN="your_token_here"  # For private models
```

### Production Deployment

The service includes enhanced deployment capabilities:

- **Quantized Model Support**: Automatic handling of 4-bit and GGUF models
- **Fallback Mechanisms**: Multi-level fallback for constrained environments
- **Error Resilience**: Graceful degradation when quantization libraries are unavailable (see the sketch below)
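
The fallback chain can be pictured as an ordered list of loading attempts; the exact order and the last-resort default shown here are assumptions, not the service's verbatim logic:

```python
from transformers import BitsAndBytesConfig, pipeline

def load_with_fallbacks(model_id: str):
    """Try progressively safer configurations until one loads."""
    attempts = [
        # 1. Configured model, 4-bit quantized (needs bitsandbytes + GPU).
        {"model": model_id,
         "model_kwargs": {"quantization_config": BitsAndBytesConfig(load_in_4bit=True)}},
        # 2. Configured model at full precision.
        {"model": model_id},
        # 3. Assumed last-resort default known to load anywhere.
        {"model": "microsoft/DialoGPT-medium"},
    ]
    for kwargs in attempts:
        try:
            return pipeline("text-generation", **kwargs)
        except Exception as err:
            print(f"loading {kwargs['model']} failed: {err!r}")
    raise RuntimeError("all model loading attempts failed")
```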

### Docker Deployment

```bash
# Build and run with Docker
docker build -t firstai .
docker run -p 8000:8000 firstai
```

### Testing Deployment

```bash
# Test quantization detection and fallbacks
python test_deployment_fallbacks.py

# Test health endpoint
curl http://localhost:8000/health
```

For comprehensive deployment guidance, see `DEPLOYMENT_ENHANCEMENTS.md`.

## πŸ§ͺ Testing

Run the comprehensive test suite:

```bash
python test_final.py
```

Test individual components:

```bash
python test_multimodal.py  # Basic multimodal tests
python test_pipeline.py    # Pipeline compatibility
```

## πŸ“¦ Dependencies

Key packages:

- `fastapi` - Web framework
- `transformers` - AI model pipelines
- `torch` - PyTorch backend
- `Pillow` - Image processing
- `accelerate` - Model acceleration
- `requests` - HTTP client

## 🎯 Integration Complete

This project successfully integrates:
βœ… **Transformers image-text-to-text pipeline**  
βœ… **OpenAI Vision API compatibility**  
βœ… **Multimodal message processing**  
βœ… **Production-ready FastAPI service**

See `MULTIMODAL_INTEGRATION_COMPLETE.md` for detailed integration documentation.

---

# AI Backend Service πŸš€

**Status: βœ… CONVERSION COMPLETE!**

Successfully converted from a non-functioning Gradio HuggingFace app to a production-ready FastAPI backend service with OpenAI-compatible API endpoints.

## Quick Start

### 1. Setup Environment

```bash
# Activate the virtual environment
source gradio_env/bin/activate

# Install dependencies (already done)
pip install -r requirements.txt
```

### 2. Start the Backend Service

```bash
python backend_service.py --port 8000 --reload
```

### 3. Test the API

```bash
# Run comprehensive tests
python test_api.py

# Or try usage examples
python usage_examples.py
```

## API Endpoints

| Endpoint               | Method | Description                         |
| ---------------------- | ------ | ----------------------------------- |
| `/`                    | GET    | Service information                 |
| `/health`              | GET    | Health check                        |
| `/v1/models`           | GET    | List available models               |
| `/v1/chat/completions` | POST   | Chat completion (OpenAI compatible) |
| `/v1/completions`      | POST   | Text completion                     |
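
The `/v1/completions` endpoint can be exercised the same way; the request body below mirrors OpenAI's legacy completions format, which this endpoint is assumed to follow:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "microsoft/DialoGPT-medium",
        "prompt": "Once upon a time",
        "max_tokens": 50,
    },
)
print(resp.json())
```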

## Example Usage

### Chat Completion

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/DialoGPT-medium",
    "messages": [
      {"role": "user", "content": "Hello! How are you?"}
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'
```

### Streaming Chat

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/DialoGPT-medium",
    "messages": [
      {"role": "user", "content": "Tell me a joke"}
    ],
    "stream": true
  }'
```
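
To consume the stream from Python, read the response line by line; this sketch assumes the service emits OpenAI-style `data: {...}` server-sent-event chunks terminated by `[DONE]`:

```python
import json
import requests

payload = {
    "model": "microsoft/DialoGPT-medium",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": True,
}
with requests.post("http://localhost:8000/v1/chat/completions",
                   json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        delta = json.loads(chunk)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
```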

## Files

- **`app.py`** - Original Gradio ChatInterface (still functional)
- **`backend_service.py`** - New FastAPI backend service ⭐
- **`test_api.py`** - Comprehensive API testing
- **`usage_examples.py`** - Simple usage examples
- **`requirements.txt`** - Updated dependencies
- **`CONVERSION_COMPLETE.md`** - Detailed conversion documentation

## Features

βœ… **OpenAI-Compatible API** - Drop-in replacement for OpenAI API  
βœ… **Async FastAPI** - High-performance async architecture  
βœ… **Streaming Support** - Real-time response streaming  
βœ… **Error Handling** - Robust error handling with fallbacks  
βœ… **Production Ready** - CORS, logging, health checks  
βœ… **Docker Ready** - Easy containerization  
βœ… **Auto-reload** - Development-friendly auto-reload  
βœ… **Type Safety** - Full type hints with Pydantic validation

## Service URLs

- **Backend Service**: http://localhost:8000
- **API Documentation**: http://localhost:8000/docs
- **OpenAPI Spec**: http://localhost:8000/openapi.json

## Model Information

- **Current Model**: `microsoft/DialoGPT-medium`
- **Type**: Conversational AI model
- **Provider**: HuggingFace Inference API
- **Capabilities**: Text generation, chat completion

## Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client Request    │───▢│   FastAPI Backend    │───▢│  HuggingFace API    β”‚
β”‚  (OpenAI format)    β”‚    β”‚  (backend_service)   β”‚    β”‚  (DialoGPT-medium)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
                                       β–Ό
                           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                           β”‚   OpenAI Response    β”‚
                           β”‚   (JSON/Streaming)   β”‚
                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Development

The service includes:

- **Auto-reload** for development
- **Comprehensive logging** for debugging
- **Type checking** for code quality
- **Test suite** for reliability
- **Error handling** for robustness

## Production Deployment

Ready for production with:

- **Environment variables** for configuration
- **Health check endpoints** for monitoring
- **CORS support** for web applications
- **Docker compatibility** for containerization
- **Structured logging** for observability

---

**πŸŽ‰ Conversion Status: COMPLETE!**  
Successfully transformed from broken Gradio app to production-ready AI backend service.

For detailed conversion documentation, see [`CONVERSION_COMPLETE.md`](CONVERSION_COMPLETE.md).

# Gemma 3n GGUF FastAPI Backend (Hugging Face Space)

This Space provides an OpenAI-compatible chat API for Gemma 3n GGUF models, powered by FastAPI.

**Note:** On Hugging Face Spaces, the backend runs in `DEMO_MODE` (no model loaded) for demonstration and endpoint testing. For real inference, run locally with a GGUF model and llama-cpp-python.
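
A sketch of how such a demo-mode switch typically looks; the branching shown here is illustrative, not the Space's exact backend code:

```python
import os

DEMO_MODE = os.getenv("DEMO_MODE", "").lower() in ("1", "true", "yes")

if DEMO_MODE:
    llm = None  # endpoints return canned demo responses instead of inference
else:
    from llama_cpp import Llama  # requires llama-cpp-python

    llm = Llama(model_path=os.environ["AI_MODEL"])
```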

## Endpoints

- `/health` β€” Health check
- `/v1/chat/completions` β€” OpenAI-style chat completions (returns a demo response)
- `/train/start` β€” Start a (demo) training job
- `/train/status/{job_id}` β€” Check training job status
- `/train/logs/{job_id}` β€” Get training logs
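
A hypothetical client for the demo training endpoints; the empty request body and the `job_id`/`status` field names are assumptions, since the README does not specify the schema:

```python
import time

import requests

base = "http://localhost:8000"

job = requests.post(f"{base}/train/start", json={}).json()
job_id = job["job_id"]  # assumed response field

while True:
    status = requests.get(f"{base}/train/status/{job_id}").json()
    print(status)
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)

print(requests.get(f"{base}/train/logs/{job_id}").text)
```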

## Usage

1. **Clone this repo** or create a Hugging Face Space (type: FastAPI).
2. All dependencies are in `requirements.txt`.
3. The Space will start in demo mode (no model download required).

## Local Inference (with GGUF)

To run with a real model locally:

1. Download a Gemma 3n GGUF model (e.g. from https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF).
2. Set `AI_MODEL` to the local path or repo.
3. Unset `DEMO_MODE`.
4. Run:
   ```bash
   pip install -r requirements.txt
   uvicorn gemma_gguf_backend:app --host 0.0.0.0 --port 8000
   ```

## License

Apache 2.0