DENNY committed
Commit · 6d2ea02
Parent(s): e6fe3f7

Add application file

Browse files:
- README.md +144 -10
- app.py +93 -0
- docker-compose.yml +23 -0
- dockerfile +27 -0
- requirements.txt +8 -0
README.md
CHANGED
@@ -1,10 +1,144 @@
# Docker Usage Guide for the Hugging Face Model API

## File Structure
```
your-project/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── app.py
├── cache/              # Folder for the model cache
└── README.md
```

## How to Run

### 1. Build and Run with Docker Compose
```bash
# Clone or create the project directory
mkdir gema-model-api
cd gema-model-api

# Copy in all of the files listed above
# Then run:
docker-compose up --build
```

### 2. Or Build Manually
```bash
# Build the image
docker build -t gema-model-api .

# Run the container
docker run -p 8000:8000 -v $(pwd)/cache:/root/.cache/huggingface gema-model-api
```

## Testing the API

### 1. Health Check
```bash
curl http://localhost:8000/health
```

### 2. Generate Text
```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Apa kabar dunia teknologi hari ini?"
  }'
```

### 3. Generate with Custom Parameters
```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Jelaskan tentang kecerdasan buatan",
    "max_tokens": 200,
    "temperature": 0.8,
    "top_p": 0.95
  }'
```

## Calling the API from Other Applications

### Python
```python
import requests

url = "http://localhost:8000/generate"
data = {
    "inputs": "EXAMPLE USER INPUT"
}

response = requests.post(url, json=data)
result = response.json()
print(result["generated_text"])
```

### JavaScript/Node.js
```javascript
const response = await fetch('http://localhost:8000/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    inputs: 'EXAMPLE USER INPUT'
  })
});

const result = await response.json();
console.log(result.generated_text);
```

## API Documentation
Once the container is running, open a browser and visit:
- API docs: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`

## Optimization Tips

### 1. GPU Support
If you have an NVIDIA GPU, update `app.py`:
```python
# Change gpu_layers from 0 to a suitable number
gpu_layers=50  # Or whatever fits your GPU's memory
```

Then update `docker-compose.yml`:
```yaml
services:
  gema-model-api:
    # ... other configuration
    runtime: nvidia  # For GPU support
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```

### 2. For Production
- Use a reverse proxy (nginx)
- Implement authentication (see the sketch below)
- Add rate limiting
- Set up monitoring and logging
- Use environment variables for configuration

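As a minimal authentication sketch, assuming a single shared key passed in an `X-API-Key` header and checked against an `API_KEY` environment variable (both hypothetical, not part of this commit), a FastAPI dependency could look like:

```python
import os
from typing import Optional

from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import APIKeyHeader

# Hypothetical header name and env var; adjust to your deployment.
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

async def require_api_key(key: Optional[str] = Security(api_key_header)):
    expected = os.environ.get("API_KEY")
    if not expected or key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

# Registering it app-wide guards every route, including /health:
# app = FastAPI(title="Gema 4B Model API", dependencies=[Depends(require_api_key)])
```
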
### 3. Memory Management
This model needs a substantial amount of RAM. Adjust the memory limits in docker-compose.yml to match your server's specifications (a quick check is sketched below).

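To see how much RAM the host actually has before picking limits, a standard-library sketch (Linux-only; an illustration, not part of this commit):

```python
import os

# Total physical RAM via sysconf (Linux). Compare this against the
# 8G limit / 4G reservation in docker-compose.yml before adjusting them.
total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
print(f"Total RAM: {total_bytes / 1024**3:.1f} GiB")
```
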
## Troubleshooting

### Model Loading Issues
- Make sure your internet connection is stable on the first run
- The model is downloaded automatically and stored in the cache
- If loading fails, try deleting the cache folder and running again (or pre-download the file, as sketched below)

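If downloads keep failing at container startup, one option is to fetch the GGUF file ahead of time with `huggingface_hub` (already pinned in requirements.txt). This is a sketch; it assumes ctransformers reads from the same Hugging Face cache directory that the compose file mounts:

```python
from huggingface_hub import hf_hub_download

# Pre-download the exact file app.py loads, so the container
# can start without network access once the cache is mounted.
path = hf_hub_download(
    repo_id="Dnfs/gema-4b-indra10k-model1-Q4_K_M-GGUF",
    filename="gema-4b-indra10k-model1-q4_k_m.gguf",
)
print(f"Cached at: {path}")
```
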
### Memory Issues
- Reduce `context_length` in app.py
- Adjust the memory limits in docker-compose.yml
- Use a swap file if necessary

### Port Conflicts
- Change the port mapping in docker-compose.yml if port 8000 is already in use
- Example: `"8080:8000"` to serve on port 8080
app.py
ADDED
@@ -0,0 +1,93 @@
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from ctransformers import AutoModelForCausalLM
import os
import uvicorn
from typing import Optional, List
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="Gema 4B Model API", version="1.0.0")

# Request model - flexible enough to accept any generation parameter
class TextRequest(BaseModel):
    inputs: str
    system_prompt: Optional[str] = None
    max_tokens: Optional[int] = 10  # small default; override per request
    temperature: Optional[float] = 0.7
    top_k: Optional[int] = 50
    top_p: Optional[float] = 0.9
    repeat_penalty: Optional[float] = 2.0  # note: 2.0 is aggressive; values near 1.1 are more typical
    stop: Optional[List[str]] = None

# Response model
class TextResponse(BaseModel):
    generated_text: str

# Global model variable
model = None

@app.on_event("startup")
async def load_model():
    global model
    try:
        logger.info("Loading model...")
        model = AutoModelForCausalLM.from_pretrained(
            "Dnfs/gema-4b-indra10k-model1-Q4_K_M-GGUF",
            model_file="gema-4b-indra10k-model1-q4_k_m.gguf",
            model_type="llama",
            gpu_layers=0,  # Set to an appropriate number if using a GPU
            context_length=2048,
            threads=os.cpu_count()
        )
        logger.info("Model loaded successfully!")
    except Exception as e:
        logger.error(f"Failed to load model: {e}")
        raise e

@app.post("/generate", response_model=TextResponse)
async def generate_text(request: TextRequest):
    if model is None:
        raise HTTPException(status_code=500, detail="Model not loaded")

    try:
        # Build the prompt - use system_prompt if given, otherwise the raw user input
        if request.system_prompt:
            full_prompt = f"{request.system_prompt}\n\nUser: {request.inputs}\nAssistant:"
        else:
            full_prompt = request.inputs

        # Generate text with the parameters from the request
        generated_text = model(
            full_prompt,
            max_new_tokens=request.max_tokens,
            temperature=request.temperature,
            top_p=request.top_p,
            top_k=request.top_k,
            repetition_penalty=request.repeat_penalty,
            stop=request.stop or []
        )

        # Strip the system-prompt scaffolding from the response, if present
        if "Assistant:" in generated_text:
            generated_text = generated_text.split("Assistant:")[-1].strip()

        return TextResponse(generated_text=generated_text)

    except Exception as e:
        logger.error(f"Generation error: {e}")
        raise HTTPException(status_code=500, detail=f"Generation failed: {str(e)}")

@app.get("/health")
async def health_check():
    return {"status": "healthy", "model_loaded": model is not None}

@app.get("/")
async def root():
    return {"message": "Gema 4B Model API", "docs": "/docs"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
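For a quick smoke test of app.py outside Docker, a sketch using Starlette's test client (assumes the pinned requirements plus `httpx`, which TestClient needs but requirements.txt does not include; the first run still downloads the model):

```python
from fastapi.testclient import TestClient

from app import app

# Entering the context manager fires the startup event, which loads the model.
with TestClient(app) as client:
    assert client.get("/health").json()["model_loaded"]
    resp = client.post("/generate", json={"inputs": "Halo", "max_tokens": 32})
    print(resp.json()["generated_text"])
```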
docker-compose.yml
ADDED
@@ -0,0 +1,23 @@
```yaml
version: '3.8'

services:
  gema-model-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - PYTHONUNBUFFERED=1
    volumes:
      - ./cache:/root/.cache/huggingface  # Cache model downloads
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 8G  # Adjust based on your system
        reservations:
          memory: 4G
```
dockerfile
ADDED
@@ -0,0 +1,27 @@
```dockerfile
# Dockerfile
# Note: this file is committed as "dockerfile" (lowercase); on a case-sensitive
# filesystem, `docker build` and Compose look for "Dockerfile", so rename it
# or pass `-f dockerfile`.
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    curl \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for better caching)
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application files
COPY . .

# Expose port
EXPOSE 8000

# Command to run the application
CMD ["python", "app.py"]
```
requirements.txt
ADDED
@@ -0,0 +1,8 @@
```
transformers==4.36.0
torch==2.1.0
fastapi==0.104.1
uvicorn==0.24.0
huggingface-hub==0.19.4
pydantic==2.5.0
accelerate==0.25.0
ctransformers==0.2.27
```