DENNY committed on
Commit 6d2ea02 · Parent: e6fe3f7

Add application file

Files changed (5)
  1. README.md +144 -10
  2. app.py +93 -0
  3. docker-compose.yml +23 -0
  4. dockerfile +27 -0
  5. requirements.txt +8 -0
README.md CHANGED
@@ -1,10 +1,144 @@
- ---
- title: Llm Apiku
- emoji: 🏆
- colorFrom: pink
- colorTo: gray
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Guide to Running the Hugging Face Model API with Docker
+
+ ## File Structure
+ ```
+ your-project/
+ ├── Dockerfile
+ ├── docker-compose.yml
+ ├── requirements.txt
+ ├── app.py
+ ├── cache/ # Folder for the model cache
+ └── README.md
+ ```
+
+ ## How to Run
+
+ ### 1. Build and Run with Docker Compose
+ ```bash
+ # Create the project directory
+ mkdir gema-model-api
+ cd gema-model-api
+
+ # Copy in all of the files above,
+ # then run:
+ docker-compose up --build
+ ```
+
+ ### 2. Or Build Manually
+ ```bash
+ # Build the image
+ docker build -t gema-model-api .
+
+ # Run the container
+ docker run -p 8000:8000 -v $(pwd)/cache:/root/.cache/huggingface gema-model-api
+ ```
+
+ ## Testing the API
+
+ ### 1. Health Check
+ ```bash
+ curl http://localhost:8000/health
+ ```
+
+ ### 2. Generate Text
+ ```bash
+ curl -X POST "http://localhost:8000/generate" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "inputs": "Apa kabar dunia teknologi hari ini?"
+   }'
+ ```
+
+ ### 3. Generate with Custom Parameters
+ ```bash
+ curl -X POST "http://localhost:8000/generate" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "inputs": "Jelaskan tentang kecerdasan buatan",
+     "max_tokens": 200,
+     "temperature": 0.8,
+     "top_p": 0.95
+   }'
+ ```
+
+ ## Calling the API from Other Applications
+
+ ### Python
+ ```python
+ import requests
+
+ url = "http://localhost:8000/generate"
+ data = {
+     "inputs": "CONTOH INPUT USER"
+ }
+
+ response = requests.post(url, json=data)
+ result = response.json()
+ print(result["generated_text"])
+ ```
+
+ ### JavaScript/Node.js
+ ```javascript
+ const response = await fetch('http://localhost:8000/generate', {
+   method: 'POST',
+   headers: {
+     'Content-Type': 'application/json',
+   },
+   body: JSON.stringify({
+     inputs: 'CONTOH INPUT USER'
+   })
+ });
+
+ const result = await response.json();
+ console.log(result.generated_text);
+ ```
+
+ ## API Documentation
+ Once the container is running, open a browser and visit:
+ - API Docs: `http://localhost:8000/docs`
+ - ReDoc: `http://localhost:8000/redoc`
+
+ ## Optimization Tips
+
+ ### 1. GPU Support
+ If you have an NVIDIA GPU, update `app.py`:
+ ```python
+ # Change gpu_layers from 0 to a suitable number
+ gpu_layers=50  # Or whatever your GPU can handle
+ ```
+
+ And update `docker-compose.yml`:
+ ```yaml
+ services:
+   gema-model-api:
+     # ... other configuration
+     runtime: nvidia  # For GPU support
+     environment:
+       - NVIDIA_VISIBLE_DEVICES=all
+ ```
+
+ ### 2. For Production
+ - Use a reverse proxy (nginx)
+ - Implement authentication
+ - Add rate limiting
+ - Set up monitoring and logging
+ - Use environment variables for configuration
+
+ ### 3. Memory Management
+ The model needs a fair amount of RAM. Adjust the memory limits in docker-compose.yml to match your server's specs.
+
+ ## Troubleshooting
+
+ ### Model Loading Issues
+ - Make sure you have a stable internet connection on the first run
+ - The model is downloaded automatically and stored in the cache
+ - If loading fails, delete the cache folder and run again
+
+ ### Memory Issues
+ - Reduce `context_length` in app.py
+ - Adjust the memory limits in docker-compose.yml
+ - Use a swap file if needed
+
+ ### Port Conflicts
+ - Change the port in docker-compose.yml if port 8000 is already in use
+ - Example: `"8080:8000"` to expose the API on port 8080
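Beyond the one-shot `curl` and `requests` snippets above, a client often wants retries while the container is still downloading and loading the model. A minimal sketch, assuming the default URL from docker-compose; `build_payload` and `generate` are illustrative helper names, not part of the API:

```python
import json
import time
import urllib.request
import urllib.error

API_URL = "http://localhost:8000/generate"  # default port from docker-compose.yml

def build_payload(inputs, **params):
    """Merge the user text with optional generation parameters,
    dropping None values so the server-side defaults apply."""
    payload = {"inputs": inputs}
    payload.update({k: v for k, v in params.items() if v is not None})
    return payload

def generate(inputs, retries=3, backoff=2.0, **params):
    """POST to /generate, retrying on transient connection errors
    (e.g. while the model is still loading at startup)."""
    data = json.dumps(build_payload(inputs, **params)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                return json.loads(resp.read())["generated_text"]
        except urllib.error.URLError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (attempt + 1))
```

For example, `generate("Apa kabar?", max_tokens=200, temperature=0.8)` sends the same request as the custom-parameters `curl` call above.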
app.py ADDED
@@ -0,0 +1,93 @@
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel
+ from ctransformers import AutoModelForCausalLM
+ import os
+ import uvicorn
+ from typing import Optional, List
+ import logging
+
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ app = FastAPI(title="Gema 4B Model API", version="1.0.0")
+
+ # Request model - flexible enough to accept every generation parameter
+ class TextRequest(BaseModel):
+     inputs: str
+     system_prompt: Optional[str] = None
+     max_tokens: Optional[int] = 10
+     temperature: Optional[float] = 0.7
+     top_k: Optional[int] = 50
+     top_p: Optional[float] = 0.9
+     repeat_penalty: Optional[float] = 2
+     stop: Optional[List[str]] = None
+
+ # Response model
+ class TextResponse(BaseModel):
+     generated_text: str
+
+ # Global model variable
+ model = None
+
+ @app.on_event("startup")
+ async def load_model():
+     global model
+     try:
+         logger.info("Loading model...")
+         model = AutoModelForCausalLM.from_pretrained(
+             "Dnfs/gema-4b-indra10k-model1-Q4_K_M-GGUF",
+             model_file="gema-4b-indra10k-model1-q4_k_m.gguf",
+             model_type="llama",
+             gpu_layers=0,  # Set to an appropriate number if using a GPU
+             context_length=2048,
+             threads=os.cpu_count()
+         )
+         logger.info("Model loaded successfully!")
+     except Exception as e:
+         logger.error(f"Failed to load model: {e}")
+         raise e
+
+ @app.post("/generate", response_model=TextResponse)
+ async def generate_text(request: TextRequest):
+     if model is None:
+         raise HTTPException(status_code=500, detail="Model not loaded")
+
+     try:
+         # Build the prompt - use system_prompt when given, otherwise the raw user input
+         if request.system_prompt:
+             full_prompt = f"{request.system_prompt}\n\nUser: {request.inputs}\nAssistant:"
+         else:
+             full_prompt = request.inputs
+
+         # Generate text with the parameters from the request
+         generated_text = model(
+             full_prompt,
+             max_new_tokens=request.max_tokens,
+             temperature=request.temperature,
+             top_p=request.top_p,
+             top_k=request.top_k,
+             repetition_penalty=request.repeat_penalty,
+             stop=request.stop or []
+         )
+
+         # Strip the system prompt from the response if present
+         if "Assistant:" in generated_text:
+             generated_text = generated_text.split("Assistant:")[-1].strip()
+
+         return TextResponse(generated_text=generated_text)
+
+     except Exception as e:
+         logger.error(f"Generation error: {e}")
+         raise HTTPException(status_code=500, detail=f"Generation failed: {str(e)}")
+
+ @app.get("/health")
+ async def health_check():
+     return {"status": "healthy", "model_loaded": model is not None}
+
+ @app.get("/")
+ async def root():
+     return {"message": "Gema 4B Model API", "docs": "/docs"}
+
+ if __name__ == "__main__":
+     uvicorn.run(app, host="0.0.0.0", port=8000)
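The prompt wrapping and response cleaning inside the `/generate` endpoint above can be isolated into two pure helpers, which makes that behaviour easy to unit-test in isolation. A sketch; the names `build_prompt` and `clean_response` are my own, not in app.py:

```python
def build_prompt(user_input, system_prompt=None):
    """Mirror the endpoint: wrap with User/Assistant turns only
    when a system prompt is provided."""
    if system_prompt:
        return f"{system_prompt}\n\nUser: {user_input}\nAssistant:"
    return user_input

def clean_response(generated_text):
    """Keep only the assistant's part when the model echoes the
    prompt back in its output."""
    if "Assistant:" in generated_text:
        return generated_text.split("Assistant:")[-1].strip()
    return generated_text
```

Note that splitting on the last `"Assistant:"` also drops any earlier occurrences the model may have generated itself, which matches the endpoint's current behaviour.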
docker-compose.yml ADDED
@@ -0,0 +1,23 @@
+ version: '3.8'
+
+ services:
+   gema-model-api:
+     build: .
+     ports:
+       - "8000:8000"
+     environment:
+       - PYTHONUNBUFFERED=1
+     volumes:
+       - ./cache:/root/.cache/huggingface # Cache model downloads
+     restart: unless-stopped
+     healthcheck:
+       test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
+       interval: 30s
+       timeout: 10s
+       retries: 3
+     deploy:
+       resources:
+         limits:
+           memory: 8G # Adjust based on your system
+         reservations:
+           memory: 4G
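The compose healthcheck above polls `/health` with a 30s interval and 3 retries; a deployment script can reproduce the same readiness gate client-side. A sketch with an injectable `probe` so the retry logic is testable without a running container; `wait_until_healthy` and `http_probe` are hypothetical helpers, not part of this stack:

```python
import json
import time
import urllib.request

def http_probe(url="http://localhost:8000/health"):
    """Fetch /health and return the parsed JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read())

def wait_until_healthy(probe=http_probe, interval=30, retries=3,
                       sleep=time.sleep):
    """Mirror the compose healthcheck: up to `retries` attempts,
    `interval` seconds apart; True once the API reports the model
    as loaded."""
    for attempt in range(retries):
        try:
            if probe().get("model_loaded"):
                return True
        except OSError:
            pass  # connection refused while the container is starting
        if attempt < retries - 1:
            sleep(interval)
    return False
```

`urllib.error.URLError` subclasses `OSError`, so the bare `except OSError` covers both a refused connection and a timeout.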
dockerfile ADDED
@@ -0,0 +1,27 @@
+ # Dockerfile
+ FROM python:3.10-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     git \
+     curl \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first (for better caching)
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application files
+ COPY . .
+
+ # Expose port
+ EXPOSE 8000
+
+ # Command to run the application
+ CMD ["python", "app.py"]
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ transformers==4.36.0
+ torch==2.1.0
+ fastapi==0.104.1
+ uvicorn==0.24.0
+ huggingface-hub==0.19.4
+ pydantic==2.5.0
+ accelerate==0.25.0
+ ctransformers==0.2.27