# Custom Model Endpoint Guide with Ollama
## 1. Prerequisites: Ollama Setup
First, download and install Ollama from the official website:
**Download Link**: [https://ollama.com/download](https://ollama.com/download)

**Additional Resources**:
- Official Website: [https://ollama.com](https://ollama.com/)
- Model Library: [https://ollama.com/library](https://ollama.com/library)
- GitHub Repository: [https://github.com/ollama/ollama/](https://github.com/ollama/ollama)
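After installing, you can verify that the `ollama` CLI is available from your shell:
```bash
# Print the installed Ollama version to confirm the install succeeded
ollama --version
```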
---
## 2. Basic Ollama Commands
| Command | Description |
|------|------|
| `ollama pull model_name` | Download a model |
| `ollama serve` | Start the Ollama service |
| `ollama ps` | List running models |
| `ollama list` | List all downloaded models |
| `ollama rm model_name` | Remove a model |
| `ollama show model_name` | Show model details |
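For example, to download the two models used throughout this guide (a small chat model and an embedding model) and start the service:
```bash
# Download the chat and embedding models used in this guide
ollama pull qwen2.5:0.5b
ollama pull snowflake-arctic-embed:110m

# Start the Ollama service (listens on 127.0.0.1:11434 by default)
ollama serve
```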
## 3. Using the Ollama API for Custom Models
### OpenAI-Compatible API
#### Chat Request
```bash
curl http://127.0.0.1:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "qwen2.5:0.5b",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ]
}'
```
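The same endpoint can also stream tokens as they are generated; set `stream` to `true` to receive OpenAI-style server-sent events instead of a single JSON response:
```bash
curl http://127.0.0.1:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "qwen2.5:0.5b",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ],
  "stream": true
}'
```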
#### Embedding Request
```bash
curl http://127.0.0.1:11434/v1/embeddings -H "Content-Type: application/json" -d '{
  "model": "snowflake-arctic-embed:110m",
  "input": "Why is the sky blue?"
}'
```
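As a quick sanity check, you can count the dimensions of the returned vector; it should match the model's embedding length (768 for `snowflake-arctic-embed:110m`, per `ollama show` in the next section). This sketch assumes `python3` is available:
```bash
# Count the dimensions of the returned embedding vector; for
# snowflake-arctic-embed:110m this should print 768
curl -s http://127.0.0.1:11434/v1/embeddings -H "Content-Type: application/json" -d '{
  "model": "snowflake-arctic-embed:110m",
  "input": "Why is the sky blue?"
}' | python3 -c "import json, sys; print(len(json.load(sys.stdin)['data'][0]['embedding']))"
```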
More Details: [https://github.com/ollama/ollama/blob/main/docs/openai.md](https://github.com/ollama/ollama/blob/main/docs/openai.md)
## 4. Configuring Custom Embedding in Second Me
1. Start the Ollama service: `ollama serve`
2. Check your Ollama embedding model's context length:
```bash
$ ollama show snowflake-arctic-embed:110m
  Model
    architecture        bert
    parameters          108.89M
    context length      512
    embedding length    768
    quantization        F16

  License
    Apache License
    Version 2.0, January 2004
```
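If you only need the number, you can filter the output:
```bash
# Print just the context length line (512 for this model)
ollama show snowflake-arctic-embed:110m | grep "context length"
```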
3. Modify `EMBEDDING_MAX_TEXT_LENGTH` in `Second_Me/.env` to match your embedding model's context window. This prevents chunk length overflow and avoids server-side errors (500 Internal Server Error).
```bash
# Embedding configurations
# Set this to your embedding model's context length (512 for snowflake-arctic-embed:110m)
EMBEDDING_MAX_TEXT_LENGTH=512
```
4. Configure the custom Chat and Embedding models in Settings:
```
Chat:
  Model Name: qwen2.5:0.5b
  API Key: ollama
  API Endpoint: http://127.0.0.1:11434/v1

Embedding:
  Model Name: snowflake-arctic-embed:110m
  API Key: ollama
  API Endpoint: http://127.0.0.1:11434/v1
```
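Before saving, you can confirm the endpoint is reachable: Ollama's OpenAI-compatible API also exposes a model listing, and both configured models should appear in the response:
```bash
# List the models exposed through the OpenAI-compatible API
curl http://127.0.0.1:11434/v1/models
```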
**When running Second Me in Docker environments**, replace `127.0.0.1` in the API Endpoint fields with `host.docker.internal`:
```
Chat:
  Model Name: qwen2.5:0.5b
  API Key: ollama
  API Endpoint: http://host.docker.internal:11434/v1

Embedding:
  Model Name: snowflake-arctic-embed:110m
  API Key: ollama
  API Endpoint: http://host.docker.internal:11434/v1
```
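To check that the container can actually reach the host's Ollama service, you can run a quick request from inside it; `second-me` below is a placeholder container name, and the image is assumed to ship `curl`:
```bash
# From inside the (hypothetically named) Second Me container, list the
# models served by Ollama on the host
docker exec second-me curl http://host.docker.internal:11434/v1/models
```
Note that on Linux, `host.docker.internal` does not resolve by default; the container must be started with `--add-host=host.docker.internal:host-gateway` for this hostname to work.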