Revamp llama.cpp docs (#1214)
* Revamp llama.cpp docs
* format
* update readme
* update index page
* update readme
* better formatting
* Update README.md
Co-authored-by: Victor Muštar <[email protected]>
* Update README.md
Co-authored-by: Victor Muštar <[email protected]>
* fix hashlink
* document llama hf args
* format
---------
Co-authored-by: Victor Muštar <[email protected]>
- README.md +80 -13
- docs/source/configuration/models/providers/llamacpp.md +31 -20
- docs/source/index.md +66 -0
README.md
CHANGED

@@ -20,15 +20,79 @@ load_balancing_strategy: random

A chat interface using open source models, eg OpenAssistant or Llama. It is a SvelteKit app and it powers the [HuggingChat app on hf.co/chat](https://huggingface.co/chat).

-0. [
-1. [Setup](#setup)
-2. [
-3. [
-4. [
-5. [
-6. [
-7. [
-8. [
+0. [Quickstart](#quickstart)
+1. [No Setup Deploy](#no-setup-deploy)
+2. [Setup](#setup)
+3. [Launch](#launch)
+4. [Web Search](#web-search)
+5. [Text Embedding Models](#text-embedding-models)
+6. [Extra parameters](#extra-parameters)
+7. [Common issues](#common-issues)
+8. [Deploying to a HF Space](#deploying-to-a-hf-space)
+9. [Building](#building)
+
+## Quickstart
+
+You can quickly start a locally running chat-ui & LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+**Step 1 (Start llama.cpp server):**
+
+```bash
+# install llama.cpp
+brew install llama.cpp
+# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
+llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
+```
+
+A local LLaMA.cpp HTTP Server will start on `http://localhost:8080`. Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+**Step 2 (tell chat-ui to use local llama.cpp server):**
+
+Add the following to your `.env.local`:
+
+```ini
+MODELS=`[
+  {
+    "name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
+    "tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
+    "preprompt": "",
+    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
+    "parameters": {
+      "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
+      "temperature": 0.7,
+      "max_new_tokens": 1024,
+      "truncate": 3071
+    },
+    "endpoints": [{
+      "type" : "llamacpp",
+      "baseURL": "http://localhost:8080"
+    }],
+  },
+]`
+```
+
+Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+**Step 3 (make sure you have MongoDB running locally):**
+
+```bash
+docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
+```
+
+Read more [here](#database).
+
+**Step 4 (start chat-ui):**
+
+```bash
+git clone https://github.com/huggingface/chat-ui
+cd chat-ui
+npm install
+npm run dev -- --open
+```
+
+Read more [here](#launch).
+
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>

## No Setup Deploy
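Before moving on from Steps 1 and 3, it can be worth confirming that both local services are reachable. This is a minimal sketch, assuming a recent llama.cpp build (older `llama-server` binaries may not expose `/health`) and assuming `MONGODB_URL` is the connection-string variable chat-ui reads:

```bash
# check the llama.cpp server from Step 1 (recent builds expose a /health endpoint)
curl http://localhost:8080/health

# point chat-ui at the MongoDB container from Step 3
# (assumes MONGODB_URL is the variable chat-ui expects in .env.local)
echo 'MONGODB_URL=mongodb://localhost:27017' >> .env.local
```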
@@ -415,11 +479,14 @@ MODELS=`[{

chat-ui also supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

-If you want to run
-
-
-
+If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:
+
+```bash
+# install llama.cpp
+brew install llama.cpp
+# start llama.cpp server
+llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
+```

```env
MODELS=`[
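The `chatPromptTemplate` in the config above is a Handlebars-style template. As a rough illustration of what it produces, a single user message "Hello" with an empty `preprompt` should render to the Phi-3-style prompt below (a sketch, assuming the template is expanded exactly as written):

```bash
# illustrative only: the prompt string the llamacpp endpoint would send
# for one user turn ("Hello") given the chatPromptTemplate above
printf '<s><|user|>\nHello<|end|>\n<|assistant|>\n'
```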
docs/source/configuration/models/providers/llamacpp.md
CHANGED

@@ -7,32 +7,43 @@

Chat UI supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

-If you want to run Chat UI with llama.cpp, you can do the following, using
-
-
-
+If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:
+
+```bash
+# install llama.cpp
+brew install llama.cpp
+# start llama.cpp server
+llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
+```
+
+_note: you can swap the `hf-repo` and `hf-file` with your fav GGUF on the [Hub](https://huggingface.co/models?library=gguf). For example: `--hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF` for [this repo](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) & `--hf-file tinyllama-1.1b-chat-v1.0.Q4_0.gguf` for [this file](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blob/main/tinyllama-1.1b-chat-v1.0.Q4_0.gguf)._
+
+A local LLaMA.cpp HTTP Server will start on `http://localhost:8080` (to change the port or any other default options, please refer to the [LLaMA.cpp HTTP Server readme](https://github.com/ggerganov/llama.cpp/tree/master/examples/server)).
+
+Add the following to your `.env.local`:

```ini
MODELS=`[
  {
-    "name": "Local
-    "
+    "name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
+    "tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
+    "preprompt": "",
+    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
-      "
-      "
-      "
-      "
-      "truncate": 1000,
-      "max_new_tokens": 2048,
-      "stop": ["</s>"]
+      "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
+      "temperature": 0.7,
+      "max_new_tokens": 1024,
+      "truncate": 3071
    },
-    "endpoints": [
-
-
-
-
-    ]
-  }
+    "endpoints": [{
+      "type" : "llamacpp",
+      "baseURL": "http://localhost:8080"
+    }],
+  },
]`
```
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
+</div>
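The note about swapping `hf-repo` and `hf-file` can be made concrete. The command below uses the exact TinyLlama repo and file named in that note; the smaller `-c 2048` is an assumption matching that model's advertised context window, so check the model card before relying on it:

```bash
# same server invocation, pointed at the TinyLlama GGUF mentioned in the note above
llama-server --hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  --hf-file tinyllama-1.1b-chat-v1.0.Q4_0.gguf -c 2048
```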
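Since the docs point to the server readme for changing defaults, one common tweak is the listening port. This sketch assumes `--port` is the relevant flag (verify with `llama-server --help`) and shows the matching `baseURL` change:

```bash
# run the server on a non-default port (--port is assumed; check llama-server --help)
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf \
  --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096 --port 8081

# then update the llamacpp endpoint in .env.local to match, e.g.
#   "baseURL": "http://localhost:8081"
```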
docs/source/index.md
CHANGED

@@ -9,3 +9,69 @@ Open source chat interface with support for tools, web search, multimodal and ma
🐙 **Multimodal**: Accepts image file uploads on supported providers

👤 **OpenID**: Optionally setup OpenID for user authentication
+
+## Quickstart Locally
+
+You can quickly have a locally running chat-ui & LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+**Step 1 (Start llama.cpp server):**
+
+```bash
+# install llama.cpp
+brew install llama.cpp
+# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
+llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
+```
+
+A local LLaMA.cpp HTTP Server will start on `http://localhost:8080`. Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+**Step 2 (tell chat-ui to use local llama.cpp server):**
+
+Add the following to your `.env.local`:
+
+```ini
+MODELS=`[
+  {
+    "name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
+    "tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
+    "preprompt": "",
+    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
+    "parameters": {
+      "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
+      "temperature": 0.7,
+      "max_new_tokens": 1024,
+      "truncate": 3071
+    },
+    "endpoints": [{
+      "type" : "llamacpp",
+      "baseURL": "http://localhost:8080"
+    }],
+  },
+]`
+```
+
+Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
+
+**Step 3 (make sure you have MongoDB running locally):**
+
+```bash
+docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
+```
+
+Read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#database).
+
+**Step 4 (start chat-ui):**
+
+```bash
+git clone https://github.com/huggingface/chat-ui
+cd chat-ui
+npm install
+npm run dev -- --open
+```
+
+Read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#launch).
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
+</div>
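As a final check that the quickstart model actually generates text before pointing chat-ui at it, you can query the server directly. This is a sketch that assumes the OpenAI-compatible `/v1/chat/completions` route exposed by recent llama.cpp server builds; older builds may only offer the native `/completion` endpoint:

```bash
# optional smoke test against the llama.cpp server from Step 1
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 32}'
```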