# Llama.cpp
| Feature    | Available |
| ---------- | --------- |
| Tools      | No        |
| Multimodal | No        |
Chat UI supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.
If you want to run Chat UI with llama.cpp, you can do the following, using Zephyr as an example model:
- Get the weights from the hub (a download sketch follows the configuration below)
- Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3` (a quick sanity-check request is also sketched below)
- Add the following to your `.env.local`:
```env
MODELS=`[
  {
    "name": "Local Zephyr",
    "chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 2048,
      "stop": ["</s>"]
    },
    "endpoints": [
      {
        "url": "http://127.0.0.1:8080",
        "type": "llamacpp"
      }
    ]
  }
]`
```
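
If you don't already have the weights locally, one way to fetch them is with the `huggingface-cli` tool from `huggingface_hub`. This is only a sketch: the repository id `TheBloke/zephyr-7B-beta-GGUF` is an assumption chosen to match the file name used in the server command above, and you may prefer a different quantization.

```bash
# Sketch: download the GGUF weights into the models/ directory the server command expects.
# The repo id below is an assumption; swap in whichever GGUF repo/quantization you want.
pip install -U "huggingface_hub[cli]"
huggingface-cli download TheBloke/zephyr-7B-beta-GGUF zephyr-7b-beta.Q4_K_M.gguf --local-dir models
```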
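
Before wiring it into Chat UI, you can check that the llama.cpp server is answering on the configured URL. Recent llama.cpp server builds expose a `/completion` endpoint; the prompt and `n_predict` values below are arbitrary placeholders.

```bash
# Sketch: ask the running llama.cpp server for a short completion.
curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello,", "n_predict": 16}'
```

Once that returns a completion, start Chat UI as usual (typically `npm run dev` from the chat-ui repository root) and the Local Zephyr model should be available in the model list.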