
Llama.cpp

Feature     | Available
----------- | ---------
Tools       | No
Multimodal  | No

Chat UI supports the llama.cpp API server directly, with no adapter required, through the llamacpp endpoint type.

To run Chat UI with llama.cpp, follow these steps, using Zephyr as an example model:

  1. Get the weights from the hub
  2. Run the server with the following command: ./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3 (a quick way to verify the server responds is sketched after the configuration below)
  3. Add the following to your .env.local:
MODELS=`[
  {
    "name": "Local Zephyr",
    "chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 2048,
      "stop": ["</s>"]
    },
    "endpoints": [
      {
        "url": "http://127.0.0.1:8080",
        "type": "llamacpp"
      }
    ]
  }
]`
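
Before pointing Chat UI at the server, it can help to confirm that the endpoint responds on its own. The sketch below is a minimal check, assuming the server from step 2 is running on the default http://127.0.0.1:8080 and exposes llama.cpp's /completion route; the hard-coded prompt simply mirrors the Zephyr chatPromptTemplate above, and the file name is only a suggestion.

// check-llamacpp.ts — minimal sanity check; run with a Node.js version that ships global fetch (18+)
const LLAMACPP_URL = "http://127.0.0.1:8080"; // same URL as the llamacpp endpoint in .env.local

async function main() {
  const response = await fetch(`${LLAMACPP_URL}/completion`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // Prompt written out by hand in the same Zephyr format as chatPromptTemplate.
      prompt:
        "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nSay hello.</s>\n<|assistant|>\n",
      n_predict: 64,
      stop: ["</s>"],
    }),
  });

  if (!response.ok) {
    throw new Error(`llama.cpp server returned ${response.status}`);
  }

  // llama.cpp's /completion returns a JSON object whose "content" field holds the generated text.
  const data = (await response.json()) as { content: string };
  console.log(data.content);
}

main().catch(console.error);

If this prints a short completion, the llamacpp endpoint in .env.local should work against the same URL once Chat UI is started.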