
Llama.cpp

Feature     | Available
----------- | ---------
Tools       | No
Multimodal  | No

Chat UI supports the llama.cpp API server directly, with no adapter required, through the llamacpp endpoint type.

To run Chat UI with llama.cpp, follow these steps, using Zephyr as an example model:

  1. Get the weights from the hub
  2. Run the server with the following command: ./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3 (a quick way to verify the server responds is sketched after the configuration below)
  3. Add the following to your .env.local:
MODELS=`[
  {
    "name": "Local Zephyr",
    "chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 2048,
      "stop": ["</s>"]
    },
    "endpoints": [
      {
        "url": "http://127.0.0.1:8080",
        "type": "llamacpp"
      }
    ]
  }
]`
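
Before pointing Chat UI at the server, it can help to confirm that the endpoint responds on its own. The sketch below is a minimal check, assuming the server from step 2 is running on the default http://127.0.0.1:8080 and exposes llama.cpp's /completion route; the hard-coded prompt simply mirrors the Zephyr chatPromptTemplate above, and the file name is only a suggestion.

// check-llamacpp.ts — minimal sanity check; run with a Node.js version that ships global fetch (18+)
const LLAMACPP_URL = "http://127.0.0.1:8080"; // same URL as the llamacpp endpoint in .env.local

async function main() {
  const response = await fetch(`${LLAMACPP_URL}/completion`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // Prompt written out by hand in the same Zephyr format as chatPromptTemplate.
      prompt:
        "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nSay hello.</s>\n<|assistant|>\n",
      n_predict: 64,
      stop: ["</s>"],
    }),
  });

  if (!response.ok) {
    throw new Error(`llama.cpp server returned ${response.status}`);
  }

  // llama.cpp's /completion returns a JSON object whose "content" field holds the generated text.
  const data = (await response.json()) as { content: string };
  console.log(data.content);
}

main().catch(console.error);

If this prints a short completion, the llamacpp endpoint in .env.local should work against the same URL once Chat UI is started.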