Update documentation of OpenAI compatible server configuration (#1141)
Update README.md
Fixed incorrect setup for extra parameters in OpenAI compatible server configuration (see PR #1032)
README.md
CHANGED
````diff
@@ -273,10 +273,12 @@ If `endpoints` are left unspecified, ChatUI will look for the model on the hoste
 
 ##### OpenAI API compatible models
 
-Chat UI can be used with any API server that supports OpenAI API compatibility, for example [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai), [LocalAI](https://github.com/go-skynet/LocalAI), [FastChat](https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), and [ialacol](https://github.com/chenhunghan/ialacol).
+Chat UI can be used with any API server that supports OpenAI API compatibility, for example [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai), [LocalAI](https://github.com/go-skynet/LocalAI), [FastChat](https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [ialacol](https://github.com/chenhunghan/ialacol), and [vllm](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
 
 The following example config makes Chat UI work with [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai). `endpoint.baseUrl` is the URL of the OpenAI API compatible server; it overrides the base URL used by the OpenAI instance. `endpoint.completion` determines which endpoint to use: the default, `chat_completions`, uses `v1/chat/completions`; set `endpoint.completion` to `completions` to use the `v1/completions` endpoint.
 
+Parameters not supported by OpenAI (e.g., `top_k`, `repetition_penalty`, etc.) must be set in the `extraBody` of `endpoints`. Be aware that setting them in `parameters` will cause them to be omitted.
+
 ```
 MODELS=`[
   {
@@ -285,15 +287,17 @@ MODELS=`[
     "parameters": {
       "temperature": 0.9,
       "top_p": 0.95,
-      "repetition_penalty": 1.2,
-      "top_k": 50,
-      "truncate": 1000,
       "max_new_tokens": 1024,
       "stop": []
     },
     "endpoints": [{
       "type": "openai",
-      "baseURL": "http://localhost:8000/v1"
+      "baseURL": "http://localhost:8000/v1",
+      "extraBody": {
+        "repetition_penalty": 1.2,
+        "top_k": 50,
+        "truncate": 1000
+      }
     }]
   }
 ]`
````
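Applied to the README, the corrected example reads as one piece. Below is a minimal consolidated sketch of the resulting `MODELS` entry; the `name` and `id` values are illustrative placeholders and are not part of this diff:

```
MODELS=`[
  {
    "name": "text-generation-webui",
    "id": "text-generation-webui",
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "max_new_tokens": 1024,
      "stop": []
    },
    "endpoints": [{
      "type": "openai",
      "baseURL": "http://localhost:8000/v1",
      "extraBody": {
        "repetition_penalty": 1.2,
        "top_k": 50,
        "truncate": 1000
      }
    }]
  }
]`
```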
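Why the split matters: fields in `parameters` that the OpenAI client does not recognize are dropped, while `extraBody` fields are presumably merged into the outgoing JSON body as-is (an assumption about Chat UI's behavior, not stated in this diff). Under that assumption, a request to `v1/chat/completions` would carry something like:

```
{
  "model": "text-generation-webui",
  "messages": [{ "role": "user", "content": "Hello" }],
  "temperature": 0.9,
  "top_p": 0.95,
  "max_tokens": 1024,
  "repetition_penalty": 1.2,
  "top_k": 50,
  "truncate": 1000
}
```

Here `max_new_tokens` is shown renamed to OpenAI's `max_tokens`; the exact field mapping is an illustrative assumption, not confirmed by this PR.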