toaster61 committed · 7fd3f9f
1 parent: 1c1f4f1

ggml -> gguf

Files changed:
- Dockerfile        +3 -4
- README.md         +2 -2
- app.py            +3 -1
- app_gradio.py     +45 -0
- requirements.txt  +2 -1
- system.prompt     +0 -1
- wget-log          +6 -0
Dockerfile  CHANGED

@@ -6,7 +6,7 @@ USER root
 
 # Installing gcc compiler and main library.
 RUN apt update && apt install gcc cmake build-essential -y
-RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
 
 # Copying files into folder and making it working dir.
 RUN mkdir app
@@ -16,10 +16,9 @@ WORKDIR /app
 
 # Installing wget and downloading model.
 RUN apt install wget -y
-RUN wget -q -O model.bin https://huggingface.co/OpenBuddy
+RUN wget -q -O model.bin https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF/blob/main/openbuddy-llama2-13b-v11.1.Q5_K_M.gguf
 RUN ls
-# You can use other models!
-# Or u can comment this two RUNs and include in Space/repo/Docker image own model with name "model.bin".
+# You can use other models! Or u can comment this two RUNs and include in Space/repo/Docker image own model with name "model.bin".
 
 # Updating pip and installing everything from requirements
 RUN python3 -m pip install -U --no-cache-dir pip setuptools wheel
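Side note on the new download line, not part of the commit: huggingface.co `/blob/` URLs serve the repository web page, while the raw file is served from the `/resolve/` path (the 403 in the committed wget-log below is from an earlier download attempt). Below is a minimal sketch of fetching the same GGUF file with `huggingface_hub` instead; the `repo_id` and `filename` are taken from the URL in the diff, everything else is an assumption.

```python
# Sketch only: fetch the GGUF file with huggingface_hub instead of wget.
# repo_id and filename come from the URL in the Dockerfile; local_dir is an assumption.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF",
    filename="openbuddy-llama2-13b-v11.1.Q5_K_M.gguf",
    local_dir=".",
)
print(model_path)  # pass this path to Llama(model_path=...) or rename the file to model.bin
```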
README.md  CHANGED

@@ -4,9 +4,9 @@ emoji: 🏆
 colorFrom: red
 colorTo: indigo
 sdk: docker
-pinned:
+pinned: true
 ---
 
-This api built using
+This api built using Gradio with queue for openbuddy's models. Also includes Quart and uvicorn setup!
 
 For example I used https://huggingface.co/OpenBuddy/openbuddy-openllama-3b-v10-bf16
app.py  CHANGED

@@ -33,4 +33,6 @@ Change <code>`CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS`</code> i
 Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a>, <a href="https://quart.palletsprojects.com/">Quart</a> and <a href="https://www.uvicorn.org/">Uvicorn</a>.<br>
 <h1>How to test it on own machine?</h1>
 You can install Docker, build image and run it. I made <code>`run-docker.sh`</code> for ya. To stop container run <code>`docker ps`</code>, find name of container and run <code>`docker stop _dockerContainerName_`</code><br>
-Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br>
+Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br>
+<br>
+<h1>Also now it can run with Gradio! Check the repo!</h1>'''
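The hunk header above references the `CMAKE_ARGS` switch that the app's description text explains for GPU builds. As a hedged illustration only, not taken from this repo, the sketch below shows how a CUBLAS/CLBlast/Metal build of llama-cpp-python is typically used at runtime; the layer count is an assumed value.

```python
# Sketch only: once llama-cpp-python is built with GPU support (e.g. -DLLAMA_CUBLAS=on),
# layers can be offloaded to the GPU when the model is loaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.bin",  # same file name the Dockerfile downloads
    n_gpu_layers=35,           # assumed value; 0 keeps everything on the CPU
)
print(llm("User: Hello!\nAssistant: ", max_tokens=32)["choices"][0]["text"])
```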
app_gradio.py  ADDED

@@ -0,0 +1,45 @@
+import gradio as gr
+from llama_cpp import Llama
+
+llm = Llama(model_path="./model.bin")
+
+with open('system.prompt', 'r', encoding='utf-8') as f:
+    prompt = f.read()
+
+title = "Openbuddy LLama Api"
+desc = '''<h1>Hello, world!</h1>
+This is showcase how to make own server with OpenBuddy's model.<br>
+I'm using here 3b model just for example. Also here's only CPU power.<br>
+But you can use GPU power as well!<br><br>
+<h1>How to GPU?</h1>
+Change <code>`CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS`</code> in Dockerfile on <code>`CMAKE_ARGS="-DLLAMA_CUBLAS=on"`</code>. Also you can try <code>`DLLAMA_CLBLAST`</code>, <code>`DLLAMA_METAL`</code> or <code>`DLLAMA_METAL`</code>.<br>
+Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a> and <a href="https://www.gradio.app/">Gradio</a>.<br><br>
+<h1>How to test it on own machine?</h1>
+You can install Docker, build image and run it. I made <code>`run-docker.sh`</code> for ya. To stop container run <code>`docker ps`</code>, find name of container and run <code>`docker stop _dockerContainerName_`</code><br>
+Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br><br>
+Also it can run with quart+uvicorn! Check the repo!'''
+
+def greet(request: str, max_tokens: int = 64, override_system_prompt: str = ""):
+    try:
+        system_prompt = override_system_prompt if override_system_prompt != "" else prompt
+        max_tokens = max_tokens if max_tokens > 0 and max_tokens < 256 else 64
+        userPrompt = system_prompt + "\n\nUser: " + request + "\nAssistant: "
+    except: return "ERROR 400: Not enough data"
+    try:
+        output = llm(userPrompt, max_tokens=max_tokens, stop=["User:", "\n"], echo=False)
+        print(output)
+        return output["choices"][0]["text"]
+    except Exception as e:
+        print(e)
+        return "ERROR 500: Server error. Check logs!!"
+
+demo = gr.Interface(
+    fn=greet,
+    inputs=[gr.Text("Hello, how are you?"), gr.Number(64), gr.Textbox()],
+    outputs=["text"],
+    description=desc,
+    title=title,
+    allow_flagging="never"
+).queue()
+if __name__ == "__main__":
+    demo.launch()
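Since the new file is a plain `gr.Interface`, a hedged usage sketch follows; it is not part of the commit. It assumes the app is running locally on Gradio's default port and that the default `/predict` endpoint is exposed; the arguments follow the `inputs` order of `greet(request, max_tokens, override_system_prompt)`.

```python
# Sketch only: query the Gradio app over its API with gradio_client.
# URL and api_name are assumptions; args follow greet(request, max_tokens, override_system_prompt).
from gradio_client import Client

client = Client("http://localhost:7860")  # or the Space URL
answer = client.predict(
    "Hello, how are you?",  # request
    64,                     # max_tokens
    "",                     # override_system_prompt (empty -> use system.prompt)
    api_name="/predict",
)
print(answer)
```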
requirements.txt  CHANGED

@@ -1,2 +1,3 @@
 quart
-uvicorn
+uvicorn
+gradio
system.prompt  CHANGED

@@ -1 +0,0 @@
-Prompt: Отвечай максимально кратко и по делу.

(The removed prompt reads, in English: "Answer as briefly and to the point as possible.")
wget-log  ADDED

@@ -0,0 +1,6 @@
+--2023-09-30 17:29:11--  https://cdn-lfs.huggingface.co/repos/c8/66/c866ea7f0aa48d9e6cd5d10064562a36f8b43f272e5508eceac84d411b157f32/557b305cd42ca3da588a0e2f16dc1aceedcc73232fc2174da311428f16f0ca9e?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27openbuddy-llama2-13b-v8.1-q3_K.bin%3B+filename%3D%22openbuddy-llama2-13b-v8.1-q3_K.bin%22%3B
+Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 108.156.22.122, 108.156.22.7, 108.156.22.58, ...
+Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|108.156.22.122|:443... connected.
+HTTP request sent, awaiting response... 403 Forbidden
+2023-09-30 17:29:11 ERROR 403: Forbidden.
+