toaster61 committed · 7fd3f9f
1 parent: 1c1f4f1

ggml -> gguf

Files changed:
- Dockerfile        +3 -4
- README.md         +2 -2
- app.py            +3 -1
- app_gradio.py     +45 -0
- requirements.txt  +2 -1
- system.prompt     +0 -1
- wget-log          +6 -0
Dockerfile  CHANGED

@@ -6,7 +6,7 @@ USER root
 
 # Installing gcc compiler and main library.
 RUN apt update && apt install gcc cmake build-essential -y
-RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
 
 # Copying files into folder and making it working dir.
 RUN mkdir app
@@ -16,10 +16,9 @@ WORKDIR /app
 
 # Installing wget and downloading model.
 RUN apt install wget -y
-RUN wget -q -O model.bin https://huggingface.co/OpenBuddy
+RUN wget -q -O model.bin https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF/blob/main/openbuddy-llama2-13b-v11.1.Q5_K_M.gguf
 RUN ls
-# You can use other models!
-# Or u can comment this two RUNs and include in Space/repo/Docker image own model with name "model.bin".
+# You can use other models! Or u can comment this two RUNs and include in Space/repo/Docker image own model with name "model.bin".
 
 # Updating pip and installing everything from requirements
 RUN python3 -m pip install -U --no-cache-dir pip setuptools wheel
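Side note on the new download line, not part of the commit: huggingface.co `/blob/` URLs serve the repository web page, while the raw file is served from the `/resolve/` path (the 403 in the committed wget-log below is from an earlier download attempt). Below is a minimal sketch of fetching the same GGUF file with `huggingface_hub` instead; the `repo_id` and `filename` are taken from the URL in the diff, everything else is an assumption.

```python
# Sketch only: fetch the GGUF file with huggingface_hub instead of wget.
# repo_id and filename come from the URL in the Dockerfile; local_dir is an assumption.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF",
    filename="openbuddy-llama2-13b-v11.1.Q5_K_M.gguf",
    local_dir=".",
)
print(model_path)  # pass this path to Llama(model_path=...) or rename the file to model.bin
```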
README.md  CHANGED

@@ -4,9 +4,9 @@ emoji: 🏆
 colorFrom: red
 colorTo: indigo
 sdk: docker
-pinned:
+pinned: true
 ---
 
-This api built using
+This api built using Gradio with queue for openbuddy's models. Also includes Quart and uvicorn setup!
 
 For example I used https://huggingface.co/OpenBuddy/openbuddy-openllama-3b-v10-bf16
app.py  CHANGED

@@ -33,4 +33,6 @@ Change <code>`CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS`</code> i
 Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a>, <a href="https://quart.palletsprojects.com/">Quart</a> and <a href="https://www.uvicorn.org/">Uvicorn</a>.<br>
 <h1>How to test it on own machine?</h1>
 You can install Docker, build image and run it. I made <code>`run-docker.sh`</code> for ya. To stop container run <code>`docker ps`</code>, find name of container and run <code>`docker stop _dockerContainerName_`</code><br>
-Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br>
+Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br>
+<br>
+<h1>Also now it can run with Gradio! Check the repo!</h1>'''
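The hunk header above references the `CMAKE_ARGS` switch that the app's description text explains for GPU builds. As a hedged illustration only, not taken from this repo, the sketch below shows how a CUBLAS/CLBlast/Metal build of llama-cpp-python is typically used at runtime; the layer count is an assumed value.

```python
# Sketch only: once llama-cpp-python is built with GPU support (e.g. -DLLAMA_CUBLAS=on),
# layers can be offloaded to the GPU when the model is loaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.bin",  # same file name the Dockerfile downloads
    n_gpu_layers=35,           # assumed value; 0 keeps everything on the CPU
)
print(llm("User: Hello!\nAssistant: ", max_tokens=32)["choices"][0]["text"])
```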
app_gradio.py  ADDED

@@ -0,0 +1,45 @@
+import gradio as gr
+from llama_cpp import Llama
+
+llm = Llama(model_path="./model.bin")
+
+with open('system.prompt', 'r', encoding='utf-8') as f:
+    prompt = f.read()
+
+title = "Openbuddy LLama Api"
+desc = '''<h1>Hello, world!</h1>
+This is showcase how to make own server with OpenBuddy's model.<br>
+I'm using here 3b model just for example. Also here's only CPU power.<br>
+But you can use GPU power as well!<br><br>
+<h1>How to GPU?</h1>
+Change <code>`CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS`</code> in Dockerfile on <code>`CMAKE_ARGS="-DLLAMA_CUBLAS=on"`</code>. Also you can try <code>`DLLAMA_CLBLAST`</code>, <code>`DLLAMA_METAL`</code> or <code>`DLLAMA_METAL`</code>.<br>
+Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a> and <a href="https://www.gradio.app/">Gradio</a>.<br><br>
+<h1>How to test it on own machine?</h1>
+You can install Docker, build image and run it. I made <code>`run-docker.sh`</code> for ya. To stop container run <code>`docker ps`</code>, find name of container and run <code>`docker stop _dockerContainerName_`</code><br>
+Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br><br>
+Also it can run with quart+uvicorn! Check the repo!'''
+
+def greet(request: str, max_tokens: int = 64, override_system_prompt: str = ""):
+    try:
+        system_prompt = override_system_prompt if override_system_prompt != "" else prompt
+        max_tokens = max_tokens if max_tokens > 0 and max_tokens < 256 else 64
+        userPrompt = system_prompt + "\n\nUser: " + request + "\nAssistant: "
+    except: return "ERROR 400: Not enough data"
+    try:
+        output = llm(userPrompt, max_tokens=max_tokens, stop=["User:", "\n"], echo=False)
+        print(output)
+        return output["choices"][0]["text"]
+    except Exception as e:
+        print(e)
+        return "ERROR 500: Server error. Check logs!!"
+
+demo = gr.Interface(
+    fn=greet,
+    inputs=[gr.Text("Hello, how are you?"), gr.Number(64), gr.Textbox()],
+    outputs=["text"],
+    description=desc,
+    title=title,
+    allow_flagging="never"
+).queue()
+if __name__ == "__main__":
+    demo.launch()
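Since the new file is a plain `gr.Interface`, a hedged usage sketch follows; it is not part of the commit. It assumes the app is running locally on Gradio's default port and that the default `/predict` endpoint is exposed; the arguments follow the `inputs` order of `greet(request, max_tokens, override_system_prompt)`.

```python
# Sketch only: query the Gradio app over its API with gradio_client.
# URL and api_name are assumptions; args follow greet(request, max_tokens, override_system_prompt).
from gradio_client import Client

client = Client("http://localhost:7860")  # or the Space URL
answer = client.predict(
    "Hello, how are you?",  # request
    64,                     # max_tokens
    "",                     # override_system_prompt (empty -> use system.prompt)
    api_name="/predict",
)
print(answer)
```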
requirements.txt  CHANGED

@@ -1,2 +1,3 @@
 quart
-uvicorn
+uvicorn
+gradio
system.prompt  CHANGED

@@ -1 +0,0 @@
-Prompt: Отвечай максимально кратко и по делу.

(The removed prompt reads, in English: "Answer as briefly and to the point as possible.")
wget-log  ADDED

@@ -0,0 +1,6 @@
+--2023-09-30 17:29:11--  https://cdn-lfs.huggingface.co/repos/c8/66/c866ea7f0aa48d9e6cd5d10064562a36f8b43f272e5508eceac84d411b157f32/557b305cd42ca3da588a0e2f16dc1aceedcc73232fc2174da311428f16f0ca9e?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27openbuddy-llama2-13b-v8.1-q3_K.bin%3B+filename%3D%22openbuddy-llama2-13b-v8.1-q3_K.bin%22%3B
+Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 108.156.22.122, 108.156.22.7, 108.156.22.58, ...
+Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|108.156.22.122|:443... connected.
+HTTP request sent, awaiting response... 403 Forbidden
+2023-09-30 17:29:11 ERROR 403: Forbidden.
+