Spaces: MVPilgrim (Running)
Commit a21c53e · "debug" · committed by MVPilgrim
Parent(s): a6b7d9a
README.md
CHANGED
@@ -14,12 +14,13 @@ startup_duration_timeout: 3 hours
 
 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
-I plan to use this POC as starting point for future LLM-based applications
 
-The "happy path" of the code seems to work
 
-If you encounter issues in running the POC, please try reloading the web page. Also, please note that I'
-to run the POC so re-initialization takes
 
 ## Components
 Here are the key components of the project:
@@ -34,33 +35,31 @@ Here are the key components of the project:
 
 
 ## Application Notes
-As part of the initialization process, the application executes a Bash script asynchronously. The script carries out these steps:
-- Start the text2vec-transformers Weaviate module
--
--
-- Finally, the script waits to ensure that its subprocesses continue to execute so that app.py
-can use the database for RAG functions.
 
-Also, the vector database is only loaded with two collections
-
-about Norwegian literature.
 
 ## Usage
-
 To use the application, follow these steps:
-
--
--
--
-
 
 ## Future Improvements
 The following areas have been identified for future improvements:
-
--
-
-
-- Also to this end, add web pages with details on a topic that the LLM won't have been trained with. Compare prompts with
 and without RAG.
 - Experiment with different database settings on queries such as the distance parameter on the collection query.near_vector() call.
 
 
 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
+I plan to use this POC as a starting point for future LLM-based applications to leverage the power of RAG techniques.
 
+The "happy path" of the code seems to work fairly well. As noted later, there is more work to be done to improve it.
 
+If you encounter issues in running the POC, please try reloading the web page. Also, please note that I've currently configured the 2-vCPU configuration
+to run the POC, so re-initialization takes five or more minutes. Inferencing takes two to three minutes to complete. I think this has to do with the total load
+on the Huggingface system. But adding GPU support is at the top of the list of future improvements.
 
 ## Components
 Here are the key components of the project:
 
 
 ## Application Notes
+As part of the initialization process, the Python application executes a Bash script asynchronously. The script carries out these steps:
+- Start the text2vec-transformers Weaviate module to run as an asynchronous process. The Weaviate DB uses this.
+- Start the Weaviate database server to run asynchronously as well.
+- Wait so that the subprocesses continue to run and are ready to accept requests.
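The three startup steps above can be sketched with Python's standard subprocess module. This is a minimal illustration, not the POC's actual script: the commands are passed in as parameters because the real invocations live in the project's Bash script.

```python
import subprocess
import time

def start_services(commands, settle_seconds=1.0):
    """Launch each command as a background subprocess and report whether
    all of them are still alive after a short settling period."""
    # Popen returns immediately, so every server starts asynchronously.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    time.sleep(settle_seconds)
    # poll() is None while a process is still running.
    return procs, all(p.poll() is None for p in procs)
```

In the POC, the two commands would launch the text2vec-transformers module and the Weaviate server; the settling wait plays the role of the script's final wait so app.py can use the database for RAG functions.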
 
+Also, the vector database is only loaded with two Weaviate schemas/collections based on two documents
+in the inputDocs folder. These are main-topic HTML pages from Wikipedia. One page has content related
+to artificial intelligence and the other content about Norwegian literature. More and different web pages
+can be added later.
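Loading those pages boils down to extracting the visible text from the HTML and splitting it into chunks before Weaviate vectorizes them. Here is a stdlib-only sketch of that preprocessing step (the TextExtractor class, chunk_html helper, and 400-character chunk size are illustrative assumptions; the POC's actual loader and schema definitions are in app.py):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script/style content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def chunk_html(html, chunk_chars=400):
    """Extract text from HTML and split it into fixed-size character chunks."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.parts)
    # Fixed-size chunks keep the sketch simple; real loaders often split on
    # sentence or paragraph boundaries instead.
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
```

Each chunk would then be inserted into its collection, where the text2vec-transformers module computes its vector.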
 
 ## Usage
 To use the application, follow these steps:
+- Type in an optional system prompt and a user prompt in the corresponding input text boxes.
+- Click the "Run LLM Prompt" button to call the llama-2 LLM with the prompt.
+- The app displays the completion and the full prompt it creates using the llama-2 format for prompts.
+- If the "Enable RAG" check box is clicked, the user prompt will be modified to include RAG information
+from the Vector DB.
+- Click the "Get All Rag Data" button to view all the information about the two documents in the database, including chunks.
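On the full prompt mentioned above: llama-2 chat models conventionally expect the system and user prompts wrapped in [INST] and <<SYS>> markers. A sketch of how such a prompt might be assembled (build_llama2_prompt is a hypothetical helper; the POC's exact template is not shown in this diff):

```python
def build_llama2_prompt(user_prompt, system_prompt=None):
    """Wrap prompts in the common llama-2 chat template ([INST] / <<SYS>> markers)."""
    if system_prompt:
        return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} [/INST]"
    # With no system prompt, only the [INST] wrapper is used.
    return f"<s>[INST] {user_prompt} [/INST]"
```

With RAG enabled, the retrieved chunks would be spliced into the user prompt before this template is applied.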
 
 ## Future Improvements
 The following areas have been identified for future improvements:
+- Run the POC with a GPU.
+- Do more testing of the RAG support. Currently, it seems to work basically, but is it producing additional, useful information
+for inferencing?
+- Also to this end, add web pages with details on a topic that the LLM wasn't trained with. Compare prompts with
 and without RAG.
 - Experiment with different database settings on queries such as the distance parameter on the collection query.near_vector() call.
 
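On the last bullet: the distance parameter sets how dissimilar a chunk's vector may be from the query vector while still counting as a match. The effect can be illustrated without Weaviate using plain cosine distance over toy two-dimensional vectors (near_vector here is a hypothetical stand-in for the client call; Weaviate applies this kind of cutoff server-side):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def near_vector(query, chunks, distance=0.3):
    """Return (chunk_id, distance) pairs within the cutoff, nearest first."""
    hits = [(cid, cosine_distance(query, vec)) for cid, vec in chunks.items()]
    return sorted((h for h in hits if h[1] <= distance), key=lambda h: h[1])
```

Raising the distance admits more loosely related chunks into the RAG context; lowering it keeps only close matches, which is exactly the trade-off the bullet proposes to experiment with.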
app.py
CHANGED
@@ -19,6 +19,8 @@ from llama_cpp import Llama
 import streamlit as st
 import subprocess
 import time
 
 
 
@@ -79,7 +81,7 @@ try:
 st.session_state.load_css = True
 
 # Display UI heading.
-st.markdown("<h1 style='text-align: center; color: #666666;'>LLM with RAG
 unsafe_allow_html=True)
 
 pathString = "/app/inputDocs"
@@ -539,17 +541,17 @@ try:
 #################################################
 # Format text for easier reading in text areas. #
 #################################################
-def prettyPrint(
     try:
-        logger.info(
-
-
-
-
-
-
-        return
 
 
 #####################################
 import streamlit as st
 import subprocess
 import time
+import pprint
+import io
 
 
 
 st.session_state.load_css = True
 
 # Display UI heading.
+st.markdown("<h1 style='text-align: center; color: #666666;'>LLM with RAG Prompting <br style='page-break-after: always;'>Proof of Concept</h1>",
 unsafe_allow_html=True)
 
 pathString = "/app/inputDocs"
 #################################################
 # Format text for easier reading in text areas. #
 #################################################
+def prettyPrint(text):
     try:
+        logger.info("### prettyPrint entered.")
+        outstr = io.StringIO()
+        pprint.pprint(object=text,stream=outstr,indent=1,width=30)
+        prettyText = outstr.getvalue()
+        logger.info("### prettyPrint exited.")
+        return prettyText
+    except Exception as e:
+        print(f"### prettyPrint() e: {e}")
+        return None
 
 
 #####################################
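The new prettyPrint helper can be exercised outside the app. Below is a self-contained version of the same code, with app.py's logger replaced by a stdlib logging logger so the snippet runs on its own:

```python
import io
import logging
import pprint

logger = logging.getLogger(__name__)

def prettyPrint(text):
    """Reformat a value with pprint so it reads well in a narrow text area."""
    try:
        logger.info("### prettyPrint entered.")
        outstr = io.StringIO()
        # width=30 forces pprint to break long structures onto short lines,
        # which fits the Streamlit text areas used by the app.
        pprint.pprint(object=text, stream=outstr, indent=1, width=30)
        prettyText = outstr.getvalue()
        logger.info("### prettyPrint exited.")
        return prettyText
    except Exception as e:
        print(f"### prettyPrint() e: {e}")
        return None
```

Passing a nested dict, such as a query result with chunks, returns a multi-line string ready to drop into a text area.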