MVPilgrim committed
Commit a21c53e · 1 Parent(s): a6b7d9a
Files changed (2)
  1. README.md +23 -24
  2. app.py +13 -11
README.md CHANGED
@@ -14,12 +14,13 @@ startup_duration_timeout: 3 hours
 
 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
- I plan to use this POC as starting point for future LLM-based applications by leveraging the power of RAG techniques.
 
- The "happy path" of the code seems to work pretty well. As noted later, there is more work to be done to make it better.
 
- If you encounter issues in running the POC, please try reloading the web page. Also, please note that I'm using the 2-vCPU configuration
- to run the POC so re-initialization takes a five or more minutes. Inferencing takes two to three minutes to complete.
 
 ## Components
 Here are the key components of the project:
@@ -34,33 +35,31 @@ Here are the key components of the project:
 
 
 ## Application Notes
- As part of the initialization process, the application executes a Bash script asynchronously. The script carries out these steps:
- - Start the text2vec-transformers Weaviate module first.
- - Then, it starts the Weaviate database server itself.
- - Both programs run as subprocesses to the script.
- - Finally, the script waits to ensure that its subprocesses continue to execute so that app.py
- can use the database for RAG functions.
 
- Also, the vector database is only loaded with two collections/schemas based on one webpage each
- from Wikipedia. One page has content related to artifical intelligence and the other content
- about Norwegian literature.
 
 ## Usage
-
 To use the application, follow these steps:
-
- - Type in a prompt and an optional system prompt (e.g., "You are a helpful AI assistant.") in the provided input fields.
- - Click the "Run LLM Prompt" button to initiate the processing of the prompt by the llama-2 LLM.
- - Once the processing is complete, the generated completion will be displayed along with the user's prompt and system prompt.
- - Click the "Get All Rag Data" button to view information on the two documents in the database including chunks.
 
 ## Future Improvements
 The following areas have been identified for future improvements:
-
- - Ensure that Retrieval Augmented Generation (RAG) is functioning correctly. When a prompt is created
- with RAG data, it appears to llama-2 is considering the information along with information it has
- been trained with. But more testing is needed.
- - Also to this end, add web pages with details on a topic that the LLM won't have been trained with. Compare prompts with
 and without RAG.
 - Experiment with different database settings on queries such as the distance parameter on the collection query.near_vector() call.
 
 
 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
+ I plan to use this POC as a starting point for future LLM-based applications to leverage the power of RAG techniques.
 
+ The "happy path" of the code seems to work fairly well. As noted later, there is more work to be done to improve it.
 
+ If you encounter issues running the POC, please try reloading the web page. Also, please note that I've currently selected the 2-vCPU configuration
+ to run the POC, so re-initialization takes five or more minutes. Inferencing takes two to three minutes to complete. I think this has to do with the total load
+ on the Huggingface system. But adding GPU support is at the top of the list of future improvements.
 
 ## Components
 Here are the key components of the project:
 
 
 ## Application Notes
+ As part of the initialization process, the Python application executes a Bash script asynchronously. The script carries out these steps (sketched below):
+ - Start the text2vec-transformers Weaviate module as an asynchronous process. The Weaviate DB uses this module.
+ - Start the Weaviate database server asynchronously as well.
+ - Wait so that the subprocesses continue to run and are ready to accept requests.
 
+ Also, the vector database is only loaded with two Weaviate schemas/collections based on two documents
+ in the inputDocs folder. These are main-topic HTML pages from Wikipedia. One page has content related
+ to artificial intelligence and the other content about Norwegian literature. More and different web pages
+ can be added later.
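
A minimal sketch of that startup flow, written in Python with the subprocess module that app.py already imports (the project itself does this via a Bash script); the script path, port, and timeout here are assumptions for illustration, though `/v1/.well-known/ready` is Weaviate's actual readiness endpoint:

```python
import subprocess
import time
import urllib.request

INIT_SCRIPT = "/app/startWeaviate.sh"  # hypothetical script name
READY_URL = "http://localhost:8080/v1/.well-known/ready"  # Weaviate readiness check

def start_weaviate_async():
    # Launch the Bash script without blocking; it starts the
    # text2vec-transformers module and then the Weaviate server.
    proc = subprocess.Popen(["bash", INIT_SCRIPT])

    # Poll the readiness endpoint so RAG requests only begin
    # once the database can accept them.
    for _ in range(60):
        try:
            with urllib.request.urlopen(READY_URL, timeout=2) as resp:
                if resp.status == 200:
                    return proc  # keep the handle so the subprocesses stay alive
        except OSError:
            pass
        time.sleep(5)
    raise RuntimeError("Weaviate did not become ready in time")
```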
 
  ## Usage
 
 To use the application, follow these steps:
+ - Type an optional system prompt and a user prompt into the corresponding input text boxes.
+ - Click the "Run LLM Prompt" button to call the llama-2 LLM with the prompt.
+ - The application then displays the completion and the full prompt it built using the llama-2 JSON format for prompts.
+ - If the "Enable RAG" check box is checked, the user prompt will be augmented with RAG information
+ retrieved from the vector DB.
+ - Click the "Get All Rag Data" button to view all the information about the two documents in the database, including chunks.
 
 ## Future Improvements
 The following areas have been identified for future improvements:
+ - Run the POC with a GPU.
+ - Do more testing of the RAG support. Currently, it seems to work at a basic level. But is it producing additional, useful information
+ for inferencing?
+ - Also to this end, add web pages with details on a topic that the LLM wasn't trained on. Compare prompts with
 and without RAG.
 - Experiment with different database settings on queries such as the distance parameter on the collection query.near_vector() call (see the sketch below).
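
On that last item, a hedged sketch of tuning `distance` with the Weaviate v4 Python client; the collection name and query vector are placeholders, and the repo may pin a different client version:

```python
import weaviate

client = weaviate.connect_to_local()
try:
    # "WikipediaPage" is a placeholder collection name.
    collection = client.collections.get("WikipediaPage")

    # Placeholder embedding; in practice this comes from text2vec-transformers.
    query_vector = [0.1] * 384

    # distance caps how far a chunk may be from the query vector and
    # still be returned; smaller values yield fewer, closer matches.
    result = collection.query.near_vector(
        near_vector=query_vector,
        distance=0.6,
        limit=5,
    )
    for obj in result.objects:
        print(obj.properties)
finally:
    client.close()
```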
 
app.py CHANGED
@@ -19,6 +19,8 @@ from llama_cpp import Llama
 import streamlit as st
 import subprocess
 import time
 
 
@@ -79,7 +81,7 @@ try:
 st.session_state.load_css = True
 
 # Display UI heading.
- st.markdown("<h1 style='text-align: center; color: #666666;'>LLM with RAG Using Vector Database Proof of Concept</h1>",
 unsafe_allow_html=True)
 
 pathString = "/app/inputDocs"
@@ -539,17 +541,17 @@ try:
 #################################################
 # Format text for easier reading in text areas. #
 #################################################
- def prettyPrint(jsonText):
 try:
- logger.info(f"### prettyPrint entered.")
- if not isinstance(jsonText,str):
- jsonText = str(jsonText)
- jsonData = json.loads(jsonText)
- formattedJson = json.dumps(jsonData, indent=2)
- logger.info(f"### prettyPrint exited.")
- return formattedJson
- except json.JSONDecodeError as e:
- return jsonText
 
 
 #####################################
 
 import streamlit as st
 import subprocess
 import time
+ import pprint
+ import io
 
 
 st.session_state.load_css = True
 
 # Display UI heading.
+ st.markdown("<h1 style='text-align: center; color: #666666;'>LLM with RAG Prompting <br style='page-break-after: always;'>Proof of Concept</h1>",
 unsafe_allow_html=True)
 
 pathString = "/app/inputDocs"
 
 #################################################
 # Format text for easier reading in text areas. #
 #################################################
+ def prettyPrint(text):
 try:
+ logger.info("### prettyPrint entered.")
+ outstr = io.StringIO()
+ pprint.pprint(object=text,stream=outstr,indent=1,width=30)
+ prettyText = outstr.getvalue()
+ logger.info("### prettyPrint exited.")
+ return prettyText
+ except Exception as e:
+ print(f"### prettyPrint() e: {e}")
+ return None
 
 
 #####################################
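
For reference, the rewritten prettyPrint can be exercised standalone; this minimal version (logging calls stripped) shows how pprint writes wrapped, indented text into an io.StringIO buffer:

```python
import io
import pprint

def prettyPrint(text):
    # Render any Python object as indented text, wrapped at 30 columns
    # so it fits in a narrow Streamlit text area.
    outstr = io.StringIO()
    pprint.pprint(object=text, stream=outstr, indent=1, width=30)
    return outstr.getvalue()

# A nested structure gets broken across short lines.
print(prettyPrint({"doc": "AI", "chunks": ["alpha", "beta", "gamma"]}))
```

Unlike the earlier json.loads/json.dumps version, this no longer falls over on plain strings or arbitrary Python objects.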