MVPilgrim committed
Commit a21c53e · 1 Parent(s): a6b7d9a
Files changed (2)
  1. README.md +23 -24
  2. app.py +13 -11
README.md CHANGED
@@ -14,12 +14,13 @@ startup_duration_timeout: 3 hours
 
 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
- I plan to use this POC as starting point for future LLM-based applications by leveraging the power of RAG techniques.
 
- The "happy path" of the code seems to work pretty well. As noted later, there is more work to be done to make it better.
 
- If you encounter issues in running the POC, please try reloading the web page. Also, please note that I'm using the 2-vCPU configuration
- to run the POC so re-initialization takes a five or more minutes. Inferencing takes two to three minutes to complete.
 
 ## Components
 Here are the key components of the project:
@@ -34,33 +35,31 @@ Here are the key components of the project:
 
 
 ## Application Notes
- As part of the initialization process, the application executes a Bash script asynchronously. The script carries out these steps:
- - Start the text2vec-transformers Weaviate module first.
- - Then, it starts the Weaviate database server itself.
- - Both programs run as subprocesses to the script.
- - Finally, the script waits to ensure that its subprocesses continue to execute so that app.py
- can use the database for RAG functions.
 
- Also, the vector database is only loaded with two collections/schemas based on one webpage each
- from Wikipedia. One page has content related to artifical intelligence and the other content
- about Norwegian literature.
 
 ## Usage
-
 To use the application, follow these steps:
-
- - Type in a prompt and an optional system prompt (e.g., "You are a helpful AI assistant.") in the provided input fields.
- - Click the "Run LLM Prompt" button to initiate the processing of the prompt by the llama-2 LLM.
- - Once the processing is complete, the generated completion will be displayed along with the user's prompt and system prompt.
- - Click the "Get All Rag Data" button to view information on the two documents in the database including chunks.
 
 ## Future Improvements
 The following areas have been identified for future improvements:
-
- - Ensure that Retrieval Augmented Generation (RAG) is functioning correctly. When a prompt is created
- with RAG data, it appears to llama-2 is considering the information along with information it has
- been trained with. But more testing is needed.
- - Also to this end, add web pages with details on a topic that the LLM won't have been trained with. Compare prompts with
 and without RAG.
 - Experiment with different database settings on queries such as the distance parameter on the collection query.near_vector() call.
 
 
 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
+ I plan to use this POC as a starting point for future LLM-based applications to leverage the power of RAG techniques.
 
+ The "happy path" of the code seems to work fairly well. As noted later, there is more work to be done to improve it.
 
+ If you encounter issues running the POC, please try reloading the web page. Also, please note that I've currently selected the 2-vCPU configuration
+ to run the POC, so re-initialization takes five or more minutes. Inferencing takes two to three minutes to complete. I think this has to do with the total load
+ on the Huggingface system. But adding GPU support is at the top of the list of future improvements.
 
 ## Components
 Here are the key components of the project:
 
 
 ## Application Notes
+ As part of the initialization process, the Python application executes a Bash script asynchronously. The script carries out these steps (sketched below):
+ - Start the text2vec-transformers Weaviate module as an asynchronous process. The Weaviate DB uses this module.
+ - Start the Weaviate database server asynchronously as well.
+ - Wait so that the subprocesses continue to run and are ready to accept requests.
 
+ Also, the vector database is only loaded with two Weaviate schemas/collections based on two documents
+ in the inputDocs folder. These are main-topic HTML pages from Wikipedia. One page has content related
+ to artificial intelligence and the other content about Norwegian literature. More and different web pages
+ can be added later.
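
A minimal sketch of that startup flow, written in Python with the subprocess module that app.py already imports (the project itself does this via a Bash script); the script path, port, and timeout here are assumptions for illustration, though `/v1/.well-known/ready` is Weaviate's actual readiness endpoint:

```python
import subprocess
import time
import urllib.request

INIT_SCRIPT = "/app/startWeaviate.sh"  # hypothetical script name
READY_URL = "http://localhost:8080/v1/.well-known/ready"  # Weaviate readiness check

def start_weaviate_async():
    # Launch the Bash script without blocking; it starts the
    # text2vec-transformers module and then the Weaviate server.
    proc = subprocess.Popen(["bash", INIT_SCRIPT])

    # Poll the readiness endpoint so RAG requests only begin
    # once the database can accept them.
    for _ in range(60):
        try:
            with urllib.request.urlopen(READY_URL, timeout=2) as resp:
                if resp.status == 200:
                    return proc  # keep the handle so the subprocesses stay alive
        except OSError:
            pass
        time.sleep(5)
    raise RuntimeError("Weaviate did not become ready in time")
```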
 
  ## Usage
 
 To use the application, follow these steps:
+ - Type an optional system prompt and a user prompt into the corresponding input text boxes.
+ - Click the "Run LLM Prompt" button to call the llama-2 LLM with the prompt.
+ - The application then displays the completion and the full prompt it built using the llama-2 JSON format for prompts.
+ - If the "Enable RAG" check box is checked, the user prompt will be augmented with RAG information
+ retrieved from the vector DB.
+ - Click the "Get All Rag Data" button to view all the information about the two documents in the database, including chunks.
 
 ## Future Improvements
 The following areas have been identified for future improvements:
+ - Run the POC with a GPU.
+ - Do more testing of the RAG support. Currently, it seems to work at a basic level. But is it producing additional, useful information
+ for inferencing?
+ - Also to this end, add web pages with details on a topic that the LLM wasn't trained on. Compare prompts with
 and without RAG.
 - Experiment with different database settings on queries such as the distance parameter on the collection query.near_vector() call (see the sketch below).
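
On that last item, a hedged sketch of tuning `distance` with the Weaviate v4 Python client; the collection name and query vector are placeholders, and the repo may pin a different client version:

```python
import weaviate

client = weaviate.connect_to_local()
try:
    # "WikipediaPage" is a placeholder collection name.
    collection = client.collections.get("WikipediaPage")

    # Placeholder embedding; in practice this comes from text2vec-transformers.
    query_vector = [0.1] * 384

    # distance caps how far a chunk may be from the query vector and
    # still be returned; smaller values yield fewer, closer matches.
    result = collection.query.near_vector(
        near_vector=query_vector,
        distance=0.6,
        limit=5,
    )
    for obj in result.objects:
        print(obj.properties)
finally:
    client.close()
```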
 
app.py CHANGED
@@ -19,6 +19,8 @@ from llama_cpp import Llama
 import streamlit as st
 import subprocess
 import time
 
 
@@ -79,7 +81,7 @@ try:
 st.session_state.load_css = True
 
 # Display UI heading.
- st.markdown("<h1 style='text-align: center; color: #666666;'>LLM with RAG Using Vector Database Proof of Concept</h1>",
 unsafe_allow_html=True)
 
 pathString = "/app/inputDocs"
@@ -539,17 +541,17 @@ try:
 #################################################
 # Format text for easier reading in text areas. #
 #################################################
- def prettyPrint(jsonText):
 try:
- logger.info(f"### prettyPrint entered.")
- if not isinstance(jsonText,str):
- jsonText = str(jsonText)
- jsonData = json.loads(jsonText)
- formattedJson = json.dumps(jsonData, indent=2)
- logger.info(f"### prettyPrint exited.")
- return formattedJson
- except json.JSONDecodeError as e:
- return jsonText
 
 
 #####################################
 
 import streamlit as st
 import subprocess
 import time
+ import pprint
+ import io
 
 
 st.session_state.load_css = True
 
 # Display UI heading.
+ st.markdown("<h1 style='text-align: center; color: #666666;'>LLM with RAG Prompting <br style='page-break-after: always;'>Proof of Concept</h1>",
 unsafe_allow_html=True)
 
 pathString = "/app/inputDocs"
 
 #################################################
 # Format text for easier reading in text areas. #
 #################################################
+ def prettyPrint(text):
 try:
+ logger.info("### prettyPrint entered.")
+ outstr = io.StringIO()
+ pprint.pprint(object=text,stream=outstr,indent=1,width=30)
+ prettyText = outstr.getvalue()
+ logger.info("### prettyPrint exited.")
+ return prettyText
+ except Exception as e:
+ print(f"### prettyPrint() e: {e}")
+ return None
 
 
 #####################################
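
For reference, the rewritten prettyPrint can be exercised standalone; this minimal version (logging calls stripped) shows how pprint writes wrapped, indented text into an io.StringIO buffer:

```python
import io
import pprint

def prettyPrint(text):
    # Render any Python object as indented text, wrapped at 30 columns
    # so it fits in a narrow Streamlit text area.
    outstr = io.StringIO()
    pprint.pprint(object=text, stream=outstr, indent=1, width=30)
    return outstr.getvalue()

# A nested structure gets broken across short lines.
print(prettyPrint({"doc": "AI", "chunks": ["alpha", "beta", "gamma"]}))
```

Unlike the earlier json.loads/json.dumps version, this no longer falls over on plain strings or arbitrary Python objects.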