MVPilgrim committed
Commit fb792cf · 1 Parent(s): ecf6d51
Files changed (1):
  1. README.md +9 -12
README.md CHANGED
@@ -14,13 +14,12 @@ startup_duration_timeout: 3 hours

 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
- I plan to use this POC as starting point for future LLM-based applications to leverage the power of RAG techniques.
+ I plan to use this POC as a starting point for future LLM-based applications to use the power of RAG techniques.

 The "happy path" of the code seems to work fairly well. As noted later, there is more work to be done to improve it.

- If you encounter issues in running the POC, please try reloading the web page. Also, please note that I've currently configured the 2-vCPU configuration
- to run the POC so re-initialization takes five or more minutes. Inferencing takes two to three minutes to complete. I think this has to do with the total load
- on the Huggingface system. Butting adding GPU support is at the top of the list of future improvements.
+ If you encounter issues in running the POC, please try reloading the web page. Also, please note that I've currently configured the space to use the 2-vCPU setting
+ to run the POC. Depending on load, re-initialization takes five or more minutes. Inferencing can take three minutes or more to complete. Adding GPU support is at the top of the list of future improvements.

 ## Components
 Here are the key components of the project:
@@ -36,8 +35,8 @@ Here are the key components of the project:

 ## Application Notes
 As part of the initialization process, the Python application executes a Bash script asynchronously. The script carries out these steps (sketched below):
- - Start the text2vec-transformers Weaviate module to run as an asynchronous process. The Weaviate DB uses this.
- - Start the Weaviate database server to run asWynchronously as well.
+ - Start the text2vec-transformers Weaviate module to run as an asynchronous process.
+ - Start the Weaviate database server to run asynchronously as well.
 - Wait so that the subprocesses continue to run and are ready to accept requests.

 Also, the vector database is only loaded with two Weaviate schemas/collections based on two documents
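
For illustration, here is a minimal sketch of what that startup flow might look like from the Python side. The script name, port, and readiness URL are assumptions for the sketch, not the actual code in this repo:

```python
import subprocess
import time
import urllib.request

# Launch the Bash startup script asynchronously; it starts the
# text2vec-transformers module and the Weaviate server as background
# processes. "start_weaviate.sh" is a placeholder name.
proc = subprocess.Popen(["bash", "start_weaviate.sh"])

def wait_until_ready(url, timeout=300.0):
    """Poll a readiness URL until it returns HTTP 200 or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # server not up yet; keep polling
        time.sleep(2)
    raise TimeoutError(f"{url} not ready after {timeout} seconds")

# Weaviate exposes a readiness probe at this path; 8080 is the default
# port and may differ in the actual deployment.
wait_until_ready("http://localhost:8080/v1/.well-known/ready")
```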
@@ -50,16 +49,14 @@ To use the application, follow these steps:
 - Type in an optional system prompt and a user prompt in the corresponding input text boxes.
 - Click the "Run LLM Prompt" button to call the llama-2 LLM with the prompt.
 - Display the completion and the full prompt created by the application using the llama-2 JSON format for prompts.
- - If the "Enable RAG" check box is clicked, the user prompt will be modified to consider RAG information
- from the Vector DB.
+ - If the "Enable RAG" check box is checked, the user prompt will be modified to include RAG information
+ from the Vector DB for generation (see the prompt sketch below).
 - Click the "Get All Rag Data" button to view all the information about the two documents in the database, including chunks.

 ## Future Improvements
 The following areas have been identified for future improvements:
 - Run the POC with a GPU.
- - Do more testing of the RAG support. Currently, it seems to work basically. But is it producing additional, useful information
- for inferencing.
- - Also to this end, add web pages with details on a topic that the LLM wasn't trained with. Compare prompts with
- and without RAG.
+ - Include RAG documents/web pages with distinct information likely not to be in the LLM. This should make it clearer when RAG information is used.
+ - Do more testing of the RAG support.
 - Experiment with different database settings on queries, such as the distance parameter on the collection query.near_vector() call (see the query sketch at the end).

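To make the prompt handling concrete, here is a rough sketch of how a llama-2 style prompt can be assembled, with retrieved chunks spliced into the user prompt when RAG is enabled. This is the standard llama-2 `[INST]`/`<<SYS>>` chat layout, not necessarily the exact code this application uses:

```python
def build_llama2_prompt(system_prompt, user_prompt, rag_chunks=None):
    """Build a llama-2 chat prompt, optionally augmented with RAG context.

    Illustrative only: the application's actual prompt construction
    may differ in its wording and chunk formatting.
    """
    if rag_chunks:  # "Enable RAG": splice retrieved chunks into the user prompt
        context = "\n".join(rag_chunks)
        user_prompt = ("Use the following context to answer.\n"
                       f"Context:\n{context}\n\nQuestion: {user_prompt}")
    return (f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_prompt} [/INST]")

# Example: one retrieved chunk spliced into the user prompt.
print(build_llama2_prompt(
    "You are a helpful assistant.",
    "What is Weaviate?",
    rag_chunks=["Weaviate is an open-source vector database."],
))
```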
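For the last item, here is a hedged sketch of the kind of query the distance experiment would touch, written against the Weaviate v4 Python client; the collection name, embedding size, and query vector are placeholders, and the repo may use a different client version:

```python
import weaviate

# Connect to the locally running Weaviate instance (v4 client assumed).
client = weaviate.connect_to_local()
try:
    # "Document" is a placeholder collection name.
    docs = client.collections.get("Document")

    # A smaller distance admits only close matches; a larger one lets in
    # looser ones. Tuning this threshold is the experiment suggested above.
    # The query vector is a dummy; in the app it would come from the
    # text2vec-transformers module.
    result = docs.query.near_vector(
        near_vector=[0.1] * 384,  # placeholder embedding dimension
        distance=0.7,             # the distance threshold to tune
        limit=5,
    )
    for obj in result.objects:
        print(obj.properties)
finally:
    client.close()
```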