MVPilgrim committed
Commit fb792cf · 1 Parent(s): ecf6d51
Files changed (1):
  1. README.md +9 -12
README.md CHANGED
@@ -14,13 +14,12 @@ startup_duration_timeout: 3 hours

 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
- I plan to use this POC as starting point for future LLM-based applications to leverage the power of RAG techniques.
+ I plan to use this POC as a starting point for future LLM-based applications to use the power of RAG techniques.

 The "happy path" of the code seems to work fairly well. As noted later, there is more work to be done to improve it.

- If you encounter issues in running the POC, please try reloading the web page. Also, please note that I've currently configured the 2-vCPU configuration
- to run the POC so re-initialization takes five or more minutes. Inferencing takes two to three minutes to complete. I think this has to do with the total load
- on the Huggingface system. Butting adding GPU support is at the top of the list of future improvements.
+ If you encounter issues in running the POC, please try reloading the web page. Also, please note that I've currently configured the space to use the 2-vCPU setting
+ to run the POC. Depending on load, re-initialization takes five or more minutes. Inferencing can take three minutes or more to complete. Adding GPU support is at the top of the list of future improvements.

 ## Components
 Here are the key components of the project:
@@ -36,8 +35,8 @@ Here are the key components of the project:

 ## Application Notes
 As part of the initialization process, the Python application executes a Bash script asynchronously. The script carries out these steps (sketched below):
- - Start the text2vec-transformers Weaviate module to run as an asynchronous process. The Weaviate DB uses this.
- - Start the Weaviate database server to run asWynchronously as well.
+ - Start the text2vec-transformers Weaviate module to run as an asynchronous process.
+ - Start the Weaviate database server to run asynchronously as well.
 - Wait so that the subprocesses continue to run and are ready to accept requests.

 Also, the vector database is only loaded with two Weaviate schemas/collections based on two documents
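
For illustration, here is a minimal sketch of what that startup flow might look like from the Python side. The script name, port, and readiness URL are assumptions for the sketch, not the actual code in this repo:

```python
import subprocess
import time
import urllib.request

# Launch the Bash startup script asynchronously; it starts the
# text2vec-transformers module and the Weaviate server as background
# processes. "start_weaviate.sh" is a placeholder name.
proc = subprocess.Popen(["bash", "start_weaviate.sh"])

def wait_until_ready(url, timeout=300.0):
    """Poll a readiness URL until it returns HTTP 200 or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # server not up yet; keep polling
        time.sleep(2)
    raise TimeoutError(f"{url} not ready after {timeout} seconds")

# Weaviate exposes a readiness probe at this path; 8080 is the default
# port and may differ in the actual deployment.
wait_until_ready("http://localhost:8080/v1/.well-known/ready")
```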
@@ -50,16 +49,14 @@ To use the application, follow these steps:
 - Type in an optional system prompt and a user prompt in the corresponding input text boxes.
 - Click the "Run LLM Prompt" button to call the llama-2 LLM with the prompt.
 - Display the completion and the full prompt created by the application using the llama-2 JSON format for prompts.
- - If the "Enable RAG" check box is clicked, the user prompt will be modified to consider RAG information
- from the Vector DB.
+ - If the "Enable RAG" check box is checked, the user prompt will be modified to include RAG information
+ from the Vector DB for generation (see the prompt sketch below).
 - Click the "Get All Rag Data" button to view all the information about the two documents in the database, including chunks.

 ## Future Improvements
 The following areas have been identified for future improvements:
 - Run the POC with a GPU.
- - Do more testing of the RAG support. Currently, it seems to work basically. But is it producing additional, useful information
- for inferencing.
- - Also to this end, add web pages with details on a topic that the LLM wasn't trained with. Compare prompts with
- and without RAG.
+ - Include RAG documents/web pages with distinct information likely not to be in the LLM. This should make it clearer when RAG information is used.
+ - Do more testing of the RAG support.
 - Experiment with different database settings on queries, such as the distance parameter on the collection query.near_vector() call (see the query sketch at the end).

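To make the prompt handling concrete, here is a rough sketch of how a llama-2 style prompt can be assembled, with retrieved chunks spliced into the user prompt when RAG is enabled. This is the standard llama-2 `[INST]`/`<<SYS>>` chat layout, not necessarily the exact code this application uses:

```python
def build_llama2_prompt(system_prompt, user_prompt, rag_chunks=None):
    """Build a llama-2 chat prompt, optionally augmented with RAG context.

    Illustrative only: the application's actual prompt construction
    may differ in its wording and chunk formatting.
    """
    if rag_chunks:  # "Enable RAG": splice retrieved chunks into the user prompt
        context = "\n".join(rag_chunks)
        user_prompt = ("Use the following context to answer.\n"
                       f"Context:\n{context}\n\nQuestion: {user_prompt}")
    return (f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_prompt} [/INST]")

# Example: one retrieved chunk spliced into the user prompt.
print(build_llama2_prompt(
    "You are a helpful assistant.",
    "What is Weaviate?",
    rag_chunks=["Weaviate is an open-source vector database."],
))
```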
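For the last item, here is a hedged sketch of the kind of query the distance experiment would touch, written against the Weaviate v4 Python client; the collection name, embedding size, and query vector are placeholders, and the repo may use a different client version:

```python
import weaviate

# Connect to the locally running Weaviate instance (v4 client assumed).
client = weaviate.connect_to_local()
try:
    # "Document" is a placeholder collection name.
    docs = client.collections.get("Document")

    # A smaller distance admits only close matches; a larger one lets in
    # looser ones. Tuning this threshold is the experiment suggested above.
    # The query vector is a dummy; in the app it would come from the
    # text2vec-transformers module.
    result = docs.query.near_vector(
        near_vector=[0.1] * 384,  # placeholder embedding dimension
        distance=0.7,             # the distance threshold to tune
        limit=5,
    )
    for obj in result.objects:
        print(obj.properties)
finally:
    client.close()
```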