MVPilgrim committed · fb792cf · Parent(s): ecf6d51
debug
README.md CHANGED
@@ -14,13 +14,12 @@ startup_duration_timeout: 3 hours
 
 # POC for Retrieval Augmented Generation with Large Language Models
 I created this Proof-of-Concept project to learn how to implement Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs).
-I plan to use this POC as starting point for future LLM-based applications to
+I plan to use this POC as a starting point for future LLM-based applications to use the power of RAG techniques.
 
 The "happy path" of the code seems to work fairly well. As noted later, there is more work to be done to improve it.
 
-If you encounter issues in running the POC, please try reloading the web page. Also, please note that I've currently configured the 2-vCPU
-to run the POC
-on the Huggingface system. Butting adding GPU support is at the top of the list of future improvements.
+If you encounter issues in running the POC, please try reloading the web page. Also, please note that I've currently configured the space to use the 2-vCPU setting
+to run the POC. Depending on load, re-initialization takes five or more minutes. Inferencing can take three minutes or more to complete. Adding GPU support is at the top of the list of future improvements.
 
 ## Components
 Here are the key components of the project:
@@ -36,8 +35,8 @@ Here are the key components of the project:
 
 ## Application Notes
 As part of the initialization process, the Python application executes a Bash script asynchronously (see the sketch below). The script carries out these steps:
-- Start the text2vec-transformers Weaviate module to run as an asynchronous process.
-- Start the Weaviate database server to run
+- Start the text2vec-transformers Weaviate module to run as an asynchronous process.
+- Start the Weaviate database server to run asynchronously as well.
 - Wait so that the subprocesses continue running and are ready to accept requests.
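A minimal sketch of how the Python side might launch such a script and poll for readiness. The script name `startup.sh` is an assumption for illustration, and 8080 is Weaviate's default HTTP port; the actual Space may differ:

```python
import subprocess
import time
import urllib.request

# Launch the startup script without blocking the Python app.
# "startup.sh" is an assumed name for illustration.
proc = subprocess.Popen(["bash", "./startup.sh"])

def wait_until_ready(url: str, timeout: float = 300.0) -> None:
    """Poll a readiness endpoint until it answers or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return
        except OSError:
            time.sleep(2)  # servers still starting; retry shortly
    raise TimeoutError(f"{url} not ready after {timeout}s")

# Weaviate exposes a standard readiness endpoint.
wait_until_ready("http://localhost:8080/v1/.well-known/ready")
```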
 
 Also, the vector database is only loaded with two Weaviate schemas/collections based on two documents
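A sketch of how two such collections might be created with the Weaviate v4 Python client. The collection names and the `chunk` property are illustrative, not the project's actual schema:

```python
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()  # assumes the locally started server above

# One collection per source document; names are illustrative only.
for name in ["DocumentOne", "DocumentTwo"]:
    if not client.collections.exists(name):
        client.collections.create(
            name=name,
            # Delegate embedding of each chunk to the text2vec-transformers module.
            vectorizer_config=wc.Configure.Vectorizer.text2vec_transformers(),
            properties=[wc.Property(name="chunk", data_type=wc.DataType.TEXT)],
        )
client.close()
```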
@@ -50,16 +49,14 @@ To use the application, follow these steps:
 - Type in an optional system prompt and a user prompt in the corresponding input text boxes.
 - Click the "Run LLM Prompt" button to call the llama-2 LLM with the prompt.
 - Display the completion and the full prompt created by the application using the llama-2 JSON format for prompts.
-- If the "Enable RAG" check box is clicked, the user prompt will be modified to
-from the Vector DB.
+- If the "Enable RAG" check box is clicked, the user prompt will be modified to include RAG information
+from the Vector DB for generation.
 - Click the "Get All Rag Data" button to view all the information about the two documents in the database including chunks.
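For reference, llama-2 chat prompts wrap an optional system prompt in `<<SYS>>` tags inside an `[INST]` block. A sketch of how the full prompt might be assembled, with retrieved chunks spliced into the user prompt when RAG is enabled; the function name and context wording are illustrative, not necessarily what this POC does:

```python
def build_llama2_prompt(user_prompt: str,
                        system_prompt: str = "",
                        rag_chunks: list[str] | None = None) -> str:
    """Assemble a llama-2 chat prompt, optionally augmented with retrieved text."""
    if rag_chunks:  # "Enable RAG" checked: prepend retrieved chunks as context
        context = "\n".join(rag_chunks)
        user_prompt = f"Answer using this context:\n{context}\n\n{user_prompt}"
    sys_part = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n" if system_prompt else ""
    return f"<s>[INST] {sys_part}{user_prompt} [/INST]"

# Example: a prompt with a system instruction and one retrieved chunk.
print(build_llama2_prompt("What is Weaviate?",
                          "Be concise.",
                          ["Weaviate is an open-source vector database."]))
```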
 
 ## Future Improvements
 The following areas have been identified for future improvements:
 - Run the POC with a GPU.
--
-
-- Also to this end, add web pages with details on a topic that the LLM wasn't trained with. Compare prompts with
-and without RAG.
+- Include RAG documents/web pages with distinct information likely not to be in the LLM. This should make it clearer when RAG information is used.
+- Do more testing of the RAG support.
 - Experiment with different database settings on queries such as the distance parameter on the collection query.near_vector() call.
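To illustrate that last item, a sketch of a near_vector() query with the Weaviate v4 Python client. The collection name and the placeholder embedding are assumptions; `distance` is the knob to experiment with, where lower values keep only closer chunks:

```python
import weaviate

client = weaviate.connect_to_local()
chunks = client.collections.get("DocumentOne")  # illustrative collection name

# Placeholder query vector; the real app would embed the user prompt
# (e.g. via the text2vec-transformers module) to the model's dimension.
query_embedding = [0.0] * 384

response = chunks.query.near_vector(
    near_vector=query_embedding,
    distance=0.7,  # maximum vector distance a chunk may have and still match
    limit=3,       # return at most three chunks
)
for obj in response.objects:
    print(obj.properties["chunk"])
client.close()
```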