---
title: SemanticSearchPOC
emoji: π
colorFrom: red
colorTo: indigo
sdk: docker
app_port: 8501
pinned: true
startup_duration_timeout: 3 hours
---
# Retrieval Augmented Generation with Large Language Models
This project is a Proof-of-Concept for implementing Retrieval Augmented Generation (RAG) when prompting Large Language Models (LLMs). It is a learning exercise aimed at enabling future LLM-based applications by applying RAG techniques.
## Components
The project incorporates the following key components:
- llama.cpp: A C/C++ implementation for efficient inference of LLaMA-family language models.
- Weaviate Vector Database: A vector database for efficient storage and retrieval of embeddings.
- text2vec-transformers: A Weaviate module that converts text to vector embeddings using transformer models.
- Streamlit: A framework for building interactive web applications with Python.
## Screenshot
Note: The screenshot will be included later.
## Application Notes
As part of the initialization process, the application executes a Bash script asynchronously. The script follows these steps:
- It starts the text2vec-transformers Weaviate module first.
- Then, it starts the Weaviate database server itself.
- Both programs run as subprocesses of the script.
- Finally, the script waits on its subprocesses so that they keep running and app.py can use the database for RAG functions (a sketch of how app.py might wait for the database follows this list).
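Since app.py can only run RAG queries once both subprocesses are up, one way to guard the Streamlit code is a readiness check against the local Weaviate instance. The sketch below is illustrative only and is not code from this repository; it assumes the Weaviate Python client (v4), and the function name and timeout value are made up.

```python
import time
import weaviate


def wait_for_weaviate(timeout_s: int = 120):
    """Poll the local Weaviate instance until it reports ready (hypothetical helper)."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            client = weaviate.connect_to_local()  # defaults to http://localhost:8080
            if client.is_ready():
                return client
            client.close()
        except Exception:
            pass  # server not up yet; keep polling
        time.sleep(2)
    raise TimeoutError("Weaviate did not become ready in time")
```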
Also, the vector database is loaded with only two collections/schemas, each based on a single Wikipedia page: one with content related to artificial intelligence and one with content about Norwegian literature.
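For illustration, creating and populating two such collections could look roughly like the sketch below, using the Weaviate Python client (v4). The collection names, property name, and chunk strings are assumptions; the actual schema used by app.py may differ.

```python
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()

# Hypothetical collection names for the two Wikipedia pages.
for name in ["ArtificialIntelligence", "NorwegianLiterature"]:
    if not client.collections.exists(name):
        client.collections.create(
            name,
            vectorizer_config=wc.Configure.Vectorizer.text2vec_transformers(),
            properties=[wc.Property(name="chunk", data_type=wc.DataType.TEXT)],
        )

# Insert pre-chunked page text; vectors are produced by the
# text2vec-transformers module at import time.
ai = client.collections.get("ArtificialIntelligence")
ai.data.insert_many([{"chunk": c} for c in ["chunk 1 text", "chunk 2 text"]])
client.close()
```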
## Usage
To use the application, follow these steps:
- Type in a prompt and an optional system prompt (e.g., "You are a helpful AI assistant.") in the provided input fields.
- Click the "Run LLM Prompt" button to initiate the processing of the prompt by the llama-2 LLM.
- Once the processing is complete, the generated completion will be displayed along with the user's prompt and system prompt.
- Click the "Get All Rag Data" button to view information on the two documents in the database including chunks.
## Future Improvements
The following areas have been identified for future improvements:
- Ensure that Retrieval Augmented Generation (RAG) is functioning correctly. When a prompt is built with RAG data, it appears that llama-2 considers the retrieved information along with what it learned during training, but more testing is needed.
- To this end, add web pages covering a topic the LLM will not have been trained on, and compare completions with and without RAG.
- Experiment with different query settings, such as the distance parameter on the collection's query.near_vector() call (a sketch of such an experiment follows this list).
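One way to run that last experiment is to sweep the distance threshold and inspect what each setting returns. The sketch below is illustrative, using the Weaviate Python client (v4); the collection name and threshold values are assumptions.

```python
import weaviate
from weaviate.classes.query import MetadataQuery

client = weaviate.connect_to_local()
collection = client.collections.get("ArtificialIntelligence")  # assumed name

# Use one stored object's vector as a stand-in query vector.
seed = collection.query.fetch_objects(limit=1, include_vector=True).objects[0]
query_vector = seed.vector["default"]

# Sweep the distance threshold to see how it affects what is retrieved.
for max_distance in (0.2, 0.4, 0.6, 0.8):
    result = collection.query.near_vector(
        near_vector=query_vector,
        distance=max_distance,  # only objects within this distance are returned
        limit=5,
        return_metadata=MetadataQuery(distance=True),
    )
    print(max_distance, [round(o.metadata.distance, 3) for o in result.objects])
client.close()
```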