Update app.py
app.py
CHANGED
@@ -1033,6 +1033,8 @@ If no relevant information is found in the document, the system will say "No rel
 
 *Note: The LLaMA module generates responses based solely on the current query without follow-up memory or chat history management.*
 
+> This is a CPU-only demo running a **quantised 1B LLaMA model**, built to show that full RAG + multi-agent systems can run even without a GPU. In production, the model can be replaced with larger ones (3B, 7B, etc.) and served using vLLM, 4-bit quantisation, or TensorRT for better speed. The design focuses on portability, deployment, and modularity.
+
 Feel free to ask any question related to Biden’s 2023 State of the Union Address.
 """
 demo_description2 = """
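
The note in the description string says each response depends only on the current query, with no chat history. A minimal sketch of what such a stateless prompt builder might look like is given here; the names `build_prompt` and `retrieved_chunks` and the prompt wording are assumptions for illustration and do not come from app.py:

```python
from typing import List


def build_prompt(query: str, retrieved_chunks: List[str]) -> str:
    """Build a prompt from the current query and retrieved context only.

    Hypothetical helper, not taken from app.py: it neither accepts nor
    stores chat history, so each call is independent of earlier ones.
    """
    # Drop empty chunks and join the rest into a single context section.
    context = "\n\n".join(
        chunk.strip() for chunk in retrieved_chunks if chunk.strip()
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query.strip()}\n"
        "Answer:"
    )
```

Because the function takes no history argument, a follow-up question such as "And inflation?" is answered without any knowledge of the previous turn, which matches the limitation the note describes.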