Spaces: ChienChung/SmartRAG_Multi-Agent_Assistant_
ChienChung committed on Apr 3

Commit 1e9e3b5 (verified) · 1 Parent(s): 8dc2689

Update app.py

Files changed (1)

app.py CHANGED

@@ -1033,6 +1033,8 @@ If no relevant information is found in the document, the system will say "No rel
 *Note: The LLaMA module generates responses based solely on the current query without follow-up memory or chat history management.*
+> This is a CPU-only demo running a **quantised 1B LLaMA model**, built to show that full RAG + multi-agent systems can run even without a GPU. In production, the model can be replaced with larger ones (3B, 7B, etc.) and served using vLLM, 4-bit quantisation, or TensorRT for better speed. The design focuses on portability, deployment, and modularity.
 Feel free to ask any question related to Biden’s 2023 State of the Union Address.
 """
 demo_description2 = """