Commit 1e9e3b5 by ChienChung (verified) · 1 parent: 8dc2689

Update app.py

Files changed (1)
  1. app.py +2 -0
app.py CHANGED
@@ -1033,6 +1033,8 @@ If no relevant information is found in the document, the system will say "No rel
 
 *Note: The LLaMA module generates responses based solely on the current query without follow-up memory or chat history management.*
 
+> This is a CPU-only demo running a **quantised 1B LLaMA model**, built to show that full RAG + multi-agent systems can run even without a GPU. In production, the model can be replaced with larger ones (3B, 7B, etc.) and served using vLLM, 4-bit quantisation, or TensorRT for better speed. The design focuses on portability, deployment, and modularity.
+
 Feel free to ask any question related to Biden’s 2023 State of the Union Address.
 """
 demo_description2 = """