Luca committed on
Commit c68f49f · unverified · 1 Parent(s): 4108477

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -19,12 +19,14 @@ app_port: 8501

  https://lfoppiano-document-qa.hf.space/

+ **NOTE**: The LLM API is kindly provided by [Modal.com](https://www.modal.com) which offers 30$/month for computing. When these are done, the app will stop answering. 😅
+
  ## Introduction

- Question/Answering on scientific documents using LLMs: ChatGPT-3.5-turbo, GPT4, GPT4-Turbo, Mistral-7b-instruct and Zephyr-7b-beta.
+ Question/Answering on scientific documents using LLMs. The tool can be customized to use different types of LLM APIs.
  The streamlit application demonstrates the implementation of a RAG (Retrieval Augmented Generation) on scientific documents.
- **Different to most of the projects**, we focus on scientific articles and we extract text from a structured document.
- We target only the full-text using [Grobid](https://github.com/kermitt2/grobid) which provides cleaner results than the raw PDF2Text converter (which is comparable with most of other solutions).
+ **Different from most of the projects**, we focus on scientific articles and extract text from a structured document.
+ We target only the full text using [Grobid](https://github.com/kermitt2/grobid) which provides cleaner results than the raw PDF2Text converter (which is comparable with most of the other solutions).

  Additionally, this frontend provides the visualisation of named entities on LLM responses to extract <span stype="color:yellow">physical quantities, measurements</span> (with [grobid-quantities](https://github.com/kermitt2/grobid-quantities)) and <span stype="color:blue">materials</span> mentions (with [grobid-superconductors](https://github.com/lfoppiano/grobid-superconductors)).

@@ -35,8 +37,6 @@ Additionally, this frontend provides the visualisation of named entities on LLM

  ## Getting started

- - Select the model+embedding combination you want to use
- - If using gpt3.5-turbo, gpt4 or gpt4-turbo, enter your API Key ([Open AI](https://platform.openai.com/account/api-keys)).
  - Upload a scientific article as a PDF document. You will see a spinner or loading indicator while the processing is in progress.
  - Once the spinner disappears, you can proceed to ask your questions

@@ -45,7 +45,7 @@ Additionally, this frontend provides the visualisation of named entities on LLM
  ## Documentation

  ### Embedding selection
- In the latest version there is the possibility to select both embedding functions and LLMs. There are some limitation, OpenAI embeddings cannot be used with open source models, and viceversa.
+ In the latest version, there is the possibility to select both embedding functions and LLMs. There are some limitations, OpenAI embeddings cannot be used with open source models, and vice-versa.

  ### Context size
  Allow to change the number of blocks from the original document that are considered for responding.
@@ -61,10 +61,10 @@ Smaller blocks will result in a smaller context, yielding more precise sections
  Larger blocks will result in a larger context less constrained around the question.

  ### Query mode
- Indicates whether sending a question to the LLM (Language Model) or to the vector storage.
+ Indicates whether sending a question to the LLM (Language Model) or the vector storage.
  - **LLM** (default) enables question/answering related to the document content.
  - **Embeddings**: the response will consist of the raw text from the document related to the question (based on the embeddings). This mode helps to test why sometimes the answers are not satisfying or incomplete.
- - **Question coefficient** (experimental): provide a coefficient that indicate how the question has been far or closed to the retrieved context
+ - **Question coefficient** (experimental): provide a coefficient that indicates how the question has been far or closed to the retrieved context

  ### NER (Named Entities Recognition)
  This feature is specifically crafted for people working with scientific documents in materials science.
@@ -73,8 +73,8 @@ This feature leverages both [grobid-quantities](https://github.com/kermitt2/grob

  ### Troubleshooting
  Error: `streamlit: Your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0`.
- Here the [solution on Linux](https://stackoverflow.com/questions/76958817/streamlit-your-system-has-an-unsupported-version-of-sqlite3-chroma-requires-sq).
- For more information, see the [details](https://docs.trychroma.com/troubleshooting#sqlite) on Chroma website.
+ Here is the [solution on Linux](https://stackoverflow.com/questions/76958817/streamlit-your-system-has-an-unsupported-version-of-sqlite3-chroma-requires-sq).
+ For more information, see the [details](https://docs.trychroma.com/troubleshooting#sqlite) on the Chroma website.

  ## Disclaimer on Data, Security, and Privacy ⚠️

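
The Troubleshooting entry touched by this commit quotes Chroma's requirement of `sqlite3 >= 3.35.0`. As a minimal sketch, the check below reports which SQLite library the Python interpreter is linked against, so the problem can be spotted before launching the app. The helper name `sqlite_is_supported` and the constant `MIN_SQLITE` are hypothetical and not part of the project; only the standard-library `sqlite3` module is used.

```python
import sqlite3

# Minimum SQLite version quoted in the README's error message.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_supported(version_info=None, minimum=MIN_SQLITE):
    """Return True if the given (or the linked) SQLite version meets the minimum.

    Hypothetical helper for illustration; not part of the project or of Chroma.
    """
    if version_info is None:
        # Version of the SQLite library used by this interpreter, as a tuple.
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= tuple(minimum)


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    print("Meets the quoted requirement:", sqlite_is_supported())
```

If the check reports an older version, the links in the Troubleshooting section describe the available workarounds.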
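
The "Context size" and "Query mode" sections describe retrieving a number of blocks related to the question and, experimentally, a coefficient expressing how close the question is to the retrieved context. The sketch below illustrates that idea under loud assumptions: the word-count `embed` function stands in for a real embedding model, cosine similarity is an assumed distance measure, and averaging the top scores is only one possible definition of the coefficient. None of these names (`embed`, `cosine`, `retrieve`, `question_coefficient`) come from the project.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, blocks, context_size=2):
    """Return the `context_size` blocks most similar to the question, as (score, block)."""
    q = embed(question)
    scored = [(cosine(q, embed(block)), block) for block in blocks]
    # Sort by score only; ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:context_size]


def question_coefficient(question, blocks, context_size=2):
    """Assumed definition: mean similarity between the question and the retrieved blocks."""
    top = retrieve(question, blocks, context_size)
    if not top:
        return 0.0
    return sum(score for score, _ in top) / len(top)


if __name__ == "__main__":
    blocks = [
        "the critical temperature of the sample is 39 K",
        "samples were annealed in argon",
        "we thank the funding agency",
    ]
    question = "what is the critical temperature"
    for score, block in retrieve(question, blocks, context_size=2):
        print(f"{score:.3f}  {block}")
    print("coefficient:", round(question_coefficient(question, blocks), 3))
```

In this sketch, a larger `context_size` returns more blocks and usually lowers the averaged coefficient, which mirrors the README's remark that a larger context is less constrained around the question.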