Sergei Petrov committed
Commit 0ba6d5c · Parent(s): 2fe6fac
lance version

Files changed:
- README.md +1 -1
- gradio_app/requirements.txt +1 -1
README.md
CHANGED
@@ -5,7 +5,7 @@ Deliberately stripped down to leave some room for experimenting
 - Clone https://github.com/huggingface/transformers to a local machine
 - Use the **prep_scrips/markdown_to_text.py** script to extract raw text from the markdown files under transformers/docs/source/en/
 - Break the resulting texts down into semantically meaningful pieces. Experiment with different chunking mechanisms to make sure the semantic meaning is captured.
-- Use **prep_scrips/lancedb_setup.py** to embed and store chunks in a [lancedb](https://lancedb.github.io/lancedb/) instance. It also creates an index for fast ANN retrieval (not really needed for this exercise but necessary at scale). You'll need to put your own values into VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME.
+- Use **prep_scrips/lancedb_setup.py** to embed and store chunks in a [lancedb](https://lancedb.github.io/lancedb/) instance. It also creates an index for fast ANN retrieval (not really needed for this exercise but necessary at scale). You'll need to put your own values into VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME. If you get lancedb errors at inference time, try dropping the index: there may not be enough data for it to work.
 - Move the database directory (.lancedb by default) to **gradio_app/**
 - Use the template given in **gradio_app** to wrap everything into the [Gradio](https://www.gradio.app/docs/interface) app and run it on HF [spaces](https://huggingface.co/docs/hub/spaces-config-reference). Make sure to adjust VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME according to your DB setup.
 - In your space, set up the secrets OPENAI_API_KEY and HUGGING_FACE_HUB_TOKEN to use OpenAI and open-source models respectively
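The chunking step above is deliberately left open. As one illustrative baseline (not the method used by the repo's scripts; the function name and sizes are made up for this sketch), a fixed-size word window with overlap looks like:

```python
def chunk_text(text, max_words=120, overlap=20):
    """Split text into overlapping word-window chunks.

    A crude baseline for the README's chunking step; splitting on
    headings or sentence boundaries usually preserves semantics better.
    """
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a window boundary fully present in at least one chunk, at the cost of some duplicated storage.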
gradio_app/requirements.txt
CHANGED
@@ -5,5 +5,5 @@ ipywidgets==8.1.1
 tqdm==4.66.1
 aiohttp==3.8.6
 huggingface-hub==0.17.3
-lancedb
+lancedb==0.3.1
 openai==0.28
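The README change above recommends dropping the ANN index when the table is small. For context on why that is safe, here is a sketch (independent of lancedb, purely illustrative) of what the exact-search fallback computes: a brute-force cosine-similarity scan, which is plenty fast for a few thousand chunk embeddings.

```python
import math


def top_k_cosine(query, vectors, k=3):
    """Exact nearest-neighbour search by cosine similarity.

    Over a small table of chunk embeddings a full scan like this is
    cheap, so an approximate (IVF/ANN) index buys little and can even
    fail to train when there is too little data.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scored = [(cos(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]
```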