Sergei Petrov committed on
Commit 0ba6d5c · 1 Parent(s): 2fe6fac

lance version

Files changed (2)
  1. README.md +1 -1
  2. gradio_app/requirements.txt +1 -1
README.md CHANGED
@@ -5,7 +5,7 @@ Deliberately stripped down to leave some room for experimenting
  - Clone https://github.com/huggingface/transformers to a local machine
  - Use the **prep_scrips/markdown_to_text.py** script to extract raw text from markdown from transformers/docs/source/en/
  - Break the resulting texts down into semantically meaningful pieces. Experiment with different chunking mechanisms to make sure the semantic meaning is captured.
- - Use **prep_scrips/lancedb_setup.py** to embed and store chunks in a [lancedb](https://lancedb.github.io/lancedb/) instance. It also creates an index for fast ANN retrieval (not really needed for this exercise but necessary at scale). You'll need to put your own values into VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME.
+ - Use **prep_scrips/lancedb_setup.py** to embed and store chunks in a [lancedb](https://lancedb.github.io/lancedb/) instance. It also creates an index for fast ANN retrieval (not really needed for this exercise but necessary at scale). You'll need to put your own values into VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME. If you get lancedb errors at inference time, try dropping the index; there may not be enough data for it to work.
  - Move the database directory (.lancedb by default) to **gradio_app/**
  - Use the template given in **gradio_app** to wrap everything into the [Gradio](https://www.gradio.app/docs/interface) app and run it on HF [spaces](https://huggingface.co/docs/hub/spaces-config-reference). Make sure to adjust VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME according to your DB setup.
  - In your space, set up secrets OPENAI_API_KEY and HUGGING_FACE_HUB_TOKEN to use OpenAI and open-source models, respectively.
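The chunking step described in the README hunk above can be sketched as a simple paragraph-based splitter. This is a minimal illustration only; `chunk_markdown_text` and `max_chars` are hypothetical names, not part of the repo's **prep_scrips** scripts:

```python
def chunk_markdown_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars,
    so semantically related sentences tend to stay in one chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

A smarter chunker might split on markdown headings or sentence boundaries instead; the point is to experiment, as the README suggests, and check that each chunk still reads as a coherent unit before embedding it.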
gradio_app/requirements.txt CHANGED
@@ -5,5 +5,5 @@ ipywidgets==8.1.1
  tqdm==4.66.1
  aiohttp==3.8.6
  huggingface-hub==0.17.3
- lancedb>=0.3
+ lancedb==0.3.1
  openai==0.28