
How to run some of the code in this repository

1. Make sure Docker (with the Docker Compose plugin) is installed on your machine

2. Clone the repository

3. cd into the repository directory

4. Run the following command to build the Docker image for the oc-prototype service

docker compose build oc-prototype

5. Run the following commands to start the container in the background and open a shell inside it

docker compose up -d oc-prototype
docker compose exec oc-prototype /bin/bash

Prototype TODOs

Data

Functions

  • Upsert vector

  • Batch upsert

  • Query against metadata

  • Generate working Dockerfile for project reproducibility

  • Load data into a database

  • Test precision/recall of embeddings

  • Generate working version of climate demo
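The upsert, batch upsert, metadata query and precision/recall items above can be sketched against a toy in-memory store. This is a minimal sketch, assuming cosine similarity for ranking and exact-match metadata filters; `InMemoryVectorStore`, `Record`, `precision_recall_at_k` and the sample documents are hypothetical and are not the project's actual database or API.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """One stored embedding plus arbitrary metadata (e.g. source, date)."""
    vector: list[float]
    metadata: dict[str, Any] = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for the project's vector database."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def upsert(self, record_id: str, vector: list[float],
               metadata: dict[str, Any] | None = None) -> None:
        # Insert a new record, or overwrite the existing record with this id.
        self._records[record_id] = Record(list(vector), dict(metadata or {}))

    def batch_upsert(self, items: list[tuple[str, list[float], dict[str, Any]]]) -> int:
        # Upsert every (id, vector, metadata) tuple; return how many were written.
        for record_id, vector, metadata in items:
            self.upsert(record_id, vector, metadata)
        return len(items)

    def query(self, vector: list[float], top_k: int = 3,
              where: dict[str, Any] | None = None) -> list[tuple[str, float]]:
        # Keep only records whose metadata matches every key/value pair in
        # `where`, then rank them by cosine similarity to the query vector.
        where = where or {}
        scored = [
            (record_id, cosine_similarity(vector, rec.vector))
            for record_id, rec in self._records.items()
            if all(rec.metadata.get(k) == v for k, v in where.items())
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def precision_recall_at_k(retrieved: list[str], relevant: set[str],
                          k: int) -> tuple[float, float]:
    """Precision and recall of the first k retrieved ids against a relevant set."""
    top = retrieved[:k]
    hits = sum(1 for record_id in top if record_id in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.batch_upsert([
        ("doc1", [1.0, 0.0], {"topic": "climate"}),
        ("doc2", [0.9, 0.1], {"topic": "health"}),
        ("doc3", [0.0, 1.0], {"topic": "climate"}),
    ])
    results = store.query([1.0, 0.0], top_k=2, where={"topic": "climate"})
    print(results)  # → [('doc1', 1.0), ('doc3', 0.0)]
    ids = [record_id for record_id, _ in results]
    print(precision_recall_at_k(ids, relevant={"doc1"}, k=2))  # → (0.5, 1.0)
```

A real vector database would replace the linear scan in `query` with an approximate nearest-neighbour index; the scan simply keeps this sketch dependency-free.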

Embedding pricing:

1 token ≈ 0.75 words, so 1k tokens ≈ 750 words. Embeddings are billed at $0.0001 per 1k tokens, i.e. $0.10 per 1M tokens. A token averages about 4 characters, so 1k tokens cover roughly 4,000 characters, or about 4 KB of plain single-byte text, for $0.0001. On that basis you can approximate the cost of embedding your data as:

Cost ($) ≈ size of data (KB) × 0.000025

since $0.0001 per 4 KB works out to $0.000025 per KB.
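The rule of thumb above can be turned into a quick estimator. This is a sketch, assuming the ~4 characters per token heuristic, 1 KB = 1,000 single-byte characters, and the $0.10 per 1M tokens rate quoted in these notes; real token counts depend on the embedding model's tokenizer. The function and constant names are hypothetical.

```python
# Heuristics from the pricing notes: ~4 characters per token and
# $0.10 per 1M tokens (= $0.0001 per 1k tokens).
CHARS_PER_TOKEN = 4
USD_PER_MILLION_TOKENS = 0.10
CHARS_PER_KB = 1_000  # assumes single-byte (e.g. ASCII) text


def estimate_tokens(text: str) -> float:
    """Approximate token count using the characters-per-token heuristic."""
    return len(text) / CHARS_PER_TOKEN


def estimate_cost_for_text(text: str) -> float:
    """Approximate embedding cost in USD for a single string."""
    return estimate_tokens(text) / 1_000_000 * USD_PER_MILLION_TOKENS


def estimate_cost_from_kilobytes(size_kb: float) -> float:
    """Approximate cost in USD for size_kb kilobytes of text (about size_kb * 0.000025)."""
    tokens = size_kb * CHARS_PER_KB / CHARS_PER_TOKEN
    return tokens / 1_000_000 * USD_PER_MILLION_TOKENS


if __name__ == "__main__":
    # 10 MB (10,000 KB) of plain text comes to roughly a quarter of a dollar.
    print(f"${estimate_cost_from_kilobytes(10_000):.2f}")  # → $0.25
```

Multiplying out the constants (1,000 / 4 / 1,000,000 × 0.10) gives the same $0.000025 per KB factor as the formula in the notes.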

Credentials for running Google Cloud queries: see ostreacultura-credentials.json