misinfo_detection_app / ReadMe.md

How to run some of the code in this repository

1. Make sure Docker is installed on your machine

2. Clone the repository

3. cd into the repository directory

4. Run the following command to build the Docker image

docker compose build oc-prototype

5. Run the following commands to start the container in the background and open a shell inside it

docker compose up -d oc-prototype
docker exec -it oc-prototype /bin/bash

Prototype TODOs

Data

Process all misinfo claims and generate embeddings for a library namespace
Upsert claims into Pinecone
Upsert 300k claims into the namespace
Update claim format to be similar to: https://www.kaggle.com/datasets/shivkumarganesh/politifact-factcheck-data/data
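
Upserting a few hundred thousand claims is usually done in fixed-size batches rather than one request per claim or one giant request. The sketch below shows only the batching logic. The names `chunked`, `upsert_in_batches` and `upsert_fn`, and the batch size of 100, are assumptions made for illustration and are not taken from this repository. `upsert_fn` stands in for whatever call actually writes a batch to the index.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists holding at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def upsert_in_batches(
    records: Iterable[T],
    upsert_fn: Callable[[List[T]], object],
    batch_size: int = 100,  # assumed value, tune to the index's request limits
) -> int:
    """Pass records to upsert_fn one batch at a time; return how many were sent."""
    sent = 0
    for batch in chunked(records, batch_size):
        # upsert_fn is a hypothetical hook; with a vector database client it
        # would wrap the client's own upsert call for one batch of records.
        upsert_fn(batch)
        sent += len(batch)
    return sent
```

Because the writer is passed in as a function, the batching can be checked locally with a stand-in such as `list.append` before pointing it at a real index.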

Functions

Upsert vector
Batch upsert
Query against metadata
Generate working Dockerfile for project reproducibility
Load data into a database
Test precision/recall of embeddings
Generate working version of climate demo
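
For the "Test precision/recall of embeddings" item, one common way to score a single query is precision and recall at k. The sketch below assumes each query returns a ranked list of unique claim IDs and that a set of known relevant IDs is available. The function name `precision_recall_at_k` and the claim IDs in the usage example are hypothetical.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    retrieved: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one query.

    retrieved: ranked IDs returned by the query, best match first (assumed unique).
    relevant:  IDs that are known to be correct matches for the query.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    top_k = retrieved[:k]
    hits = sum(1 for claim_id in top_k if claim_id in relevant)
    precision = hits / k  # fraction of the top k that is relevant
    recall = hits / len(relevant) if relevant else 0.0  # fraction of relevant found
    return precision, recall


# Hypothetical example: 2 of the top 4 results are relevant, out of 3 relevant claims.
p, r = precision_recall_at_k(["c1", "c7", "c3", "c9"], {"c1", "c3", "c5"}, k=4)
```

Averaging these two numbers over a labelled set of queries gives a single figure for comparing embedding models or index settings.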

Embedding pricing:

Embeddings are billed at $0.0001 per 1,000 tokens, which is the same as $0.100 / 1M tokens.

One token is approximately 0.75 words, so 1k tokens is about 750 words. One token is also about 4 characters, so 1k tokens covers roughly 4 KB of text (at one byte per character). Since 4 KB costs $0.0001, each kilobyte costs $0.0001 / 4 = $0.000025, and the embedding cost can be approximated by:

Cost in $ = Size of Data in Kilobytes * 0.000025
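
The approximation above can be written as a small helper. This is a minimal sketch using only the figures quoted in this section (4 characters per token, $0.0001 per 1,000 tokens, 1 KB taken as 1,000 characters). The function and constant names are made up for illustration, and the 200-character average claim length in the usage example is a hypothetical figure, not a measurement.

```python
USD_PER_1K_TOKENS = 0.0001  # rate quoted above ($0.100 / 1M tokens)
CHARS_PER_TOKEN = 4         # rule of thumb quoted above
CHARS_PER_KB = 1000         # assumes one byte per character


def estimate_embedding_cost_usd(size_kb: float) -> float:
    """Approximate the embedding cost in USD for size_kb kilobytes of text."""
    tokens = size_kb * CHARS_PER_KB / CHARS_PER_TOKEN
    return tokens / 1000 * USD_PER_1K_TOKENS


# Hypothetical example: 300k claims averaging 200 characters each.
total_kb = 300_000 * 200 / CHARS_PER_KB
print(f"${estimate_embedding_cost_usd(total_kb):.2f}")  # prints $1.50
```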

Credentials for running Google Cloud queries: see ostreacultura-credentials.json