---
title: Karl Movie Vector Backend
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# Karl Movie Vector Backend

FastAPI backend for semantic movie recommendations built on FAISS and OpenAI embeddings. It uses geometric subspace algorithms to turn preferences over several movies into movie recommendations.

## Features

- Semantic movie search using OpenAI embeddings
- FAISS-powered vector similarity search
- Geometric subspace algorithms for multi-movie preferences
- ~150 ms response time on CPU
- RESTful API with Bearer token authentication

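## API Usage

Request recommendations for a set of liked and disliked movies, identified by their TMDB IDs; `top_k` sets how many results to return:

```bash
curl -X POST "https://yonnel-karl-movie-vector-backend.hf.space/explore" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "liked_ids": [550, 680],
    "disliked_ids": [],
    "top_k": 100
  }'
```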
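To send the same request from scripts, a small wrapper helps. This is a minimal sketch, assuming bash (or a POSIX shell that supports `local`), `curl`, and a token exported as `API_TOKEN`. The helper names and the `KARL_URL` override are hypothetical. The JSON fields are copied from the example above, and the `top_k` default of 100 simply mirrors that example rather than any documented server default.

```bash
# Hypothetical helpers, not part of this repository.

# Accept "" or comma-separated digits such as "550,680".
is_id_list() {
  case "$1" in
    "") return 0 ;;
    *[!0-9,]* | ,* | *, | *,,*) return 1 ;;  # non-digits or stray commas
    *) return 0 ;;
  esac
}

# Build the /explore JSON body: liked IDs, disliked IDs, optional top_k.
build_explore_payload() {
  local liked="$1" disliked="${2:-}" top_k="${3:-100}"
  is_id_list "$liked" || { echo "bad liked_ids: $liked" >&2; return 1; }
  is_id_list "$disliked" || { echo "bad disliked_ids: $disliked" >&2; return 1; }
  case "$top_k" in
    "" | *[!0-9]*) echo "bad top_k: $top_k" >&2; return 1 ;;
  esac
  printf '{"liked_ids": [%s], "disliked_ids": [%s], "top_k": %s}\n' \
    "$liked" "$disliked" "$top_k"
}

# POST the payload to /explore with the bearer token from API_TOKEN.
karl_explore() {
  local body
  [ -n "${API_TOKEN:-}" ] || { echo "set API_TOKEN first" >&2; return 1; }
  body=$(build_explore_payload "$@") || return 1
  curl -sS -X POST "${KARL_URL:-https://yonnel-karl-movie-vector-backend.hf.space}/explore" \
    -H "Authorization: Bearer $API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

For example, `karl_explore 550,680 "" 20` sends the request above with `top_k` set to 20. Invalid IDs are rejected locally before any network call is made.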

## Hugging Face Deployment

This FastAPI application serves movie recommendations through vector similarity search and is packaged to run as a Docker Space on Hugging Face.

## 🚀 Automatic Setup

This application automatically builds its movie index on first startup. The process includes:

1. Data collection: fetches movie data from the TMDB API.
2. Embedding generation: creates vector embeddings with the OpenAI API.
3. Index building: builds a FAISS index for fast similarity search.
4. API startup: launches the FastAPI service.

⏱️ The first startup may take 3-5 minutes while the index is built.
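Because the API only starts after the index is built, a deployment script or client may want to wait for `/health` before sending requests. This is a minimal polling sketch, assuming `curl` is installed and that any 2xx answer from `GET /health` means the service is ready. `wait_for_health` is a hypothetical helper. Its defaults (60 attempts, 5 seconds apart) are chosen to cover the 3-5 minute first startup.

```bash
# Hypothetical helper: poll a health URL until it returns 2xx.
# Args: URL, max attempts (default 60), delay in seconds (default 5).
wait_for_health() {
  local url="$1" attempts="${2:-60}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -sf --max-time 10 "$url" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}
```

Usage: `wait_for_health "https://yonnel-karl-movie-vector-backend.hf.space/health" && echo "API is up"`.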

## 🔧 Required Environment Variables

Configure these in your Hugging Face Space settings (store the API keys as secrets):

### Essential APIs
- `OPENAI_API_KEY`: OpenAI API key used to generate embeddings
- `TMDB_API_KEY`: TMDB API key used to fetch movie data

### Optional Configuration
- `API_TOKEN`: token checked against the `Authorization: Bearer` header of API requests
- `LOG_LEVEL`: logging level (default: `INFO`)
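A startup or CI script can fail fast when the essential keys are missing. This sketch assumes the Space settings reach the container as ordinary environment variables. `check_required_env` is a hypothetical helper. It only checks that the values are non-empty and does not validate the keys themselves.

```bash
# Hypothetical helper: report missing essential variables, exit status 1 if any.
check_required_env() {
  local status=0
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "missing required variable: OPENAI_API_KEY" >&2
    status=1
  fi
  if [ -z "${TMDB_API_KEY:-}" ]; then
    echo "missing required variable: TMDB_API_KEY" >&2
    status=1
  fi
  # API_TOKEN is optional, so only mention it.
  if [ -z "${API_TOKEN:-}" ]; then
    echo "note: API_TOKEN is not set" >&2
  fi
  return "$status"
}
```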

## 📡 API Endpoints

- `GET /health`: health check
- `POST /search`: search for similar movies
- `GET /movie/{movie_id}`: get movie details
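Shell helpers for the two `GET` endpoints might look like the sketch below. It assumes `movie_id` is a numeric TMDB ID, as in the `liked_ids` example. This README does not say which endpoints require the token, so the sketch sends the bearer token only when `API_TOKEN` is set. The helper names and the `KARL_URL` override are hypothetical.

```bash
# Hypothetical helpers; override KARL_URL to target another deployment.
KARL_URL="${KARL_URL:-https://yonnel-karl-movie-vector-backend.hf.space}"

# GET /health: prints the response body; non-zero exit on HTTP errors.
karl_health() {
  curl -sSf "$KARL_URL/health"
}

# GET /movie/{movie_id}: rejects non-numeric IDs before any request.
karl_movie() {
  case "$1" in
    "" | *[!0-9]*)
      echo "movie_id must be a numeric TMDB ID, got '$1'" >&2
      return 2
      ;;
  esac
  # Send the bearer token only when one is configured.
  if [ -n "${API_TOKEN:-}" ]; then
    curl -sSf -H "Authorization: Bearer $API_TOKEN" "$KARL_URL/movie/$1"
  else
    curl -sSf "$KARL_URL/movie/$1"
  fi
}
```

For example, `karl_movie 550` requests details for TMDB ID 550. `karl_movie abc` exits with status 2 without contacting the server.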

## 🏗️ Technical Details

- Framework: FastAPI
- Vector search: FAISS
- Embeddings: OpenAI `text-embedding-3-small`
- Movie data: TMDB (The Movie Database)
- Container: Docker

## 🔄 Rebuilding Index

To rebuild the movie index (for example, to pick up newer movies):

1. Delete the Space's persistent storage.
2. Restart the Space.
3. The index rebuilds automatically on startup.
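When you run the container locally instead of on a Space, you can get the same effect as deleting persistent storage by removing the generated files listed in the data files section of this README. This is a sketch under two assumptions: the startup routine regenerates the index whenever these files are missing, and the files live in `app/data`. `clear_index` is a hypothetical helper.

```bash
# Hypothetical helper: delete the generated index artifacts in a data directory.
clear_index() {
  local dir="${1:-app/data}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # rm -f ignores files that are already absent.
  rm -f "$dir/faiss.index" "$dir/movies.npy" \
        "$dir/id_map.json" "$dir/movie_metadata.json"
}
```

Run `clear_index app/data`, then restart the container so the index is rebuilt.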

## 📦 Data Files Generated

The application creates these files on startup:
- `app/data/faiss.index`: FAISS vector search index
- `app/data/movies.npy`: movie embeddings matrix
- `app/data/id_map.json`: mapping between TMDB IDs and rows of the embeddings matrix
- `app/data/movie_metadata.json`: movie metadata

These files are generated automatically and do not need to be committed to the repository.
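To check whether a previous startup produced a complete index, you can verify that all four artifacts are present. This is a sketch. `missing_data_files` is a hypothetical helper, and treating empty files as missing (for example, after an interrupted build) is an assumption about what a failed build leaves behind.

```bash
# Hypothetical helper: print each missing or empty artifact, one per line.
# Returns 0 only when all four files exist and are non-empty.
missing_data_files() {
  local dir="${1:-app/data}" f status=0
  for f in faiss.index movies.npy id_map.json movie_metadata.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "$dir/$f"
      status=1
    fi
  done
  return "$status"
}
```

`missing_data_files app/data` prints nothing and returns 0 when the index is complete.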

## Environment Variables

Set these in your Space settings:
- `OPENAI_API_KEY`: your OpenAI API key
- `TMDB_API_KEY`: your TMDB API key
- `API_TOKEN`: authentication token for API access
- `ENV`: set to `prod` for production