---
title: Karl Movie Vector Backend
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# Karl Movie Vector Backend
FastAPI backend for semantic movie recommendations using FAISS and OpenAI embeddings. Powers intelligent movie discovery with geometric subspace algorithms.
## Features
- Semantic movie search using OpenAI embeddings
- FAISS-powered vector similarity search
- Geometric subspace algorithms for multi-movie preferences
- ~150ms response time on CPU
- RESTful API with Bearer token authentication
## API Usage
```bash
# liked_ids / disliked_ids are TMDB movie IDs (550 = Fight Club, 680 = Pulp Fiction)
curl -X POST "https://yonnel-karl-movie-vector-backend.hf.space/explore" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "liked_ids": [550, 680],
    "disliked_ids": [],
    "top_k": 100
  }'
```
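For scripted calls, the JSON body can be assembled from comma-separated ID lists. The `explore_body` helper below is a hypothetical sketch, not part of this project, and it assumes the endpoint only needs the three fields shown in the example request.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): build the /explore request
# body from comma-separated ID lists. Only the three fields shown in the
# example request are emitted.
explore_body() {
  local liked="$1" disliked="$2" top_k="$3"
  if [ -z "$top_k" ]; then
    echo "usage: explore_body LIKED_IDS DISLIKED_IDS TOP_K" >&2
    return 2
  fi
  # %s placeholders are filled verbatim, so IDs must already be
  # comma-separated integers (e.g. 550,680)
  printf '{"liked_ids": [%s], "disliked_ids": [%s], "top_k": %s}\n' \
    "$liked" "$disliked" "$top_k"
}

# Example call (needs network access and a valid token):
# curl -X POST "https://yonnel-karl-movie-vector-backend.hf.space/explore" \
#   -H "Authorization: Bearer YOUR_TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$(explore_body 550,680 "" 100)"
```

Passing the lists as plain strings keeps the helper dependency-free; a tool such as `jq` would be safer if the IDs came from untrusted input.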
## Hugging Face Deployment
This FastAPI application provides movie recommendations using vector similarity search.
## 🚀 Automatic Setup
This application will automatically build its movie index on first startup. The process includes:
1. **Data Collection**: Fetches movie data from the TMDB API
2. **Embedding Generation**: Creates vector embeddings using the OpenAI API
3. **Index Building**: Builds a FAISS index for fast similarity search
4. **API Startup**: Launches the FastAPI service

⏱️ **The first startup may take 3-5 minutes** while the index is built.
## 🔧 Required Environment Variables
Configure these in your Hugging Face Space settings:
### Essential APIs
- `OPENAI_API_KEY`: Your OpenAI API key for generating embeddings
- `TMDB_API_KEY`: Your TMDB API key for fetching movie data
### Optional Configuration
- `API_TOKEN`: Token for API authentication, sent by clients as `Authorization: Bearer <token>` (optional)
- `LOG_LEVEL`: Logging level (default: INFO)
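A startup script could fail fast when the essential keys are missing, instead of failing partway through the index build. The `check_env` function below is a hypothetical sketch (not part of this project); it checks only the two variables listed as essential and applies the documented `INFO` default for `LOG_LEVEL`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-start check (not part of this project): verify that the
# essential API keys are set and apply the documented LOG_LEVEL default.
check_env() {
  local missing=() var
  for var in OPENAI_API_KEY TMDB_API_KEY; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var
    if [ -z "${!var:-}" ]; then
      missing+=("$var")
    fi
  done

  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing required variables: ${missing[*]}" >&2
    return 1
  fi

  # LOG_LEVEL falls back to INFO when unset or empty
  export LOG_LEVEL="${LOG_LEVEL:-INFO}"
}

# Example: check_env || exit 1
```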
## 📑 API Endpoints
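- `GET /health` - Health check
- `POST /explore` - Movie recommendations from `liked_ids` and `disliked_ids` (see API Usage above)
- `POST /search` - Search for similar movies
- `GET /movie/{movie_id}` - Get movie details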
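Because step 4 of the automatic setup starts the API only after the index is built, a client or CI job can poll `GET /health` until the Space answers. The `wait_for_health` helper below is a hypothetical sketch, not part of this project. It assumes `/health` returns a non-error HTTP status once the service is up, and its 300-second default timeout mirrors the upper end of the 3-5 minute first-startup estimate.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): poll GET /health until it
# responds without an HTTP error, or give up after timeout_s seconds.
wait_for_health() {
  local base_url="$1" timeout_s="${2:-300}" interval_s="${3:-5}"
  local deadline
  deadline=$(( $(date +%s) + timeout_s ))
  while :; do
    # -f: treat HTTP status >= 400 as failure; -s: no progress output
    if curl -fs --max-time 5 "$base_url/health" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "no healthy response from $base_url/health after ${timeout_s}s" >&2
      return 1
    fi
    sleep "$interval_s"
  done
}

# Example:
# wait_for_health "https://yonnel-karl-movie-vector-backend.hf.space" && echo "ready"
```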
## 🏗️ Technical Details
- **Framework**: FastAPI
- **Vector Search**: FAISS
- **Embeddings**: OpenAI text-embedding-3-small
- **Movie Data**: TMDB (The Movie Database)
- **Container**: Docker
## 🔄 Rebuilding Index
To rebuild the movie index (e.g., to get newer movies):
1. Delete the Space's persistent storage
2. Restart the Space
3. The index will rebuild automatically on startup
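When running the container locally rather than on a Space, a rough equivalent of step 1 is deleting the generated data files. The `reset_index` helper below is hypothetical. It assumes a rebuild is triggered whenever these files are absent and that they live in `app/data` (the paths listed under Data Files Generated), which may differ from where a Space's persistent storage is mounted.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): delete the generated data
# files so the next startup regenerates them. Assumes the app rebuilds when
# these files are missing; data_dir defaults to app/data.
reset_index() {
  local data_dir="${1:-app/data}"
  # -f: ignore files that do not exist; --: end of rm options
  rm -f -- "$data_dir/faiss.index" "$data_dir/movies.npy" \
    "$data_dir/id_map.json" "$data_dir/movie_metadata.json"
}

# Example: reset_index && docker restart <container>
```

Only the four known files are removed, so anything else stored in the same directory is left untouched.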
## 📦 Data Files Generated
The application creates these files when it builds the index on startup:
- `app/data/faiss.index` - FAISS vector search index
- `app/data/movies.npy` - Movie embeddings matrix
- `app/data/id_map.json` - Mapping from TMDB IDs to rows of the embeddings matrix
- `app/data/movie_metadata.json` - Movie metadata

These files are generated automatically and do not need to be committed to the repository.
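To check from a shell whether a previous build left a complete index behind, the hypothetical `index_ready` helper below (not part of this project) tests that each of the four files above exists and is non-empty. It does not validate file contents, and the `app/data` default is taken from the paths listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): succeed only if all four
# generated data files exist and are non-empty under data_dir.
index_ready() {
  local data_dir="${1:-app/data}" f
  for f in faiss.index movies.npy id_map.json movie_metadata.json; do
    # -s: file exists and has a size greater than zero
    if [ ! -s "$data_dir/$f" ]; then
      echo "missing or empty: $data_dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# Example:
# if index_ready; then echo "index present"; else echo "index will be built on startup"; fi
```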
## Production Mode
Set `ENV` to `prod` in your Space settings for production. The remaining variables (`OPENAI_API_KEY`, `TMDB_API_KEY`, `API_TOKEN`) are described under Required Environment Variables.