---
title: Karl Movie Vector Backend
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# Karl Movie Vector Backend

FastAPI backend for semantic movie recommendations using FAISS and OpenAI embeddings. Powers intelligent movie discovery with geometric subspace algorithms.

## Features

- Semantic movie search using OpenAI embeddings
- FAISS-powered vector similarity search
- Geometric subspace algorithms for multi-movie preferences
- ~150ms response time on CPU
- RESTful API with Bearer token authentication

## API Usage

```bash
curl -X POST "https://yonnel-karl-movie-vector-backend.hf.space/explore" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "liked_ids": [550, 680],
    "disliked_ids": [],
    "top_k": 100
  }'
```
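The same request can be built from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_explore_request` and `send` are illustrative, the token is a placeholder, and the response is assumed to be JSON, since its shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "https://yonnel-karl-movie-vector-backend.hf.space"


def build_explore_request(token, liked_ids, disliked_ids=(), top_k=100):
    """Build (but do not send) a POST /explore request mirroring the curl example."""
    payload = {
        "liked_ids": list(liked_ids),
        "disliked_ids": list(disliked_ids),
        "top_k": top_k,
    }
    return urllib.request.Request(
        f"{BASE_URL}/explore",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the body, assuming the service replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send(build_explore_request("YOUR_TOKEN", [550, 680]))` performs the network call; keeping construction and sending separate makes the request easy to inspect or test offline.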

# Karl Movie Vector Backend - Hugging Face Deployment

This FastAPI application provides movie recommendations using vector similarity search.

## 🚀 Automatic Setup

This application will automatically build its movie index on first startup. The process includes:

1. **Data Collection**: Fetches movie data from TMDB API
2. **Embedding Generation**: Creates vector embeddings using OpenAI API  
3. **Index Building**: Builds FAISS index for fast similarity search
4. **API Startup**: Launches the FastAPI service

⏱️ **First startup may take 3-5 minutes** to build the index.
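The "build only on first startup" behaviour can be pictured as a check for the generated files before the pipeline runs. This is a minimal sketch, not the application's actual startup code: the function names are hypothetical, and the file names are the ones listed under "Data Files Generated" in this README.

```python
from pathlib import Path

# File names taken from the "Data Files Generated" section of this README.
INDEX_FILES = ("faiss.index", "movies.npy", "id_map.json", "movie_metadata.json")


def missing_index_files(data_dir):
    """Return the generated files that are not yet present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in INDEX_FILES if not (data_dir / name).is_file()]


def needs_rebuild(data_dir="app/data"):
    """True when at least one generated file is missing (hypothetical rule)."""
    return bool(missing_index_files(data_dir))
```

Under this assumption, a fresh Space (or one whose storage was cleared) reports every file as missing and would run the full four-step build.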

## 🔧 Required Environment Variables

Configure these in your Hugging Face Space settings:

### Essential APIs
- `OPENAI_API_KEY`: Your OpenAI API key for generating embeddings
- `TMDB_API_KEY`: Your TMDB API key for fetching movie data

### Optional Configuration  
- `API_TOKEN`: Token for API authentication (optional)
- `LOG_LEVEL`: Logging level (default: INFO)
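A small loader shows how these variables might be validated at startup. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical names, and only the variable names and the `INFO` default come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    tmdb_api_key: str
    api_token: Optional[str]  # None when no token is configured
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration, failing early if a required key is absent or empty."""
    required = ("OPENAI_API_KEY", "TMDB_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        tmdb_api_key=env["TMDB_API_KEY"],
        api_token=env.get("API_TOKEN") or None,
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Failing fast on missing keys surfaces a misconfigured Space immediately rather than partway through the index build.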

## 📡 API Endpoints

- `GET /health` - Health check
- `POST /search` - Search for similar movies
- `GET /movie/{movie_id}` - Get movie details

## πŸ—οΈ Technical Details

- **Framework**: FastAPI
- **Vector Search**: FAISS
- **Embeddings**: OpenAI text-embedding-3-small
- **Movie Data**: TMDB (The Movie Database)
- **Container**: Docker
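To illustrate what a vector similarity search computes, here is a brute-force stand-in written in plain Python. It is not FAISS and not this project's code: the README does not state which metric the index uses, so cosine similarity is an assumption, and the function names and toy vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query, vectors, k=3):
    """Rank movie_id -> vector entries by similarity to query, best first."""
    scored = [(movie_id, cosine_similarity(query, vec)) for movie_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real embeddings have far more dimensions.
toy_vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
ranked = top_k_similar([1.0, 0.0], toy_vectors, k=2)
```

A linear scan like this costs time proportional to the catalogue size per query; an index library such as FAISS exists to make the same kind of lookup fast at scale.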

## 🔄 Rebuilding Index

To rebuild the movie index (e.g., to get newer movies):
1. Delete the Space's persistent storage
2. Restart the Space
3. The index will rebuild automatically on startup

## 📦 Data Files Generated

The application creates these files on startup:
- `app/data/faiss.index` - FAISS vector search index
- `app/data/movies.npy` - Movie embeddings matrix
- `app/data/id_map.json` - TMDB ID to matrix mapping
- `app/data/movie_metadata.json` - Movie metadata

These files are automatically generated and don't need to be included in the repository.
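As an illustration of how an ID map ties a TMDB ID to a row of the embeddings matrix, consider the sketch below. The JSON shape (`{"<tmdb_id>": <row_index>}`) is an assumption, since the real layout of `id_map.json` is not documented here, and `row_for_movie` is a hypothetical helper.

```python
import json

# Assumed shape of id_map.json: TMDB ID (as a string key) -> row index in the matrix.
example_id_map = json.loads('{"550": 0, "680": 1}')


def row_for_movie(id_map, tmdb_id):
    """Return the matrix row for a TMDB ID, or None if the movie is not indexed."""
    # JSON object keys are always strings, so the ID is converted before lookup.
    return id_map.get(str(tmdb_id))
```

With a mapping of this shape, a request's `liked_ids` could be translated to matrix rows, and unknown IDs would come back as `None` rather than raising.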

## Environment Variables

Set these in your Space settings:
- `OPENAI_API_KEY`: Your OpenAI API key
- `TMDB_API_KEY`: Your TMDB API key  
- `API_TOKEN`: Authentication token for API access
- `ENV`: Set to "prod" for production