ChernovAndrei committed on
Commit a422c4e · 1 Parent(s): 24a293d

commit the app

README.md CHANGED
@@ -1,14 +1,83 @@
- ---
- title: RecoFM
- emoji:
- colorFrom: purple
- colorTo: blue
- sdk: gradio
- sdk_version: 5.33.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: Zero-shot RecSys, which natively supports user intention
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Movie Recommender System
+
+ **Tag:** `agent-demo-track`
+
+ A hybrid movie recommender system that combines collaborative filtering, language model embeddings, and graph convolutional networks to provide personalized movie recommendations.
+
+ ## Features
+
+ - **Dual Embedding Types:**
+   - Pure language model (LLM) embeddings from Mistral AI
+   - Graph-enhanced embeddings (LLM + GCL) that combine language understanding with user interaction patterns
+ - **Hybrid Input:**
+   - Select the movies you've enjoyed
+   - Describe what kind of movie you're looking for in natural language
+   - Adjust the weight (α) between your movie selections and text description
+ - **Rich Results:**
+   - Get 10 personalized recommendations, reranked and explained by a Mistral chat model
+   - Candidates are first retrieved by cosine similarity over the whole catalogue
+   - Search through a database of over 100,000 movies
+
+ ## Requirements
+
+ 1. Python 3.8+
+ 2. Virtual environment (recommended)
+ 3. Mistral AI API key (get one at https://console.mistral.ai/)
+
+ Install the required packages:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Environment Setup
+
+ 1. Create a `.env` file in the project root:
+    ```bash
+    MISTRAL_API_KEY=your_api_key_here
+    ```
+
+ 2. Ensure you have the necessary data files in the `amazon_movies_2023` directory. They are tracked with Git LFS, so run `git lfs pull` after cloning if they are only small text stubs:
+    - `title_embeddings.npz`: Movie title embeddings from Mistral AI
+    - `gcl_embeddings.npz`: Graph-enhanced embeddings
+    - `title_embeddings_mapping.csv`: Movie metadata mapping
+
+ ## Usage
+
+ 1. Activate your virtual environment:
+    ```bash
+    source venv/bin/activate  # On Unix/macOS
+    ```
+
+ 2. Run the recommender app:
+    ```bash
+    python app.py
+    ```
+
+ 3. Open your browser to the local URL shown in the terminal (typically http://127.0.0.1:7860).
+
+ ## How It Works
+
+ 1. **Movie Selection:**
+    - Search for and select movies you've enjoyed
+    - The system averages their embeddings to form a baseline for your taste
+
+ 2. **Text Preferences:**
+    - Describe what you're looking for (e.g., "A thrilling sci-fi movie with deep philosophical themes")
+    - Your description is converted to an embedding using Mistral AI
+
+ 3. **Preference Weighting:**
+    - Use the α slider to balance between your selected movies and text description
+    - α = 0: Only use movie history
+    - α = 1: Only use text description
+    - Values in between combine both signals as α · (text embedding) + (1 - α) · (history embedding)
+
+ 4. **Embedding Types:**
+    - LLM: Pure language model embeddings for semantic understanding
+    - LLM + GCL: Graph-enhanced embeddings that also consider user interaction patterns
+
+ 5. **AI Reranking:**
+    - The top 100 candidates by cosine similarity are sent to `mistral-large-latest`
+    - The model picks 10 movies, excludes the ones you already selected, and explains each in two sentences
+
+ ## Data Processing
+
+ For information about the dataset processing pipeline, see [DATA_PROCESSING.md](DATA_PROCESSING.md).
+
+ ## Contributing
+
+ Feel free to open issues or submit pull requests with improvements!
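The "How It Works" steps in the README above reduce to blending two vectors and ranking items by cosine similarity, which is what `MovieRecommender.get_recommendations` in `app.py` does. The sketch below demonstrates the idea on toy 2-D vectors. It assumes the history and text embeddings share the item matrix's vector space. `blend_profile` and `top_k_by_cosine` are illustrative names, not functions from this repository.

```python
import numpy as np


def blend_profile(e_h, e_u, alpha):
    """Combine history (e_h) and text (e_u) vectors as alpha * e_u + (1 - alpha) * e_h.

    If only one of them is available, it is used on its own (as in app.py).
    """
    if e_h is not None and e_u is not None:
        return alpha * e_u + (1.0 - alpha) * e_h
    if e_u is not None:
        return e_u
    if e_h is not None:
        return e_h
    raise ValueError("need at least one of e_h or e_u")


def top_k_by_cosine(items, profile, k):
    """Return (indices, scores) of the k rows of `items` most cosine-similar to `profile`, best first."""
    items_norm = items / np.linalg.norm(items, axis=1, keepdims=True)
    profile_norm = profile / np.linalg.norm(profile)
    sims = items_norm @ profile_norm
    order = np.argsort(sims)[::-1][:k]
    return order, sims[order]


# Toy catalogue: the history points at item 0 and the text description points at item 2.
items = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
e_h = items[0]
e_u = items[2]
for alpha in (0.0, 0.5, 1.0):
    idx, scores = top_k_by_cosine(items, blend_profile(e_h, e_u, alpha), k=1)
    print(alpha, idx.tolist())
# prints:
# 0.0 [0]
# 0.5 [1]
# 1.0 [2]
```

With α = 0 the profile equals the history vector, and with α = 1 it equals the text vector. At α = 0.5 the profile lies between the two, so the middle item ranks first.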
amazon_movies_2023/gcl_embeddings.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:957c1b970c9d8371da883523871c956593e81205a499c6962696df545806f6d6
+ size 580096202
amazon_movies_2023/title_embeddings.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d134b5950985ee7009a30f370d7b3b281351893d4d440ec5131bc759cf219ab
+ size 173284697
amazon_movies_2023/title_embeddings_mapping.csv ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20e2e163e9591dcd7eaf13e72b7c0666e41c0734f303599113c161bb7c9f0bdc
+ size 3386200
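All three data files above are committed as Git LFS pointers. A clone made without Git LFS contains only these three-line text stubs instead of the real archives, and `np.load` cannot read them. A minimal pre-flight check could look like the sketch below. It relies only on the pointer layout visible above: a `version https://git-lfs.github.com/spec/v1` line followed by `oid` and `size` lines. `parse_lfs_pointer` is a hypothetical helper, not part of this repository.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

# First line of every Git LFS pointer file (as seen in the pointers above).
LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(path: Path) -> Optional[Dict[str, str]]:
    """Return the pointer's key/value fields if `path` is a Git LFS pointer, else None.

    Hypothetical helper: it only inspects the file header and never downloads anything.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_VERSION_LINE))
        if head != LFS_VERSION_LINE.encode("ascii"):
            return None  # Real content (a zip-based .npz, for example, starts with b"PK")
        rest = f.read(1024)  # Pointer files are tiny; cap the read anyway
    fields: Dict[str, str] = {}
    for line in (head + rest).decode("utf-8", errors="replace").splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


# Usage: a stub with the same content as title_embeddings.npz above vs. a binary file.
with tempfile.TemporaryDirectory() as tmp:
    stub = Path(tmp) / "title_embeddings.npz"
    stub.write_text(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:1d134b5950985ee7009a30f370d7b3b281351893d4d440ec5131bc759cf219ab\n"
        "size 173284697\n"
    )
    real = Path(tmp) / "real.npz"
    real.write_bytes(b"PK\x03\x04 not a pointer")

    stub_info = parse_lfs_pointer(stub)
    real_info = parse_lfs_pointer(real)

print(stub_info["size"])  # 173284697
print(real_info)          # None
```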
app.py ADDED
@@ -0,0 +1,531 @@
+ import os
+ import re
+ import zlib
+ from typing import List, Optional, Tuple, Union
+
+ import gradio as gr
+ import numpy as np
+ import pandas as pd
+ from dotenv import load_dotenv
+ from langchain_mistralai import MistralAIEmbeddings
+ from rapidfuzz import fuzz, process
+ from sklearn.preprocessing import StandardScaler
+
+ from ranking_agent import rank_with_ai
+
+ load_dotenv()
+
+ class MovieRecommender:
+     def __init__(self, data_dir: str = "amazon_movies_2023"):
+         self.data_dir = data_dir
+         self.embeddings = MistralAIEmbeddings(
+             model="mistral-embed",
+             mistral_api_key=os.getenv("MISTRAL_API_KEY")
+         )
+         # Load both types of embeddings
+         self.load_embeddings()
+
+     def load_embeddings(self) -> None:
+         # Load LLM embeddings. np.load raises ValueError when the file is not a valid
+         # .npy/.npz archive (for example, an un-pulled Git LFS pointer).
+         llm_embeddings_path = os.path.join(self.data_dir, "title_embeddings.npz")
+         try:
+             with np.load(llm_embeddings_path) as llm_data:
+                 self.llm_embeddings = llm_data['embeddings']
+                 self.llm_item_ids = llm_data['item_ids'].astype(str)  # Ensure string type
+             print(f"Loaded LLM embeddings with shape: {self.llm_embeddings.shape}")
+             print(f"Number of LLM item IDs: {len(self.llm_item_ids)}")
+         except (IOError, ValueError, zlib.error) as e:
+             raise RuntimeError(
+                 f"Error loading LLM embeddings file: {e}\n"
+                 "The embeddings file appears to be corrupted or invalid."
+             ) from e
+
+         # Load GCL embeddings
+         gcl_embeddings_path = os.path.join(self.data_dir, "gcl_embeddings.npz")
+         try:
+             with np.load(gcl_embeddings_path) as gcl_data:
+                 self.gcl_embeddings = gcl_data['embeddings']
+                 self.gcl_item_ids = gcl_data['item_ids'].astype(str)  # Ensure string type
+             print(f"Loaded GCL embeddings with shape: {self.gcl_embeddings.shape}")
+             print(f"Number of GCL item IDs: {len(self.gcl_item_ids)}")
+         except (IOError, ValueError, zlib.error) as e:
+             raise RuntimeError(
+                 f"Error loading GCL embeddings file: {e}\n"
+                 "Please run gcl_embeddings.py first to generate GCL embeddings."
+             ) from e
+
+         # Load movie mapping
+         mapping_path = os.path.join(self.data_dir, "title_embeddings_mapping.csv")
+         self.movies_df = pd.read_csv(mapping_path)
+         self.movies_df['item_id'] = self.movies_df['item_id'].astype(str)  # Ensure string type
+
+         # Standardize each embedding matrix (zero mean, unit variance per dimension).
+         # Note: the text-query embedding from get_text_embedding() is not passed
+         # through these scalers.
+         self.llm_embeddings = StandardScaler().fit_transform(self.llm_embeddings)
+         self.gcl_embeddings = StandardScaler().fit_transform(self.gcl_embeddings)
+
+         # Create item_id <-> row-index mappings for both types
+         self.llm_id_to_idx = {str(item_id): idx for idx, item_id in enumerate(self.llm_item_ids)}
+         self.gcl_id_to_idx = {str(item_id): idx for idx, item_id in enumerate(self.gcl_item_ids)}
+         self.llm_idx_to_id = {idx: item_id for item_id, idx in self.llm_id_to_idx.items()}
+         self.gcl_idx_to_id = {idx: item_id for item_id, idx in self.gcl_id_to_idx.items()}
+
+         # Create title -> id mapping for search, and the reverse for looking up results
+         self.title_to_id = dict(zip(self.movies_df['title'], self.movies_df['item_id']))
+         self.id_to_title = {item_id: title for title, item_id in self.title_to_id.items()}
+
+         # Store all titles for search
+         self.all_titles = self.movies_df['title'].tolist()
+
+         print(f"Number of movies in mapping: {len(self.movies_df)}")
+         print(f"Number of titles with LLM embeddings: {len(set(self.llm_id_to_idx.keys()) & set(self.title_to_id.values()))}")
+         print(f"Number of titles with GCL embeddings: {len(set(self.gcl_id_to_idx.keys()) & set(self.title_to_id.values()))}")
+
+         # Pre-process titles for fuzzy matching
+         self.clean_titles = {self.clean_title_for_comparison(title): title for title in self.title_to_id.keys()}
+
+     def clean_title_for_comparison(self, title) -> str:
+         """Normalize a title for comparison: drop punctuation, lowercase, collapse whitespace."""
+         # Remove special characters
+         title = re.sub(r'[^\w\s]', '', str(title))
+         # Convert to lowercase and collapse runs of whitespace
+         return ' '.join(title.lower().split())
+
+     def search_movies(self, query: str) -> List[str]:
+         if not query:
+             return []  # Return nothing for an empty query to avoid overwhelming the UI
+
+         clean_query = self.clean_title_for_comparison(query)
+         # Use rapidfuzz to find matches across the entire dataset
+         matches = process.extract(
+             clean_query,
+             self.clean_titles.keys(),
+             scorer=fuzz.WRatio,  # WRatio works well for movie titles
+             limit=None,  # No limit: return all matches, best first
+             score_cutoff=60  # Only return matches scoring at least 60
+         )
+
+         # Convert matches back to original titles
+         return [self.clean_titles[match[0]] for match in matches]
+
+     def get_text_embedding(self, text: str) -> Optional[np.ndarray]:
+         """Embed `text` with Mistral via LangChain; return a unit-norm vector, or None on API errors."""
+         try:
+             embedding = self.embeddings.embed_query(text)
+             # Convert embedding to numpy array
+             embedding = np.array(embedding, dtype=np.float32)
+             # Normalize the embedding (only if it is not all zeros)
+             if np.any(embedding):
+                 embedding = embedding / np.linalg.norm(embedding)
+             return embedding
+         except Exception as e:
+             print(f"Error getting embedding from Mistral API: {str(e)}")
+             return None
+
+     def get_recommendations(
+         self,
+         selected_movies: List[str],
+         embedding_type: str = "LLM + GCL",
+         user_preferences: str = "",
+         alpha: float = 0.5,
+     ) -> Union[str, List[Tuple[str, float]]]:
+         """
+         Retrieve candidates using embedding aggregation:
+         - e_h: mean embedding of the user's history (selected movies)
+         - e_u: embedding of the user's preferences (text)
+         - Combined: alpha * e_u + (1 - alpha) * e_h
+
+         Returns up to 100 (title, similarity) tuples, best first, or an error
+         message string if no profile or candidates could be built.
+         """
+         if not selected_movies and not user_preferences:
+             return "Please select some movies or provide preferences."
+
+         # Choose embeddings based on type
+         if embedding_type == "LLM + GCL":
+             embeddings = self.gcl_embeddings
+             id_to_idx = self.gcl_id_to_idx
+             idx_to_id = self.gcl_idx_to_id
+         else:
+             embeddings = self.llm_embeddings
+             id_to_idx = self.llm_id_to_idx
+             idx_to_id = self.llm_idx_to_id
+
+         # Get embedding from user history (e_h): mean of the selected movies' vectors
+         e_h = None
+         if selected_movies:
+             movie_ids = [self.title_to_id[title] for title in selected_movies if title in self.title_to_id]
+             selected_embeddings = [embeddings[id_to_idx[movie_id]] for movie_id in movie_ids if movie_id in id_to_idx]
+             if selected_embeddings:
+                 e_h = np.mean(selected_embeddings, axis=0)
+
+         # Get embedding from user preferences (e_u)
+         e_u = None
+         if user_preferences and user_preferences.strip():
+             e_u = self.get_text_embedding(user_preferences)
+
+         # Apply aggregation algorithm
+         if e_h is not None and e_u is not None:
+             # Both available: alpha * e_u + (1 - alpha) * e_h
+             user_profile = alpha * e_u + (1 - alpha) * e_h
+             print(f"Using combined embedding: α={alpha} (preferences weight)")
+         elif e_u is not None:
+             # Only preferences available
+             user_profile = e_u
+             print("Using preferences-only embedding")
+         elif e_h is not None:
+             # Only history available
+             user_profile = e_h
+             print("Using history-only embedding")
+         else:
+             return "Could not create user profile from provided input."
+
+         # Cosine similarity: normalize the profile and every item vector, then take dot products
+         user_profile_norm = user_profile / np.linalg.norm(user_profile)
+         embeddings_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
+         similarities = embeddings_norm @ user_profile_norm
+
+         print(f"Similarity range: {similarities.min():.3f} to {similarities.max():.3f}")
+
+         # Get the 100 most similar movies, best first
+         top_indices = np.argsort(similarities)[-100:][::-1]
+
+         # Filter out selected movies and near-duplicates
+         seen_titles = set(selected_movies) if selected_movies else set()
+         seen_clean_titles = {self.clean_title_for_comparison(title) for title in seen_titles}
+         final_recommendations = []
+
+         for idx in top_indices:
+             item_id = idx_to_id.get(idx)
+             if item_id is None:
+                 continue
+
+             # Look up the title for this item_id
+             title = self.id_to_title.get(item_id)
+             if not title:
+                 continue
+
+             clean_title = self.clean_title_for_comparison(title)
+
+             # Skip movies the user already selected (exact or normalized title match)
+             if title in seen_titles or clean_title in seen_clean_titles:
+                 continue
+
+             # Skip collections/trilogies if the user has seen any part
+             is_collection = any(
+                 (seen_clean in clean_title or clean_title in seen_clean)
+                 and any(marker in title.lower() for marker in ['collection', 'trilogy', 'series', 'complete'])
+                 for seen_clean in seen_clean_titles
+             )
+             if is_collection:
+                 continue
+
+             # Skip near-duplicates of movies already recommended
+             is_duplicate = any(
+                 fuzz.ratio(clean_title, self.clean_title_for_comparison(rec[0])) > 90
+                 for rec in final_recommendations
+             )
+             if is_duplicate:
+                 continue
+
+             # Add with similarity score
+             final_recommendations.append((title, float(similarities[idx])))
+             if len(final_recommendations) >= 100:
+                 break
+
+         if not final_recommendations:
+             return "No recommendations found based on your input."
+
+         return final_recommendations  # Up to 100 candidates for the ranking agent
+
+ def create_interface():
+     try:
+         recommender = MovieRecommender()
+     except Exception as e:
+         print(f"Error initializing recommender: {str(e)}")
+         return None
+
+     with gr.Blocks() as iface:
+         gr.Markdown(
+             """
+             # Movie Recommender
+             Get personalized movie recommendations based on your taste and preferences!
+
+             **How to use:**
+             1. Search and select movies you've enjoyed (no limit!)
+             2. Describe what kind of movie you're looking for (optional)
+             3. Adjust the preference weight (α) to balance between your description and movie history
+             4. Get personalized recommendations
+             """
+         )
+
+         selected_movies = gr.State([])
+         retrieval_results = gr.State([])  # Store retrieval results for ranking
+
+         with gr.Row():
+             with gr.Column():
+                 # Movie search and selection
+                 movie_search_input = gr.Textbox(
+                     label="Search movies",
+                     placeholder="Type to search...",
+                     interactive=True
+                 )
+
+                 # Show search results as a list of radio options
+                 search_results = gr.Radio(
+                     choices=[],
+                     label="Search Results",
+                     interactive=True,
+                     visible=True
+                 )
+
+                 # Display selected movies; per-movie remove buttons follow
+                 with gr.Column(elem_id="selected_movies_container") as selected_movies_container:
+                     selected_display = gr.HTML(
+                         label="Your Selected Movies",
+                         value="<p><i>No movies selected yet</i></p>"
+                     )
+
+                     # One hidden remove button per slot, shown as movies are selected
+                     delete_buttons = []
+                     for i in range(20):  # Support up to 20 movies
+                         btn = gr.Button(f"× Remove Movie {i+1}", visible=False, size="sm", variant="secondary")
+                         delete_buttons.append(btn)
+
+                 # Clear all button
+                 clear_btn = gr.Button("Clear All", size="sm", variant="secondary")
+
+                 # User preferences text field
+                 user_preferences = gr.Textbox(
+                     label="Describe what kind of movie you're looking for",
+                     placeholder="E.g., 'A thrilling sci-fi movie with deep philosophical themes'",
+                     lines=3
+                 )
+
+                 # Alpha slider
+                 alpha = gr.Slider(
+                     minimum=0,
+                     maximum=1,
+                     value=0.5,
+                     step=0.1,
+                     label="Preference Weight (α)",
+                     info="0: Use only movie history, 1: Use only your description"
+                 )
+
+                 # Embedding type selection (defaults to LLM + GCL)
+                 embedding_type = gr.Radio(
+                     choices=["LLM + GCL", "LLM"],
+                     value="LLM + GCL",
+                     label="Embedding Type",
+                     info="Choose between pure language model embeddings (LLM) or graph-enhanced embeddings (LLM + GCL)"
+                 )
+
+                 # Get recommendations button
+                 recommend_btn = gr.Button("Get Recommendations", variant="primary")
+
+             with gr.Column():
+                 # Display recommendations (updated incrementally while streaming)
+                 recommendations = gr.Markdown(
+                     label="Your Personalized Recommendations",
+                     value="Recommendations will appear here"
+                 )
+
+         def update_search_results(query):
+             """Update search results based on input"""
+             if not query or len(query.strip()) < 2:
+                 return gr.Radio(choices=[], visible=False)
+
+             matches = recommender.search_movies(query)
+             # Limit display to the first 20 matches for UI performance
+             display_matches = matches[:20]
+
+             if display_matches:
+                 return gr.Radio(choices=display_matches, visible=True)
+             return gr.Radio(choices=[], visible=False)
+
+         def format_selected_movies_display(movies):
+             """Render the selected movies as a numbered HTML list"""
+             if not movies:
+                 return "<p><i>No movies selected yet</i></p>"
+
+             html_items = []
+             for i, movie in enumerate(movies):
+                 html_items.append(f"""
+                 <div style="display: flex; align-items: center; justify-content: space-between;
+                             padding: 8px 12px; margin: 4px 0; background-color: #f8f9fa;
+                             border-radius: 6px; border-left: 3px solid #007bff;">
+                     <span style="flex-grow: 1; font-size: 14px; margin-right: 10px;">{i+1}. {movie}</span>
+                 </div>
+                 """)
+
+             return f"<div>{''.join(html_items)}</div>"
+
+         def update_delete_buttons_visibility(movies):
+             """Show one labelled remove button per selected movie and hide the rest"""
+             button_updates = []
+             for i in range(20):  # Support up to 20 movies
+                 if i < len(movies):
+                     movie_name = movies[i][:40] + ("..." if len(movies[i]) > 40 else "")
+                     button_updates.append(gr.Button(f"🗑️ {movie_name}", visible=True, size="sm", variant="secondary"))
+                 else:
+                     button_updates.append(gr.Button(f"× Remove Movie {i+1}", visible=False, size="sm", variant="secondary"))
+
+             return button_updates
+
+         def delete_movie_by_index(index, current_movies):
+             """Delete the movie at a specific index"""
+             if not current_movies or index >= len(current_movies):
+                 return current_movies, format_selected_movies_display(current_movies)
+
+             current_movies.pop(index)
+             return current_movies, format_selected_movies_display(current_movies)
+
+         def handle_movie_selection(selected_movie, current_movies):
+             """Add a movie chosen from the search results to the selection"""
+             current_movies = current_movies or []
+             # Only accept titles from the catalogue, and ignore duplicates
+             if selected_movie in recommender.title_to_id and selected_movie not in current_movies:
+                 current_movies.append(selected_movie)
+             return [current_movies, format_selected_movies_display(current_movies)] + update_delete_buttons_visibility(current_movies)
+
+         def clear_all_movies():
+             """Clear all selected movies"""
+             empty_movies = []
+             return [empty_movies, "<p><i>No movies selected yet</i></p>"] + update_delete_buttons_visibility(empty_movies)
+
+         def get_recommendations(movies, emb_type, preferences, pref_weight):
+             """Retrieve candidates, then stream the ranking_agent's reranked, explained output"""
+             if not movies and not preferences:
+                 yield "Please select some movies or provide preferences"
+                 return
+
+             try:
+                 # RETRIEVAL PHASE: get up to 100 candidates using embedding aggregation
+                 print("\n=== RETRIEVAL PHASE ===")
+                 print(f"Selected movies: {movies}")
+                 print(f"User preferences: '{preferences}'")
+                 print(f"Alpha weight: {pref_weight}")
+                 print(f"Embedding type: {emb_type}")
+
+                 yield "🔍 Searching for similar movies..."
+
+                 # Named `candidates` to avoid shadowing the outer `recommendations` component
+                 candidates = recommender.get_recommendations(
+                     selected_movies=movies,
+                     embedding_type=emb_type,
+                     user_preferences=preferences,
+                     alpha=pref_weight
+                 )
+
+                 # Handle error cases (get_recommendations returns a message string)
+                 if isinstance(candidates, str):
+                     yield candidates
+                     return
+
+                 # Print retrieval results
+                 print(f"\nRETRIEVAL RESULTS: Found {len(candidates)} candidates")
+                 print("Top 100 from retrieval phase:")
+                 for i, (title, score) in enumerate(candidates[:100], 1):
+                     print(f"  {i:2d}. {title} (score: {score:.3f})")
+
+                 # RERANKING + EXPLANATION PHASE: delegate to ranking_agent with streaming
+                 print("\n=== RERANKING PHASE ===")
+                 print("Calling rank_with_ai with:")
+                 print(f"  - {len(candidates)} recommendations")
+                 print(f"  - preferences: '{preferences}'")
+                 print(f"  - alpha: {pref_weight}")
+                 print(f"  - user_movies: {movies}")
+
+                 yield "🤖 AI is ranking and explaining your recommendations..."
+
+                 # Stream the responses from the ranking agent
+                 for partial_result in rank_with_ai(
+                     recommendations=candidates,
+                     user_preferences=preferences,
+                     alpha=pref_weight,
+                     user_movies=movies
+                 ):
+                     yield partial_result
+
+             except Exception as e:
+                 print(f"ERROR in get_recommendations: {str(e)}")
+                 import traceback
+                 traceback.print_exc()
+                 yield f"Error getting recommendations: {str(e)}"
+
+         # Event handlers
+         movie_search_input.change(
+             fn=update_search_results,
+             inputs=movie_search_input,
+             outputs=search_results
+         )
+
+         search_results.change(
+             fn=handle_movie_selection,
+             inputs=[search_results, selected_movies],
+             outputs=[selected_movies, selected_display] + delete_buttons
+         )
+
+         # Individual delete button handlers; the factory binds each button's index
+         def make_delete_handler(btn_idx):
+             def delete_handler(current_movies):
+                 updated_movies, updated_display = delete_movie_by_index(btn_idx, current_movies)
+                 return [updated_movies, updated_display] + update_delete_buttons_visibility(updated_movies)
+             return delete_handler
+
+         for i, btn in enumerate(delete_buttons):
+             btn.click(
+                 fn=make_delete_handler(i),
+                 inputs=[selected_movies],
+                 outputs=[selected_movies, selected_display] + delete_buttons
+             )
+
+         clear_btn.click(
+             fn=clear_all_movies,
+             inputs=[],
+             outputs=[selected_movies, selected_display] + delete_buttons
+         )
+
+         recommend_btn.click(
+             fn=get_recommendations,
+             inputs=[selected_movies, embedding_type, user_preferences, alpha],
+             outputs=recommendations
+         )
+
+     return iface
+
+ if __name__ == "__main__":
+     iface = create_interface()
+     if iface is not None:
+         iface.launch()
+     else:
+         print("\nPlease fix the issues above and try again.")
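`get_recommendations` in `app.py` skips candidates whose normalized title matches a selected movie. It also drops near-duplicates whose `fuzz.ratio` exceeds 90. The stdlib-only sketch below reproduces the normalization from `clean_title_for_comparison`. It substitutes `difflib.SequenceMatcher.ratio()` for rapidfuzz's `fuzz.ratio`, so its scores can differ slightly from the app's. The collection/trilogy rule is omitted, and `dedupe` is a hypothetical name.

```python
import re
from difflib import SequenceMatcher


def clean_title(title: str) -> str:
    """Same steps as app.py's clean_title_for_comparison: drop punctuation, lowercase, collapse spaces."""
    title = re.sub(r"[^\w\s]", "", str(title))
    return " ".join(title.lower().split())


def similarity(a: str, b: str) -> float:
    """0-100 similarity; difflib stand-in for rapidfuzz's fuzz.ratio (values may differ slightly)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def dedupe(candidates, seen, threshold=90.0):
    """Keep (title, score) pairs that are neither already seen nor near-duplicates of a kept title."""
    seen_clean = {clean_title(t) for t in seen}
    kept = []
    for title, score in candidates:
        c = clean_title(title)
        if c in seen_clean:
            continue  # The user already selected this movie
        if any(similarity(c, clean_title(k)) > threshold for k, _ in kept):
            continue  # Near-duplicate of a better-ranked candidate
        kept.append((title, score))
    return kept


candidates = [
    ("The Matrix", 0.91),
    ("Blade Runner", 0.88),
    ("Blade Runner.", 0.87),
    ("Alien", 0.80),
]
print(dedupe(candidates, seen=["the matrix"]))
# [('Blade Runner', 0.88), ('Alien', 0.8)]
```

Because candidates arrive sorted by similarity, keeping the first member of each near-duplicate group retains the best-scoring variant.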
ranking_agent.py ADDED
@@ -0,0 +1,128 @@
+ import os
+ from typing import List, Optional, Tuple
+
+ from dotenv import load_dotenv
+ from langchain_core.prompts import ChatPromptTemplate
+ from langchain_mistralai.chat_models import ChatMistralAI
+
+ load_dotenv()
+
+ def create_ranking_chain():
+     """Build the rerank-and-explain chain as an LCEL RunnableSequence (prompt | model)"""
+     prompt = ChatPromptTemplate.from_messages([
+         ("system", """You are a movie recommendation expert. Your task is to select the top 10 most relevant movies from a list of recommended movies and provide the final formatted output with brief explanations.
+
+ Rules:
+ 1. Always return exactly 10 movies
+ 2. Consider both relevance scores and how well each movie matches user preferences
+ 3. Pay attention to the alpha weighting parameter - it tells you how much to prioritize text preferences vs viewing history
+ 4. Return only movies from the provided list
+ 5. NEVER recommend movies that are already in the user's viewing history - these should be completely excluded
+ 6. Format each movie exactly as: **1. Movie Title**\n[Exactly 2 sentences explaining why this movie matches their taste]\n\n
+ 7. Number from 1 to 10, no additional text before or after"""),
+         ("user", """Given these movie recommendations with their relevance scores:
+ {movie_scores}
+
+ User preferences: {preferences}
+
+ User's viewing history (DO NOT RECOMMEND ANY OF THESE): {user_movies}
+
+ Alpha weighting: {alpha}
+ (α=0.0 means recommendations were based entirely on viewing history, α=1.0 means entirely on text preferences, α=0.5 means equal balance)
+
+ Select the 10 most relevant movies and provide the final formatted output with explanations. Format each as:
+ **1. Movie Title**
+ [Exactly 2 sentences explaining why this movie matches their taste based on the weighted combination of their preferences and history]
+
+ **2. Movie Title**
+ [Exactly 2 sentences explaining why this movie matches their taste based on the weighted combination of their preferences and history]
+
+ ...continue for all 10 movies.
+
+ Remember: NEVER include any movie from the user's viewing history in your recommendations.""")
+     ])
+
+     model = ChatMistralAI(
+         mistral_api_key=os.environ["MISTRAL_API_KEY"],
+         model="mistral-large-latest",
+         temperature=0.5,
+         max_tokens=1200,
+         streaming=True
+     )
+
+     return prompt | model
+
+
+ def rank_with_ai(
+     recommendations: List[Tuple[str, float]],
+     user_preferences: str = "",
+     alpha: float = 0.5,
+     user_movies: Optional[List[str]] = None,
+ ):
+     """
+     Rerank retrieval candidates and explain them, streaming the result:
+     1. Takes up to 100 candidates from the retrieval phase
+     2. Makes a single LLM call that picks the top 10 and explains each one
+     3. Yields the accumulated Markdown (header + text so far) after each streamed chunk
+
+     Args:
+         recommendations: List of (movie_title, relevance_score) tuples from retrieval phase
+         user_preferences: User's textual preferences/description
+         alpha: Weighting parameter (0.0 = only history matters, 1.0 = only preferences matter)
+         user_movies: List of the user's selected movies, passed to the model so it can exclude them
+     """
+     print("\n=== RANKING_AGENT DEBUG ===")
+     print(f"Received {len(recommendations) if recommendations else 0} recommendations")
+     print(f"User preferences: '{user_preferences}' (length: {len(user_preferences) if user_preferences else 0})")
+     print(f"Alpha: {alpha}")
+     print(f"User movies: {user_movies}")
+
+     if not recommendations:
+         yield "No recommendations available."
+         return
+
+     # Take only the top 100 recommendations if more are provided
+     recommendations = recommendations[:100]
+
+     try:
+         # Format movie scores for ranking
+         movie_scores = "\n".join(
+             f"{title} (relevance: {score:.3f})"
+             for title, score in recommendations
+         )
+
+         # Start with header
+         result_header = "## 🎬 Your Personalized Movie Recommendations\n\n"
+
+         if user_movies and user_preferences:
+             # round() rather than int(): (1 - 0.8) * 100 is 19.999..., which int() truncates to 19
+             history_pct = round((1 - alpha) * 100)
+             preference_pct = round(alpha * 100)
+             result_header += f"*Based on α={alpha} weighting: {history_pct}% your viewing history + {preference_pct}% your preferences*\n\n"
+         elif user_preferences:
+             result_header += f"*Based entirely on your preferences: \"{user_preferences}\"*\n\n"
+         elif user_movies:
+             result_header += "*Based entirely on your viewing history*\n\n"
+
+         result_header += "---\n\n"
+         yield result_header
+
+         # Single chain that does both ranking and explanation
+         ranking_chain = create_ranking_chain()
+         print("Calling unified ranking + explanation chain...")
+
+         # Stream the response, yielding the full text so far each time
+         accumulated_text = result_header
+         for chunk in ranking_chain.stream({
+             "movie_scores": movie_scores,
+             "preferences": user_preferences if user_preferences else "No specific preferences provided",
+             "user_movies": ", ".join(user_movies) if user_movies else "None",
+             "alpha": alpha
+         }):
+             if chunk.content:
+                 accumulated_text += chunk.content
+                 yield accumulated_text
+
+     except Exception as e:
+         print(f"ERROR in rank_with_ai: {str(e)}")
+         import traceback
+         traceback.print_exc()
+         # Fallback: show the top 10 retrieval results with their similarity scores
+         result = "## 🎬 Your Recommendations\n\n"
+         for i, (title, score) in enumerate(recommendations[:10], 1):
+             result += f"**{i}. {title}**\n"
+             result += f"*Similarity: {score:.3f}*\n\n"
+         yield result
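`rank_with_ai` yields the whole accumulated Markdown after each streamed chunk rather than only the new piece. This matters because Gradio replaces the output component's value with each value a generator yields. The sketch below isolates that pattern. `FakeChunk` is a hypothetical stand-in for the LangChain message chunks (only `.content` is read), so the sketch runs without an API key.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed LLM message chunk; only `.content` is used."""
    content: str


def accumulate(header: str, chunks: Iterable[FakeChunk]) -> Iterator[str]:
    """Yield the header, then the full text so far after every non-empty chunk."""
    text = header
    yield text
    for chunk in chunks:
        if chunk.content:  # Skip empty chunks, as rank_with_ai does
            text += chunk.content
            yield text


stream = [
    FakeChunk("**1. Alien**\n"),
    FakeChunk(""),
    FakeChunk("A tense sci-fi horror classic."),
]
snapshots = list(accumulate("## Recommendations\n\n", stream))
print(len(snapshots))  # 3
print(snapshots[-1])
```

Each snapshot extends the previous one, so the UI never flickers back to a shorter text while the model is still writing.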
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ flask==2.0.1
+ numpy>=1.21.0
+ pandas>=1.3.0
+ scipy>=1.7.1
+ rapidfuzz>=3.0.0
+ requests>=2.31.0
+ tqdm>=4.66.1
+ scikit-learn>=1.0.0
+ datasets>=2.17.0
+ python-dotenv>=1.0.1
+ langchain>=0.1.9
+ langchain-core>=0.1.27
+ langchain-mistralai>=0.0.5
+ gradio>=4.19.2