Mohamed284 committed on
Commit
cb8e014
·
1 Parent(s): 6a9be58
Recommedation_System_Netflix.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
Recommedation_System_Netflix.wiki ADDED
@@ -0,0 +1,144 @@
+ = Netflix Content Recommendation System =
+
+ == Introduction ==
+ This wiki entry describes a hybrid recommendation system for Netflix movies and TV shows. The system combines three approaches to produce accurate and diverse content recommendations:
+
+ * Content-based filtering using TF-IDF vectorization
+ * Collaborative filtering based on user preferences
+ * Node representation learning for enhanced content understanding
+
+ == Methodology ==
+
+ === Content-Based Filtering ===
+ The content-based component uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to turn each title's text features into a numeric vector:
+
+ ==== TF-IDF Vectorization ====
+ TF-IDF is a numerical statistic that reflects how important a word is to one document within a collection of documents:
+ * Term Frequency (TF): Measures how frequently a term appears in a document
+ * Inverse Document Frequency (IDF): Downweights terms that appear in many documents
+
+ Implementation details:
+ <syntaxhighlight lang="Python">
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ # One row per title, one column per vocabulary term (English stop words removed)
+ tfidf = TfidfVectorizer(stop_words='english')
+ tfidf_matrix = tfidf.fit_transform(df['combined_features'])
+ </syntaxhighlight>
+
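To make the weighting concrete, the following is a minimal pure-Python sketch of the computation that scikit-learn's `TfidfVectorizer` performs with its default settings: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. The toy documents and the helper name `tfidf_vectors` are hypothetical. Tokenization here is a plain whitespace split, which is simpler than the library's tokenizer, and no stop words are removed.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, rows) where each row is an L2-normalized TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, matching scikit-learn's default (smooth_idf=True)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows

# Hypothetical 'combined_features' strings for three titles
docs = [
    "crime drama thriller",
    "crime documentary",
    "romantic comedy drama",
]
vocab, rows = tfidf_vectors(docs)
```

In the first toy document, "thriller" occurs in only one document while "crime" occurs in two, so "thriller" receives the larger weight.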
+ ==== Cosine Similarity ====
+ Cosine similarity measures the similarity between two vectors by computing the cosine of the angle between them:
+ * Range: [-1, 1] in general, where 1 means identical direction, 0 means orthogonal, and -1 means opposite; TF-IDF vectors have no negative entries, so the scores here fall in [0, 1]
+ * Used to compare the TF-IDF vectors of different content items
+
+ <syntaxhighlight lang="Python">
+ from sklearn.metrics.pairwise import cosine_similarity
+ cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
+ </syntaxhighlight>
+
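The sketch below shows, under simplifying assumptions, how pairwise cosine scores can be turned into a ranked list of similar items. It uses plain Python lists in place of the sparse TF-IDF matrix. The helper names `cosine` and `most_similar` and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(index, vectors, n=2):
    """Indices of the n items most similar to vectors[index], excluding the item itself."""
    scores = [(cosine(vectors[index], v), j) for j, v in enumerate(vectors) if j != index]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [j for _, j in scores[:n]]

# Toy non-negative vectors standing in for TF-IDF rows
vectors = [
    [1.0, 1.0, 0.0],
    [1.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]
ranked = most_similar(0, vectors)
```

Item 1 points in nearly the same direction as item 0 and ranks first. Item 2 shares no non-zero component with item 0, so its score is 0.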
+ === Collaborative Filtering ===
+ The collaborative filtering component applies matrix factorization to a user-item rating matrix. The ratings in this matrix are randomly simulated, not observed user data:
+
+ ==== User-Item Matrix ====
+ * Creates a matrix of user ratings for items
+ * Handles sparsity through matrix factorization
+ * Simulates user behavior with random ratings; roughly 80% of entries are set to 0 (unrated)
+
+ <syntaxhighlight lang="Python">
+ import numpy as np
+
+ def create_user_item_matrix(df, n_users=1000):
+     np.random.seed(42)  # reproducible simulation
+     n_items = len(df)
+     # Random ratings in 0-5; the mask keeps roughly 20% of entries and zeroes the rest
+     user_item_matrix = np.random.randint(0, 6, size=(n_users, n_items)) * \
+         (np.random.random((n_users, n_items)) > 0.8)
+     return user_item_matrix
+ </syntaxhighlight>
+
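As a rough check on how sparse the simulated matrix is: the mask keeps about 20% of entries, and a kept entry is itself 0 in one case out of six, so roughly 0.2 × 5/6 ≈ 16.7% of entries end up as non-zero ratings. The following sketch reproduces the same scheme with the standard library only and measures that fraction. The function name `simulate_ratings` and the matrix size are illustrative assumptions.

```python
import random

def simulate_ratings(n_users, n_items, keep_prob=0.2, seed=42):
    """Simulate ratings in 0-5 where each entry is kept with probability keep_prob, else 0."""
    rng = random.Random(seed)
    return [
        [rng.randint(0, 5) if rng.random() < keep_prob else 0 for _ in range(n_items)]
        for _ in range(n_users)
    ]

matrix = simulate_ratings(n_users=200, n_items=50)
n_entries = 200 * 50
nonzero = sum(1 for row in matrix for r in row if r != 0)
density = nonzero / n_entries  # expected to be close to 0.2 * 5/6
```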
+ ==== Matrix Factorization ====
+ * Singular Value Decomposition (SVD) for dimensionality reduction
+ * Captures latent features in user-item interactions
+ * Predicts missing ratings
+
+ <syntaxhighlight lang="Python">
+ import numpy as np
+ from scipy.sparse.linalg import svds
+
+ def matrix_factorization(ratings, n_factors=50):
+     user_ratings_mean = np.mean(ratings, axis=1)
+     ratings_norm = ratings - user_ratings_mean.reshape(-1, 1)  # center each user's row
+     # Truncated SVD; n_factors must be smaller than min(ratings.shape)
+     U, sigma, Vt = svds(ratings_norm, k=n_factors)
+     # Rebuild the low-rank matrix and add the user means back
+     predicted_ratings = np.dot(np.dot(U, np.diag(sigma)), Vt) + \
+         user_ratings_mean.reshape(-1, 1)
+     return predicted_ratings
+ </syntaxhighlight>
+
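Once predicted ratings are available, recommending for a user amounts to ranking the items that user has not yet rated. The helper below is a hypothetical sketch of that step, not part of the original notebook. The name `top_unrated_items` and the convention that a rating of 0 means "unrated" are assumptions, and plain lists are used so that it runs without NumPy.

```python
def top_unrated_items(user_index, ratings, predicted, n=3):
    """Rank items the user has not rated (rating 0) by predicted rating, highest first."""
    candidates = [
        (predicted[user_index][item], item)
        for item, rating in enumerate(ratings[user_index])
        if rating == 0
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in candidates[:n]]

# Toy data: 2 users x 4 items; the predicted values are made up for illustration
ratings = [
    [5, 0, 0, 3],
    [0, 4, 0, 0],
]
predicted = [
    [4.8, 2.1, 3.9, 3.2],
    [1.5, 4.1, 0.7, 2.6],
]
recs_user0 = top_unrated_items(0, ratings, predicted)
recs_user1 = top_unrated_items(1, ratings, predicted)
```

Items the user has already rated are excluded even when their predicted rating is high, so the first user's list contains only items 2 and 1.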
+ === Node Representation Learning ===
+ This component learns graph-based representations of the content using Node2Vec:
+
+ ==== Content Graph Creation ====
+ * Builds a graph representing content relationships
+ * Nodes represent movies/shows and genres
+ * Edges represent content-genre associations
+
+ <syntaxhighlight lang="Python">
+ import networkx as nx
+
+ def create_content_graph(df):
+     G = nx.Graph()
+
+     # Pre-process genres: map each row index to its list of genre names
+     genre_dict = {}
+     for idx, row in df.iterrows():
+         if isinstance(row['listed_in'], str):
+             genres = [g.strip() for g in row['listed_in'].split(',')]
+             genre_dict[idx] = genres
+
+     # Add unique genres as nodes
+     for genres in genre_dict.values():
+         for genre in genres:
+             if not G.has_node(genre):
+                 G.add_node(genre, type='genre')
+
+     # Assumed completion, following the description above: add one node per
+     # title (keyed by row index) and link it to each of its genres
+     for idx, genres in genre_dict.items():
+         G.add_node(idx, type='content')
+         G.add_edges_from((idx, genre) for genre in genres)
+     return G
+ </syntaxhighlight>
+
+ ==== Node2Vec Algorithm ====
+ * Random walk-based approach for learning node embeddings
+ * Preserves network neighborhood information
+ * Parameters:
+ ** Dimensions: 32 (embedding size)
+ ** Walk length: 10 (steps per walk)
+ ** Number of walks: 50 (walks per node)
+
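The walk-generation step can be sketched in pure Python as follows. This is an illustrative version of the second-order biased walk used by Node2Vec, not the project's code. The toy graph, the function names, and the return and in-out parameters `p` and `q` (both set to 1 here) are assumptions. Only the walk length (10) and the number of walks per node (50) come from the parameter list above. In a full pipeline the walks would then be fed to a skip-gram model to learn the 32-dimensional embeddings; that step is omitted here.

```python
import random

def biased_walk(graph, start, walk_length, p, q, rng):
    """One Node2Vec-style walk of up to walk_length nodes; graph maps node -> set of neighbors."""
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # return to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)   # move farther away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

def generate_walks(graph, num_walks=50, walk_length=10, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        walks.extend(biased_walk(graph, node, walk_length, p, q, rng) for node in nodes)
    return walks

# Toy content-genre graph: titles link to genres, genres link back to titles
graph = {
    "Title A": {"Dramas", "Thrillers"},
    "Title B": {"Dramas"},
    "Title C": {"Comedies"},
    "Dramas": {"Title A", "Title B"},
    "Thrillers": {"Title A"},
    "Comedies": {"Title C"},
}
walks = generate_walks(graph)
```

With `p = q = 1` the walk reduces to an unbiased random walk. Lowering `q` pushes walks outward, while lowering `p` makes them return to the previous node more often.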
+ === Hybrid Recommendation Function ===
+ The hybrid function combines the content-similarity and node-embedding scores:
+
+ ==== Weighted Combination ====
+ * Content similarity: 70% weight
+ * Node embeddings: 30% weight
+ * Adaptive weighting based on availability
+
+ <syntaxhighlight lang="Python">
+ def get_hybrid_recommendations(query, cosine_sim, df, n_recommendations=10):
+     """
+     Get hybrid recommendations based on content similarity and node embeddings.
+
+     Args:
+         query (str): Title or description to base recommendations on
+         cosine_sim (np.ndarray): Pre-computed cosine similarity matrix
+         df (pd.DataFrame): DataFrame containing Netflix content
+         n_recommendations (int): Number of recommendations to return
+     """
+ </syntaxhighlight>
+
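The weighting rule can be sketched as below. The 70/30 split comes from the list above. Everything else is an assumption made for illustration, in particular the reading of "adaptive weighting" as falling back to the content score alone when an item has no embedding score, and the hypothetical names `hybrid_score` and `rank_items`.

```python
def hybrid_score(content_score, embedding_score=None, content_weight=0.7):
    """Blend two similarity scores; use the content score alone if no embedding score exists."""
    if embedding_score is None:
        return content_score
    return content_weight * content_score + (1.0 - content_weight) * embedding_score

def rank_items(content_scores, embedding_scores, n_recommendations=10):
    """Return (item, score) pairs sorted by hybrid score, highest first."""
    scored = [
        (item, hybrid_score(score, embedding_scores.get(item)))
        for item, score in content_scores.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_recommendations]

# Made-up similarity scores for three candidate titles
content_scores = {"Title A": 0.80, "Title B": 0.60, "Title C": 0.75}
embedding_scores = {"Title A": 0.20, "Title B": 0.90}  # no embedding for Title C
ranking = rank_items(content_scores, embedding_scores)
```

Here "Title B" overtakes "Title A" because its strong embedding score outweighs its lower content score, while "Title C" is ranked on content similarity alone.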
+ == Results Visualization ==
+
+ === Recommendation Scores Bar Chart ===
+ Visualization of the top 10 recommendations with their similarity scores:
+
+ [[File:top_10_recommendations.png|thumb|200px|center|Top 10 Recommendations: Bar chart of similarity scores for the most relevant content recommendations produced by the hybrid recommendation system.]]
+
+ === Network Analysis ===
+ The recommendation network analysis reveals:
+
+ * Number of recommended items: 10
+ * Number of connections: 45
+ * Network density: 1.000
+
+ A density of 1.000 means the network is complete: 10 nodes have 10 × 9 / 2 = 45 possible pairs, and all 45 are connected.
+
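The reported figures are consistent with the standard density formula for an undirected graph, 2E / (N(N − 1)), where N is the number of nodes and E the number of edges. A quick check, using a hypothetical helper name:

```python
def graph_density(n_nodes, n_edges):
    """Density of an undirected simple graph: edges present / edges possible."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

density = graph_density(n_nodes=10, n_edges=45)  # 2 * 45 / (10 * 9)
```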
+ === Recommendation Network Graph ===
+ Visualization of content relationships and similarities:
+
+ [[File:recommendation_network.png|thumb|200px|center|Recommendation Network: Graph visualization depicting content relationships, with nodes representing recommended items and edges showing content similarities between them.]]
+
+ The network graph shows:
+ * Nodes: Recommended content items
+ * Node size: Recommendation score
+ * Edges: Content similarities
+ * Edge thickness: Similarity strength
netflix_titles.csv ADDED
The diff for this file is too large to render. See raw diff
 
recommendation_network.png ADDED
top_10_recommendations.png ADDED