Commit
·
cb8e014
1
Parent(s):
6a9be58
- Recommedation_System_Netflix.ipynb +0 -0
- Recommedation_System_Netflix.wiki +144 -0
- netflix_titles.csv +0 -0
- recommendation_network.png +0 -0
- top_10_recommendations.png +0 -0
Recommedation_System_Netflix.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
Recommedation_System_Netflix.wiki
ADDED
@@ -0,0 +1,144 @@
= Netflix Content Recommendation System =

== Introduction ==
This wiki entry describes a hybrid recommendation system for Netflix movies and TV shows. The system combines three approaches to produce content recommendations:

* Content-based filtering using TF-IDF vectorization
* Collaborative filtering based on user ratings
* Node representation learning (Node2Vec) on a graph of titles and genres

== Methodology ==

=== Content-Based Filtering ===
The content-based filtering approach uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to turn each title's text features into a numeric vector:

==== TF-IDF Vectorization ====
TF-IDF is a numerical statistic that reflects how important a word is to one document relative to the whole document collection:
* Term Frequency (TF): Measures how frequently a term appears in a document
* Inverse Document Frequency (IDF): Downweights terms that appear in many documents

Implementation details:
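<syntaxhighlight lang="Python">
from sklearn.feature_extraction.text import TfidfVectorizer

# Build one TF-IDF row per title, dropping common English stop words
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['combined_features'])
</syntaxhighlight>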
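
To make the weighting concrete, here is a minimal pure-Python sketch of the smoothed TF-IDF formula that scikit-learn's <code>TfidfVectorizer</code> applies by default: <code>idf = ln((1 + n) / (1 + df)) + 1</code>, followed by L2 normalization of each row. The helper name <code>tfidf_vectors</code> and the toy documents are assumptions made for illustration. The sketch skips stop-word removal and scikit-learn's tokenization rules, so it is not a drop-in replacement.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, rows) with L2-normalized, smoothed TF-IDF weights."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF: rarer terms receive larger weights
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])
    return vocab, rows

# Hypothetical genre strings standing in for df['combined_features']
docs = ["crime drama thriller", "romantic comedy drama", "crime documentary"]
vocab, rows = tfidf_vectors(docs)
```

In the first toy document, "thriller" appears in only one document and so ends up with a larger weight than "drama", which appears in two.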

==== Cosine Similarity ====
Cosine similarity measures the similarity between two vectors by computing the cosine of the angle between them:
* Range: [-1, 1] in general, where 1 means identical direction, 0 means orthogonal, and -1 means opposite; TF-IDF vectors have no negative entries, so scores here fall in [0, 1]
* Used to compare TF-IDF vectors of different content items

<syntaxhighlight lang="Python">
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
</syntaxhighlight>
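
As a rough illustration of what the similarity matrix holds and how it can be queried, the sketch below computes cosine similarity by hand and looks up the nearest neighbours of one item. The helper names <code>cosine</code> and <code>most_similar</code> and the toy vectors are hypothetical and do not come from the notebook.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(sim_matrix, idx, n=3):
    """Indices of the n items most similar to item idx, excluding idx itself."""
    scores = [(j, s) for j, s in enumerate(sim_matrix[idx]) if j != idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [j for j, _ in scores[:n]]

# Toy feature vectors: items 0 and 1 point in nearly the same direction
vectors = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
sim = [[cosine(u, v) for v in vectors] for u in vectors]
nearest_to_first = most_similar(sim, 0, n=1)
```

Excluding the query index matters because every item has similarity 1 with itself and would otherwise always rank first.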

=== Collaborative Filtering ===
The collaborative filtering component applies matrix factorization to a user-item rating matrix:

==== User-Item Matrix ====
* Creates a matrix of user ratings for items
* Handles sparsity through matrix factorization
* Simulates user behavior with randomly generated ratings

<syntaxhighlight lang="Python">
import numpy as np

def create_user_item_matrix(df, n_users=1000):
    """Simulate a sparse user-item rating matrix with random ratings."""
    np.random.seed(42)  # fixed seed for reproducibility
    n_items = len(df)
    # Random ratings in 0..5, kept for roughly 20% of user-item pairs;
    # every other entry becomes 0
    user_item_matrix = np.random.randint(0, 6, size=(n_users, n_items)) * \
        (np.random.random((n_users, n_items)) > 0.8)
    return user_item_matrix
</syntaxhighlight>

==== Matrix Factorization ====
* Singular Value Decomposition (SVD) for dimensionality reduction
* Captures latent features in user-item interactions
* Predicts missing ratings

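<syntaxhighlight lang="Python">
import numpy as np
from scipy.sparse.linalg import svds

def matrix_factorization(ratings, n_factors=50):
    """Predict ratings from a truncated SVD of the mean-centered matrix."""
    # Center each user's ratings on that user's mean
    user_ratings_mean = np.mean(ratings, axis=1)
    ratings_norm = ratings - user_ratings_mean.reshape(-1, 1)
    # Keep n_factors singular values (n_factors must be < min(ratings.shape))
    U, sigma, Vt = svds(ratings_norm, k=n_factors)
    # Rebuild the low-rank matrix and add the user means back
    predicted_ratings = np.dot(np.dot(U, np.diag(sigma)), Vt) + \
        user_ratings_mean.reshape(-1, 1)
    return predicted_ratings
</syntaxhighlight>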
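
One way the predicted ratings might be turned into recommendations is to rank, for a given user, the items that user has not rated yet. The sketch below assumes a rating of 0 means "unrated", which matches the masking in the simulated matrix but is still an assumption, since the simulation can also draw a genuine 0. The helper name <code>recommend_for_user</code> and the sample numbers are hypothetical.

```python
def recommend_for_user(predicted_row, observed_row, n=3):
    """Rank unrated items (observed rating 0) by predicted rating, highest first."""
    candidates = [
        (item, score)
        for item, (score, seen) in enumerate(zip(predicted_row, observed_row))
        if seen == 0  # assumption: 0 marks an item the user has not rated
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:n]]

# One user's observed ratings and the matching row of predicted ratings
observed = [5, 0, 0, 3, 0]
predicted = [4.8, 3.9, 1.2, 3.1, 4.4]
top = recommend_for_user(predicted, observed, n=2)
```

Items 0 and 3 are skipped because the user already rated them, so the top two picks are items 4 and 1.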

=== Node Representation Learning ===
This component learns graph-based embeddings using Node2Vec:

==== Content Graph Creation ====
* Builds a graph representing content relationships
* Nodes represent movies/shows and genres
* Edges represent content-genre associations

<syntaxhighlight lang="Python">
import networkx as nx

def create_content_graph(df):
    """Build a graph that links each title to the genres it is listed in."""
    G = nx.Graph()

    # Pre-process genres, skipping rows where 'listed_in' is missing
    genre_dict = {}
    for idx, row in df.iterrows():
        if isinstance(row['listed_in'], str):
            genres = [g.strip() for g in row['listed_in'].split(',')]
            genre_dict[idx] = genres

    for idx, genres in genre_dict.items():
        # Add the title as a content node (step inferred from the description above)
        G.add_node(idx, type='content')
        for genre in genres:
            # Add unique genres as nodes
            if not G.has_node(genre):
                G.add_node(genre, type='genre')
            # Connect the title to its genre (also inferred from the description)
            G.add_edge(idx, genre)
    return G
</syntaxhighlight>

==== Node2Vec Algorithm ====
* Random walk-based approach for learning node embeddings
* Preserves network neighborhood information
* Parameters:
** Dimensions: 32 (embedding size)
** Walk length: 10 (steps per walk)
** Number of walks: 50 (walks per node)
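
The walk-generation step can be sketched in plain Python. This is a simplified illustration and not the Node2Vec library's implementation: it draws uniform random walks, which is the special case of Node2Vec with return and in-out parameters p = q = 1. The function name <code>generate_walks</code> and the toy adjacency list are assumptions. In the full algorithm the resulting walks would then be fed to a skip-gram model to learn the 32-dimensional embeddings.

```python
import random

def generate_walks(adjacency, walk_length=10, num_walks=50, seed=42):
    """Uniform random walks from every node (Node2Vec with p = q = 1)."""
    rng = random.Random(seed)
    walks = []
    nodes = list(adjacency)
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # isolated node: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy title-genre graph as an adjacency list (hypothetical data)
adjacency = {
    "Title A": ["Dramas", "Thrillers"],
    "Title B": ["Dramas"],
    "Dramas": ["Title A", "Title B"],
    "Thrillers": ["Title A"],
}
walks = generate_walks(adjacency)
```

Because titles connect only to genres, every walk alternates between title and genre nodes, so two titles that share a genre tend to appear in the same walks and end up with nearby embeddings.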

=== Hybrid Recommendation Function ===
The hybrid function combines the similarity scores from the individual approaches:

==== Weighted Combination ====
* Content similarity: 70% weight
* Node embeddings: 30% weight
* Adaptive weighting based on availability

<syntaxhighlight lang="Python">
def get_hybrid_recommendations(query, cosine_sim, df, n_recommendations=10):
    """
    Get hybrid recommendations based on content similarity and node embeddings.

    Args:
        query (str): Title or description to base recommendations on
        cosine_sim (np.ndarray): Pre-computed cosine similarity matrix
        df (pd.DataFrame): DataFrame containing Netflix content
        n_recommendations (int): Number of recommendations to return
    """
</syntaxhighlight>
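
The function body is not shown in this entry, so the following is only a sketch of how a 70/30 weighted blend could work. It assumes that "adaptive weighting based on availability" means falling back to the content score alone when a title has no embedding score. That reading is a guess, and the names <code>hybrid_score</code> and <code>rank_hybrid</code> are hypothetical.

```python
def hybrid_score(content_score, embedding_score=None,
                 content_weight=0.7, embedding_weight=0.3):
    """Blend two similarity scores; use the content score alone if no embedding score exists."""
    if embedding_score is None:
        # Assumed fallback: no embedding available for this title
        return content_score
    return content_weight * content_score + embedding_weight * embedding_score

def rank_hybrid(content_scores, embedding_scores, n=10):
    """Return the n (title, score) pairs with the highest blended score."""
    blended = {
        title: hybrid_score(score, embedding_scores.get(title))
        for title, score in content_scores.items()
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical similarity scores against one query title
content_scores = {"Title A": 0.80, "Title B": 0.60, "Title C": 0.70}
embedding_scores = {"Title A": 0.20, "Title B": 0.90}  # no embedding for Title C
ranked = rank_hybrid(content_scores, embedding_scores, n=3)
```

With these numbers, Title B overtakes Title A because its strong embedding score outweighs its weaker content score, while Title C is ranked on content similarity alone.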

== Results Visualization ==

=== Recommendation Scores Bar Chart ===
Visualization of top 10 recommendations with their similarity scores:

[[File:top_10_recommendations.png|thumb|200px|center|Top 10 Recommendations: Bar chart visualization showing similarity scores for the most relevant content recommendations based on the hybrid recommendation system.]]

=== Network Analysis ===
The recommendation network analysis reveals:

* Number of recommended items: 10
* Number of connections: 45
* Network density: 1.000 (all 45 possible pairs among the 10 items are connected)
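
The figures above are consistent with the standard density formula for an undirected simple graph, 2E / (N(N - 1)). A short check follows; the helper name <code>graph_density</code> is illustrative.

```python
def graph_density(n_nodes, n_edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    if n_nodes < 2:
        return 0.0  # density is taken as 0 when no pair of nodes exists
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

max_edges = 10 * 9 // 2           # number of distinct pairs among 10 items
density = graph_density(10, 45)   # 90 / 90
```

A density of 1 means the graph is complete: every recommended item is linked to every other one.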

=== Recommendation Network Graph ===
Visualization of content relationships and similarities:

[[File:recommendation_network.png|thumb|200px|center|Recommendation Network: Graph visualization depicting content relationships, with nodes representing recommended items and edges showing content similarities between them.]]

The network graph shows:
* Nodes: Recommended content items
* Node size: Recommendation score
* Edges: Content similarities
* Edge thickness: Similarity strength
netflix_titles.csv
ADDED
The diff for this file is too large to render.
See raw diff
recommendation_network.png
ADDED
![]() |
top_10_recommendations.png
ADDED
![]() |