In this blog, we will learn how to implement a content-based recommender system using a Python programming example. The example is a movie recommender system that suggests movies similar to a given title. Download the movies data to work with the example given in this blog.
We will do the following key activities to build a movie recommender system based on the content-based recommendation technique: import and prepare the data, combine the key attributes into a single text feature and vectorize it, compute cosine similarities between movies, and retrieve the top recommendations for a given title.
To start with, we import the data in CSV format. Once the data is imported, the next step is to analyse and prepare it before applying modeling techniques.
import pandas as pd
df = pd.read_csv('sample_data/movies.csv')
df = df[['title', 'genres', 'keywords', 'cast', 'director']]
df = df.fillna('') # Fill missing values with empty strings
df.head()
The dataset contains 24 columns, only a few of which are needed to describe a movie. In the above code, key features such as title, genres, etc. are extracted, and missing values in these features are filled with empty strings.
The next step is to add a column to the dataframe that holds the value created by combining the other columns representing key attributes such as title, genres, etc. CountVectorizer is then used on that column to vectorize the text. The following code does this.
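from sklearn.feature_extraction.text import CountVectorizer
# Combine the selected text columns into a single 'features' column
df['features'] = df['title'] + ' ' + df['genres'] + ' ' + df['keywords'] + ' ' + df['cast'] + ' ' + df['director']
vectorizer = CountVectorizer(stop_words='english', min_df=20)
word_matrix = vectorizer.fit_transform(df['features'])
word_matrix.shape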
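The following happens in the above code: stop_words='english' tells CountVectorizer to drop common English stop words, min_df=20 ignores words that appear in fewer than 20 movies, and fit_transform learns the vocabulary and returns a sparse matrix of word counts with one row per movie.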
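To make the effect of stop-word removal and min_df more concrete, here is a plain-Python sketch of the same idea on three made-up movie descriptions. It is not how scikit-learn implements CountVectorizer: the tokenizer is a simple split on whitespace, and the STOP_WORDS set and the count_matrix helper are hypothetical names used only for illustration.

```python
from collections import Counter

# Tiny made-up stop-word list; scikit-learn's built-in English list is much longer.
STOP_WORDS = {"the", "a", "of", "and", "in"}

def count_matrix(docs, min_df=1):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in docs[i]."""
    # Lowercase, split on whitespace, and drop stop words
    tokenized = [
        [w for w in doc.lower().split() if w not in STOP_WORDS]
        for doc in docs
    ]
    # Document frequency: in how many documents each word appears at least once
    doc_freq = Counter(w for words in tokenized for w in set(words))
    # Keep only words that appear in at least min_df documents
    vocabulary = sorted(w for w, n in doc_freq.items() if n >= min_df)
    rows = []
    for words in tokenized:
        counts = Counter(words)
        rows.append([counts[w] for w in vocabulary])
    return vocabulary, rows

docs = [
    "The spy and the secret agent",
    "A secret agent in action",
    "Action comedy of the year",
]
vocab, rows = count_matrix(docs, min_df=2)
print(vocab)  # ['action', 'agent', 'secret']
print(rows)   # [[0, 1, 1], [1, 1, 1], [1, 0, 0]]
```

Words such as "spy" and "comedy" appear in only one description, so with min_df=2 they are left out of the vocabulary. The same mechanism, with min_df=20, keeps the real word matrix down to a manageable number of columns.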
The table of word counts contains 4,803 rows—one for each movie—and 918 columns. The next task is to compute cosine similarities for each row pair:
from sklearn.metrics.pairwise import cosine_similarity
sim = cosine_similarity(word_matrix)
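For intuition, the cosine similarity of two count vectors is their dot product divided by the product of their lengths. The following is a minimal plain-Python sketch of that formula on made-up vectors. The cosine helper is a hypothetical name, and returning 0.0 for an all-zero vector is a choice made for this sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against division by zero for vectors with no counts at all
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up word-count rows for three movies over a three-word vocabulary
movie_a = [0, 1, 1]
movie_b = [1, 1, 1]
movie_c = [1, 0, 0]

print(round(cosine(movie_a, movie_b), 4))  # 0.8165 (two shared words)
print(round(cosine(movie_a, movie_c), 4))  # 0.0 (no shared words)
```

Movies that share more words point in a more similar direction and score closer to 1, while movies with no words in common score 0.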
The final step is to get one or more recommendations based on the input, a movie title. The Python code given below returns the top N recommendations for movies similar to the input movie name.
def recommend(title, df, sim, count=10):
    # Get the row index of the specified title in the DataFrame
    index = df.index[df['title'].str.lower() == title.lower()]
    # Return an empty list if there is no entry for the specified title
    if len(index) == 0:
        return []
    # Get the corresponding row in the similarity matrix
    similarities = list(enumerate(sim[index[0]]))
    # Sort the similarity scores in that row in descending order
    recommendations = sorted(similarities, key=lambda x: x[1], reverse=True)
    # Get the top n recommendations, ignoring the first entry in the list since
    # it corresponds to the title itself (and thus has a similarity of 1.0)
    top_recs = recommendations[1:count + 1]
    # Generate a list of titles from the indexes in top_recs
    titles = []
    for rec in top_recs:
        titles.append(df.iloc[rec[0]]['title'])
    return titles
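The following can be learned from the above code: the title is matched case-insensitively, an empty list is returned when the title is not found, the movie's row of the similarity matrix is sorted in descending order, and the first entry is skipped because it corresponds to the movie itself.
You can now invoke the recommend function to get the top 10 recommendations (count=10). This is how you would invoke the function: recommend('Spectre', df, sim).
You can use the Python code example given in this blog to create a content-based recommender system. In case you have questions, feel free to drop a message.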
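One detail of the ranking step is worth a closer look. The recommend function drops the first sorted entry on the assumption that it is the queried movie itself. If another movie happened to have identical feature text, it would also score 1.0 and could take that first position. The sketch below shows a possible variant that excludes the queried row by index instead. The top_similar helper and the similarity values are made up for illustration.

```python
def top_similar(sim_row, query_index, count=10):
    """Return indexes of the most similar items, excluding the query item itself."""
    ranked = sorted(
        (pair for pair in enumerate(sim_row) if pair[0] != query_index),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [i for i, _ in ranked[:count]]

# Made-up similarities of item 2 to items 0..3; item 0 ties with item 2 at 1.0
row = [1.0, 0.3, 1.0, 0.8]

# Positional skip, as in recommend: drops item 0 and keeps item 2 itself
positional = [
    i for i, _ in sorted(enumerate(row), key=lambda p: p[1], reverse=True)[1:3]
]
print(positional)                                 # [2, 3]

# Index-based exclusion: item 2 is removed, item 0 is kept
print(top_similar(row, query_index=2, count=2))   # [0, 3]
```

When no two movies have identical features, both approaches return the same list, so this only matters for datasets that may contain such duplicates.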