Build a Recommendation System using Python

Introduction

Nowadays every customer faces multiple choices, whether purchasing a product from an e-commerce website or watching videos on YouTube or movies on Netflix. Earlier, if you wanted to watch a movie online, you might have wasted a lot of time browsing the internet or asking other people for recommendations. And even if you chose a movie that other people rated highly, there was no guarantee that you would like it too. So it is usually better to make a choice based on our own previous experience, be it watching a movie, reading a book or purchasing a product. This technique is known as recommendation. In this article we will learn to build a recommendation system, which will serve as a light introduction to the topic. But before that, let’s understand what exactly a recommendation system is.

Recommendation System

A recommendation system is a computer program that helps users discover products or content by predicting how each user would rate each item and then showing them the items they would rate highly. Recommendation systems are now used everywhere. For many online platforms, the recommendation engine is the actual business. Applications include:

  1. Product search results used in Google search.
  2. Product similarity used while buying products on online platforms.
  3. User similarity used for recommendation on Facebook and Instagram.

Recommendation systems are of two types:

  • Content-Based Filtering

It tries to recommend products that have attributes similar to those of products the user already likes. Take Netflix as an example: when we watch a movie there, we get many recommendations for movies of the same type. The recommendations depend entirely on content: whether you are watching a drama, a horror movie or a romantic movie, the suggestions are based on that content type.

  • Collaborative Filtering

It makes recommendations based only on how users rated products in the past, not on anything about the products themselves. The system has no knowledge of the product; it only knows how other users rated it, and it uses those past ratings to make recommendations. Take Flipkart as an example: if you buy a smartphone, it recommends earphones, phone covers and the other things that users usually purchase with a smartphone.
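To make the content-based idea concrete, here is a minimal sketch. It assumes each movie is described by a binary genre vector and ranks movies by cosine similarity; the titles, the genre flags and the helper name most_similar are made up for illustration and are not part of any dataset or library.

```python
import numpy as np
import pandas as pd

# Hypothetical genre flags, invented for illustration only
genres = pd.DataFrame(
    {"drama": [1, 1, 0, 0], "horror": [0, 0, 1, 1], "romance": [1, 0, 0, 1]},
    index=["Movie A", "Movie B", "Movie C", "Movie D"],
)


def most_similar(title, features):
    """Rank the other items by cosine similarity of their feature vectors."""
    vectors = features.to_numpy(dtype=float)
    # Scale every row to unit length so a dot product equals the cosine
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    target = unit[features.index.get_loc(title)]
    scores = pd.Series(unit @ target, index=features.index)
    return scores.drop(title).sort_values(ascending=False)


similar_to_a = most_similar("Movie A", genres)
print(similar_to_a)
```

A collaborative approach would ignore the genre columns entirely and work only from user ratings, which is what the rest of this article does.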

According to a widely cited industry estimate, about 35 percent of what consumers purchase on Amazon and 75 percent of what they watch on Netflix come from recommendations.

So now let’s move forward and build our recommendation system. We will build a movie recommender system using the MovieLens dataset, a dataset of movie ratings. Its ratings file contains the columns userId, movieId, rating and timestamp. Each row is a rating that the user identified by userId gave to the movie identified by movieId, and timestamp records when the rating was made. A second dataframe, keyed on movieId, holds the movie title corresponding to each movieId. Using this dataset, we will recommend movies based on the user ratings it contains. You can download the dataset from the link below:

https://grouplens.org/datasets/movielens/
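In the MovieLens CSV files, the timestamp column of ratings.csv stores the rating time as seconds since the Unix epoch. The sketch below uses made-up rows that only mimic that column layout, and the rated_at column name is an assumption chosen for this example.

```python
import pandas as pd

# Made-up rows that mimic the ratings.csv column layout
sample = pd.DataFrame(
    {
        "userId": [1, 1, 2],
        "movieId": [10, 20, 10],
        "rating": [4.0, 3.5, 5.0],
        "timestamp": [0, 86400, 946684800],
    }
)

# Interpret the integers as seconds since 1970-01-01 (UTC)
sample["rated_at"] = pd.to_datetime(sample["timestamp"], unit="s")
print(sample)
```

The timestamp is not used in the rest of this article, but converting it this way is handy if you later want to look at ratings over time.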

 

First we need to import all the relevant libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

 

After this we read the dataset that contains the ratings.

df = pd.read_csv('D://Data Science//Datasets//MovieLens//ratings.csv')
df.head()


 

Next we read the dataset that maps each movieId to its movie title.

movies = pd.read_csv('D://Data Science//Datasets//MovieLens//movies.csv')
movies.head()


 

Now we will merge these two dataframes on the movieId column. For this we use the merge function in pandas, passing the two dataframes and the key on which to merge them.

 

df = pd.merge(df,movies,on='movieId')
df.head()
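To see what the merge does, here is a small sketch on toy frames. The rows and the names ratings_toy, movies_toy and merged_toy are invented for illustration; only the movieId key matches the real data.

```python
import pandas as pd

# Toy stand-ins for the ratings and movies dataframes
ratings_toy = pd.DataFrame(
    {"userId": [1, 1, 2], "movieId": [10, 20, 10], "rating": [4.0, 3.5, 5.0]}
)
movies_toy = pd.DataFrame(
    {"movieId": [10, 20, 30], "title": ["Movie A", "Movie B", "Movie C"]}
)

# pd.merge defaults to an inner join: only movieIds present in both frames survive
merged_toy = pd.merge(ratings_toy, movies_toy, on="movieId")
print(merged_toy)
```

Every rating row gains the matching title, and Movie C, which nobody rated, does not appear in the result.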

Now we will do some exploratory data analysis to get basic information about this dataset.

We will create a ratings dataframe that contains the average rating and the total number of ratings for each title.

 

# Average rating per title
ratings = pd.DataFrame(df.groupby('title')['rating'].mean())
# Number of ratings per title (aligned on the title index)
ratings['num of ratings'] = df.groupby('title')['rating'].count()
ratings.head()
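As an alternative sketch, both statistics can be computed in a single groupby pass with agg. The toy rows and the variable names toy and summary are assumptions made for this example.

```python
import pandas as pd

# Toy ratings, invented for illustration
toy = pd.DataFrame(
    {"title": ["Movie A", "Movie A", "Movie B"], "rating": [4.0, 5.0, 3.0]}
)

# Compute mean and count together, then rename to the column names used in this article
summary = toy.groupby("title")["rating"].agg(["mean", "count"])
summary = summary.rename(columns={"mean": "rating", "count": "num of ratings"})
print(summary)
```

The result has the same shape as the ratings dataframe: one row per title with its average rating and number of ratings.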


 

As you can see, the dataframe is created with the title, the average rating and the number of ratings.

Next we will plot histograms of the rating and of the number of ratings.

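# Note: distplot is deprecated since seaborn 0.11; sns.histplot(ratings['rating'], kde=True) is a close replacement
sns.distplot(ratings['rating'])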

 

ratings['num of ratings'].hist(bins=70)

 


For this we used the distplot function from seaborn (deprecated since seaborn 0.11 in favor of histplot and displot) and the hist method that pandas provides on top of matplotlib. Now we will create a joint plot, which combines the two plots above.

sns.jointplot(x='rating' , y='num of ratings' , data=ratings)

 


sns.relplot(x='rating' , y='num of ratings' , data=ratings)

 


That completes our simple exploratory data analysis (EDA). Next, we will see how to recommend movies.

We will create a matrix with the user ids along the rows (the ‘Y’ axis) and the movie titles along the columns (the ‘X’ axis). Each cell holds the rating that the user gave to that movie. There will be a lot of missing (NaN) values, because people only rate the movies they have watched and most people have not seen most of the movies. For this we will create a pivot table.

 

moviepvt  =  df.pivot_table(index='userId' , columns='title' , values='rating')
moviepvt.head()
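The following sketch shows on toy data how the pivot table produces NaN for every user and movie pair without a rating. The rows and the names toy, pvt and sparsity are invented for illustration.

```python
import pandas as pd

# Toy merged ratings: user 2 never rated Movie B, user 3 never rated Movie A
toy = pd.DataFrame(
    {
        "userId": [1, 1, 2, 3],
        "title": ["Movie A", "Movie B", "Movie A", "Movie B"],
        "rating": [4.0, 3.0, 5.0, 2.0],
    }
)

# One row per user, one column per title; missing ratings become NaN
pvt = toy.pivot_table(index="userId", columns="title", values="rating")
print(pvt)

# Fraction of empty cells in the user-by-movie matrix
sparsity = pvt.isna().to_numpy().mean()
print(sparsity)
```

Note that pivot_table averages the values by default, so if a user had rated the same title more than once the cell would hold the mean of those ratings.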

 

Since we have already created the ratings dataframe, we can sort it by the number of ratings in descending order.

ratings.sort_values('num of ratings' , ascending=False).head(10)

You can see that we have the top 10 movies with the highest number of ratings.

Next, consider movies such as Jurassic Park or Forrest Gump. For each of them we will compute correlations against the pivot table we created and use those correlations to recommend movies. First, we get the ratings for Jurassic Park from the pivot table.

Jurassic_ratings = moviepvt['Jurassic Park (1993)']
Jurassic_ratings.tail()

 

Similarly, we get the ratings for Forrest Gump.

 

Forrest_ratings = moviepvt['Forrest Gump (1994)']
Forrest_ratings.tail()

 

Now we will correlate each of these rating columns with every other movie column in the pivot table; the ratings are matched by userId. For that we use the corrwith function, which lets us find the movies whose ratings are similar to those of Jurassic Park and Forrest Gump.

 

Jurassic_correlate = moviepvt.corrwith(Jurassic_ratings)
Forrest_correlate = moviepvt.corrwith(Forrest_ratings)
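The sketch below illustrates what corrwith computes, using a toy pivot table with invented ratings (the names pvt and corr_with_a are assumptions for this example). Each column is correlated with the target column using only the users who rated both movies.

```python
import numpy as np
import pandas as pd

# Toy user-by-movie matrix with missing ratings, invented for illustration
pvt = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, np.nan],
        "Movie B": [4.0, 5.0, 2.0, 3.0],
        "Movie C": [1.0, np.nan, 5.0, 4.0],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)

# Pearson correlation of every column with Movie A, skipping NaN pairs
corr_with_a = pvt.corrwith(pvt["Movie A"])
print(corr_with_a)
```

Movie C shares only two raters with Movie A, so its correlation is an extreme value computed from two points. Correlations based on very few common ratings should be read with caution.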

After performing this, we have the correlation of every movie with Jurassic Park and with Forrest Gump. Let’s create a new dataframe holding these similarities and clean it by removing all the NaN values.

 

corr_jurassic = pd.DataFrame(Jurassic_correlate,columns=['Similarity'])
corr_jurassic.dropna(inplace=True)
corr_jurassic.sort_values('Similarity',ascending=False).head(10)

 

As you can see, these are the movies most similar to Jurassic Park, and they will be the recommended movies.

Similarly for Forrest Gump:

corr_Forrest = pd.DataFrame(Forrest_correlate,columns=['Similarity'])
corr_Forrest.dropna(inplace=True)
corr_Forrest.sort_values('Similarity',ascending=False).head(10)
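One possible refinement, offered only as a sketch and not part of the steps above: titles rated by very few users can reach a very high correlation by chance, so a common approach is to join the number of ratings and keep only titles above a threshold. The toy numbers and the min_ratings value of 50 below are hypothetical.

```python
import pandas as pd

# Toy similarity scores and rating counts, invented for illustration
titles = ["Obscure Movie", "Movie B", "Movie C"]
corr_toy = pd.DataFrame({"Similarity": [1.0, 0.62, 0.55]}, index=titles)
counts_toy = pd.DataFrame({"num of ratings": [2, 150, 90]}, index=titles)

# Hypothetical cut-off; tune it for the dataset at hand
min_ratings = 50

# Attach the rating counts, drop rarely rated titles, then rank by similarity
joined = corr_toy.join(counts_toy["num of ratings"])
filtered = joined[joined["num of ratings"] >= min_ratings]
filtered = filtered.sort_values("Similarity", ascending=False)
print(filtered)
```

With the article's dataframes, the same pattern would join corr_jurassic or corr_Forrest with ratings['num of ratings'] before sorting.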


In this article we learnt to build a simple recommendation system using correlation. We started by understanding the fundamentals of recommendation and then loaded the MovieLens dataset for the demonstration. Hope you find it useful.
