Data Visualizations using Python and Seaborn

August 6, 2020

Data Visualization

Data visualization is the representation of data in visual formats such as graphs and charts. It is important because trends and hidden patterns are easier to see, and easier for the human brain to understand, in a picture than in a table of numbers.

Python provides various data visualization libraries such as matplotlib, seaborn, plotly, and bokeh. In this article, we learn about data visualization using seaborn.

Seaborn

Seaborn is a Python data visualization library built on top of matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. The best thing about seaborn is that it comes with many statistical tools that help us understand the data better.

Before plotting, it helps to divide the features of the dataset into independent variables and a dependent variable, so that we can look for a relation or pattern between any two of them. Seaborn comes with many different types of plot, and we will discuss them in the sections that follow.

Installing Seaborn

Seaborn has a few requirements that you need to have installed:

1. NumPy

2. pandas

3. matplotlib

4. SciPy

To install seaborn, you should have all of the above libraries installed on your system. Once the required libraries are in place, you can install seaborn with

pip install seaborn

After installing seaborn, we are all set to explore it.
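One way to confirm that the libraries listed above are available is to ask Python whether it can locate each package. The sketch below uses the standard library's importlib for this; the list of names simply mirrors the requirements in this section, and the variable names are only illustrative.

```python
from importlib.util import find_spec

# Import names of seaborn and the libraries this section lists as requirements
required = ["numpy", "pandas", "matplotlib", "scipy", "seaborn"]

# find_spec returns None when a top-level package cannot be located
missing = [name for name in required if find_spec(name) is None]

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

If anything is reported as not installed, running pip install with that name should fix it.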

To begin using seaborn, we need to import it as

import seaborn as sns

After importing, we will use the built-in function load_dataset(), which loads one of seaborn's example datasets. The data is downloaded from seaborn's online repository the first time it is requested, so an internet connection is needed.

In this article, we will be using the 'tips' dataset. It comes from a restaurant where customers, after eating, paid a tip based on their total bill. The features are total_bill, tip, sex, smoker, day, time, and size. You can load the dataset by using

import seaborn as sns
df=sns.load_dataset('tips')
df.head()

The aim with this dataset is to build a model that predicts the tip from the other available features.

Here tip is the dependent feature, and total_bill, sex, smoker, day, time, and size are the independent features.
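As a rough sketch of this split, the target column can be separated from the rest with pandas. The small DataFrame below is made up by hand so that the snippet runs without downloading anything (its values are not taken from the real tips data), and the names X and y are just a common convention, not something the dataset requires.

```python
import pandas as pd

# Hand-made rows shaped like the tips dataset (illustrative values only)
df_small = pd.DataFrame({
    "total_bill": [16.5, 10.0, 21.0, 24.0],
    "tip": [1.0, 1.5, 3.5, 3.0],
    "sex": ["Female", "Male", "Male", "Male"],
    "smoker": ["No", "No", "No", "Yes"],
    "day": ["Sun", "Sun", "Sat", "Fri"],
    "time": ["Dinner", "Dinner", "Dinner", "Lunch"],
    "size": [2, 3, 3, 2],
})

y = df_small["tip"]               # dependent feature (what we want to predict)
X = df_small.drop(columns="tip")  # independent features (everything else)

print(list(X.columns))
print(y.tolist())
```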

Correlation Matrices

We will start by plotting a correlation matrix, which can be drawn with a heatmap. The heatmap shows the correlation between every pair of features, that is, how strongly the features are related to one another.

The basic requirement for computing a correlation is that the features are numerical, i.e. their data type must be int or float.

A correlation cannot be computed for categorical features because they are not numeric (in the tips dataset they have the category dtype). The values in a correlation matrix range from -1 to +1; by default this is the Pearson correlation. To find the correlation, you can use

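 df.corr(numeric_only=True)

The result contains only 3 features (total_bill, tip, and size) because only these are numerical; the rest are categorical. The numeric_only=True argument tells pandas to skip the categorical columns, and recent pandas versions raise an error without it.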
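To see which columns take part in the correlation, you can ask pandas for the numeric columns explicitly. The sketch below uses a tiny hand-made DataFrame (illustrative values, not the real tips data) so it runs offline; select_dtypes() and corr(numeric_only=True) are regular pandas calls, while the variable names are our own.

```python
import pandas as pd

# A tiny frame shaped like the tips data; the values are made up
df_small = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.5, 3.0, 5.0],
    "size": [1, 2, 2, 4],
    "sex": pd.Categorical(["Female", "Male", "Male", "Female"]),
})

# Columns with an int or float dtype; the categorical column is left out
numeric_cols = df_small.select_dtypes(include="number").columns.tolist()

# Pearson correlation over the numeric columns only
corr = df_small.corr(numeric_only=True)

print(numeric_cols)
print(corr.round(2))
```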

To visualize it for a better understanding, you can use

 sns.heatmap(df.corr(numeric_only=True))
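Each cell of the heatmap holds one Pearson correlation coefficient. To make the -1 to +1 range concrete, here is a minimal hand-written version of the formula in plain Python. The pearson() helper and the sample numbers are made up for illustration; in practice df.corr() does this for you.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and the variation of each on its own
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

bills = [10.0, 20.0, 30.0, 40.0]
tips = [1.0, 2.0, 3.0, 4.0]

r_pos = pearson(bills, tips)                  # tip rises in step with the bill
r_neg = pearson(bills, list(reversed(tips)))  # tip falls as the bill rises

print(round(r_pos, 6), round(r_neg, 6))
```

A value close to +1 means the two features rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.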

Scatterplot

Scatter plots place data points along a horizontal and a vertical axis. They show how one variable changes with another, and so reveal both the relationship between two variables and the extent of their correlation.

To draw a scatter plot, we use the relplot() function of the seaborn library. It can be done by using

sns.relplot(x='total_bill', y='tip', data=df)

Hue Plot

If you want to classify the points of your plot based on a feature, you can use the hue parameter. Any feature of the dataset can be used to classify the scatter plot. You can use hue as

sns.relplot(x='total_bill', y='tip', hue='sex', data=df)

Each point in the scatter plot is now colored according to sex.
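Conceptually, hue splits the points into one group per category and gives each group its own color. The sketch below mimics that idea in plain Python with a few made-up rows. It is only meant to illustrate the grouping, not how seaborn is implemented internally, and the color names are arbitrary placeholders.

```python
from collections import defaultdict

# Hand-made rows for illustration: (total_bill, tip, sex)
rows = [
    (16.5, 1.0, "Female"),
    (10.0, 1.5, "Male"),
    (21.0, 3.5, "Male"),
    (24.5, 3.0, "Female"),
]

# Group the (x, y) points by the value of the hue feature
points_by_sex = defaultdict(list)
for total_bill, tip, sex in rows:
    points_by_sex[sex].append((total_bill, tip))

# Assign one color per category
colors = dict(zip(sorted(points_by_sex), ["blue", "orange"]))

for sex, points in sorted(points_by_sex.items()):
    print(sex, colors[sex], points)
```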

Pair Plot

If there are more than two numerical features, we can use the pair plot. It is a grid of scatter plots in which every numerical feature is plotted against every other numerical feature, so all pairwise combinations are covered. The diagonal of the grid shows the distribution of each feature on its own. This plot can only be drawn for numerical data. It can be plotted by using

sns.pairplot(df)
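To get a feel for how many panels a pair plot contains, the grid can be enumerated by hand. This is a small sketch in plain Python using the three numerical columns of the tips dataset; the variable names are our own.

```python
from itertools import product

numeric_cols = ["total_bill", "tip", "size"]

# Every (row, column) combination of the numerical features forms one panel
grid = list(product(numeric_cols, repeat=2))

# Off-diagonal panels compare two different features (scatter plots)
scatter_panels = [(row, col) for row, col in grid if row != col]

# Diagonal panels show a single feature against itself (its distribution)
diagonal_panels = [(row, col) for row, col in grid if row == col]

print(len(grid), len(scatter_panels), len(diagonal_panels))
```

With 3 numerical features this gives a 3 x 3 grid: 6 scatter panels and 3 distribution panels on the diagonal.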

Dist Plot

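A dist plot helps to create histograms, which show the frequency distribution of a continuous variable. Older seaborn releases used the distplot() function for this; it has been deprecated since seaborn 0.11, and histplot() (or the figure-level displot()) is its replacement. A histogram with a density curve can be created by using

sns.histplot(df['total_bill'], kde=True)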
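A histogram is built by cutting the range of the values into bins and counting how many values fall into each one. The sketch below does this by hand with equal-width bins. The histogram() helper and the sample bills are made up for illustration, and seaborn chooses the number of bins automatically unless you tell it otherwise.

```python
def histogram(values, n_bins):
    """Count values in n_bins equal-width bins (assumes at least two distinct values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would land one past the end, so clamp it into the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

bills = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0, 30.0, 50.0]
counts = histogram(bills, 4)

print(counts)  # [4, 2, 1, 1]
```

The heights of the bars in the histogram are exactly these counts.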

Joint Plot

A joint plot helps to learn about the relationship between 2 numeric variables. It combines bivariate and univariate analysis: the central panel shows how the two variables vary together, and the margins show the distribution of each variable on its own. You can draw a joint plot as

sns.jointplot(x='total_bill', y='tip', data=df, kind='hex')

You can also change the kind parameter here and get a different visualization, for example

sns.jointplot(x='total_bill', y='tip', data=df, kind='kde')

Plotting Categorical Data

So far, we have visualized numerical data, i.e. data of int or float type. Now we will analyze categorical data. In our dataset, there are 4 categorical features: day, sex, time, and smoker.

Count Plot

It shows the count of observations in each category using bars. It can be thought of as a histogram across a categorical variable. Recent seaborn versions require the column to be passed by keyword (x= or y=).

sns.countplot(x='sex', data=df)

It draws one bar for each category in the feature.
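The numbers behind a count plot are simply how often each category occurs. As a small sketch with made-up values, the same counts can be obtained with the standard library's Counter:

```python
from collections import Counter

# Made-up values of a categorical feature
sex = ["Male", "Female", "Male", "Male", "Female"]

# One entry per category, holding the number of observations
counts = Counter(sex)

for category, n in sorted(counts.items()):
    print(category, n)
```

Each key becomes one bar, and the count becomes the height of that bar.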

Bar Plot

A bar plot looks similar to a count plot, but here we have to specify both x and y. For each category of x, it shows an estimate of y (the mean by default), with an error bar indicating the uncertainty of that estimate.

sns.barplot(x='smoker' , y='total_bill', data=df)
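The height of each bar is an aggregate of y within one category of x. Assuming the default estimator (the mean), the bar heights can be sketched by hand as follows. The rows are made up for illustration and the variable names are our own.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (smoker, total_bill)
rows = [("Yes", 20.0), ("No", 10.0), ("Yes", 30.0), ("No", 14.0)]

# Collect the bills of each category
bills_by_smoker = defaultdict(list)
for smoker, total_bill in rows:
    bills_by_smoker[smoker].append(total_bill)

# One bar per category; its height is the mean bill in that category
bar_heights = {smoker: mean(bills) for smoker, bills in bills_by_smoker.items()}

print(bar_heights)
```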

Box Plot

The box plot shows the quartile values of the distribution. Unlike a smoothed density estimate, its whiskers and outlier points correspond to actual observations in the data. It is often used in exploratory data analysis, and it also shows outliers. You can draw a box plot as

sns.boxplot(x='smoker', y='total_bill', data=df)
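The quantities a box plot draws can be computed by hand. The sketch below uses made-up bills and the common 1.5 x IQR rule for flagging outliers, which is also the default whisker setting in seaborn and matplotlib; the variable names are our own.

```python
from statistics import quantiles

bills = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 60.0]  # made-up values

# method="inclusive" interpolates linearly between the observed data points
q1, median, q3 = quantiles(bills, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Points more than 1.5 * IQR away from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [b for b in bills if b < lower_fence or b > upper_fence]

print(q1, median, q3)  # 14.0 18.0 22.0
print(outliers)        # [60.0]
```

The box spans q1 to q3 with a line at the median, and the single bill of 60.0 would appear as an individual point above the upper whisker.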

Violin Plot

The violin plot combines a box plot with a kernel density estimate, so it shows both the summary statistics and the shape of the distribution of the data. You can draw a violin plot as

sns.violinplot(x='day', y='total_bill', data=df)
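The outline of a violin is a kernel density estimate: a smooth curve obtained by placing a small bump on every observation and adding the bumps up. Here is a minimal sketch with Gaussian bumps in plain Python. The gaussian_kde() helper, the sample values, and the hand-picked bandwidth are all assumptions made for illustration; seaborn chooses the bandwidth for you.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x from the given values."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump centered on each observation
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density

bills = [10.0, 12.0, 13.0, 30.0]              # made-up values
density = gaussian_kde(bills, bandwidth=2.0)  # bandwidth chosen by hand

# Approximate the area under the curve with a Riemann sum over [-10, 50]
step = 0.1
area = sum(density(-10 + i * step) for i in range(601)) * step

print(round(area, 3))
print(density(12.0) > density(20.0))
```

The curve is higher where the observations cluster (around 12) than in the empty region around 20, which is why a violin is wide where the data is dense and narrow where it is sparse.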

So far we have discussed various data visualization techniques using seaborn. These techniques will surely help in your data science projects. We hope you now understand data visualization using seaborn in Python.
