Data Visualizations using Python and Matplotlib

Data Visualization

Data visualization is the representation of data in visual formats such as graphs and charts. It matters because trends and hidden patterns are easier to spot in a picture than in raw numbers, and visual information is easier for the human brain to understand.

Python provides several libraries for data visualization, such as matplotlib, seaborn, plotly and bokeh. In this article, we learn about data visualization using matplotlib.

Matplotlib

Matplotlib is a Python data visualization library and one of the most frequently used. If you want to plot data points in 2D or 3D, you can use matplotlib. It is a plotting library for Python and its numerical mathematics extension, NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, etc.

Matplotlib ships with many scientific Python distributions, such as Anaconda, so it is often already available and can be used directly. If it is missing, it can be installed with pip install matplotlib. To use it, import it as
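The object-oriented API mentioned above can be sketched as follows. This is a minimal illustration rather than code from the original article: the data values and the names fig, ax, hours and marks are assumptions chosen for the example.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed for this sketch)
hours = [1, 2, 3, 4, 5]
marks = [35, 48, 60, 71, 85]

# plt.subplots() returns a Figure and a single Axes object
fig, ax = plt.subplots()

# Plotting and labelling are done through methods on the Axes
ax.plot(hours, marks)
ax.set_xlabel('Hours of study')
ax.set_ylabel('Marks obtained')
ax.set_title('Object-oriented interface')
```

Working with explicit Figure and Axes objects tends to be clearer than the plt.* functions once a figure contains more than one plot.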

 

import matplotlib.pyplot as plt
%matplotlib inline

The second line matters when you work in a Jupyter notebook. It is a notebook magic command that tells Jupyter to render plots inline, directly below the cell that produces them, so you do not need to call plt.show() every time you want to display a graph. It is not valid Python outside a notebook: in a plain script, leave this line out and call plt.show() to display the figure.

 

Scatterplot

Scatter plots place data points along a horizontal and a vertical axis. They are used to find the relationship between two variables: the pattern of the points shows how one variable changes with the other and how strongly the two are correlated.

To draw a scatter plot we use the scatter() function of the matplotlib library, which plots one point for each pair of values in the given variables. We need two variables, which we will create with the np.arange() function. Given a start and a stop value, this function creates an array of evenly spaced values from start up to, but not including, stop. A basic scatter plot can be created as
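The extent of correlation that a scatter plot suggests can also be checked numerically. The sketch below is an addition to the article: it uses NumPy's corrcoef() function on the study-hours data from the labelled scatter plot example later in this section, with both groups combined into one sample. The variable names are assumptions.

```python
import numpy as np

# Hours studied and marks obtained, both groups combined
hours = [5.5, 7.3, 5, 7, 4.4, 8, 6.4, 9, 2, 1.5, 3, 1.2, 3.5, 2.7, 4, 1]
marks = [88, 91, 94, 96, 86, 98, 85, 99, 45, 34, 55, 37, 64, 40, 77, 39]

# corrcoef() returns a 2 x 2 correlation matrix;
# the off-diagonal entry is the Pearson correlation of the two variables
r = np.corrcoef(hours, marks)[0, 1]
print(r)
```

A value close to +1 indicates a strong positive linear relationship, which matches the upward trend of the points in the plot.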

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Two arrays of 49 consecutive integers each
x = np.arange(1, 50)
y = np.arange(151, 200)
plt.scatter(x, y)

Output:

[Figure: basic scatter plot of y against x]
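As a quick check of how np.arange() builds the arrays used above, the following sketch prints a small range and the lengths of x and y. The name small is an assumption introduced for the illustration.

```python
import numpy as np

# The stop value is excluded, so this yields 1, 2, 3, 4
small = np.arange(1, 5)
print(small)           # [1 2 3 4]

# The arrays from the scatter plot example each hold 49 values
x = np.arange(1, 50)     # 1 .. 49
y = np.arange(151, 200)  # 151 .. 199
print(len(x), len(y))  # 49 49
```

scatter() needs x and y to have the same length, which is why both ranges span 49 values.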

This is the most basic plot you can draw. To make it more informative and attractive, you can add a title, axis labels, a legend and point colors. These are set with functions and parameters like

 

import matplotlib.pyplot as plt
%matplotlib inline
x = [5.5,7.3,5,7,4.4,8,6.4,9]
y = [88,91,94,96,86,98,85,99]
x1 = [2,1.5,3,1.2,3.5,2.7,4,1]
y1 = [45,34,55,37,64,40,77,39]
plt.scatter(x,y, label='More Study',color='r')
plt.scatter(x1,y1,label='Less Study',color='b')
plt.xlabel('NO. of Hours of Study')
plt.ylabel('Marks Obtained')
plt.title('Scatter Plot')
plt.legend()

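Output:

[Figure: scatter plot titled 'Scatter Plot' with red 'More Study' points and blue 'Less Study' points; x-axis 'NO. of Hours of Study', y-axis 'Marks Obtained']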

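As you can see, the plot now looks more informative. You can also save this plot as an image by using savefig()

plt.savefig('scatter.png')

It will save the plot as an image in PNG format in your working directory. Call savefig() in the same notebook cell as the plotting code; otherwise an empty figure may be saved.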
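A slightly fuller sketch of saving a figure is shown below. It is an illustration under stated assumptions: the data is placeholder data, the file is written to the system temporary directory rather than the working directory, and the Agg backend is selected only so that the code runs without a display. The dpi and bbox_inches arguments are optional savefig() parameters.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumed here: non-interactive backend for writing files
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [4, 5, 6])  # placeholder data for the sketch
ax.set_title('Scatter Plot')

# dpi sets the resolution; bbox_inches='tight' trims surrounding whitespace
out_path = os.path.join(tempfile.gettempdir(), 'scatter.png')
fig.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close(fig)

print(os.path.exists(out_path))
```

In a notebook, the matplotlib.use('Agg') line can be dropped, and passing just a file name such as 'scatter.png' saves to the working directory as described above.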

 

Line Plot

A line plot displays information as a series of data points connected by straight line segments. It is a basic type of plot common in many fields. You can plot it as

import matplotlib.pyplot as plt
%matplotlib inline
x = [2.5,3,4.6,5,6.8,7.7]
y = [56,60,67,74,82,91]
plt.plot(x,y)
plt.xlabel('NO. of Hours of Study')
plt.ylabel('Marks Obtained')
plt.title('Line Plot')

Output:

[Figure: line plot titled 'Line Plot'; x-axis 'NO. of Hours of Study', y-axis 'Marks Obtained']

You can modify this plot by changing the line width, color and line style, using parameters like linewidth, color and linestyle

plt.plot(x,y,linewidth=5,color='magenta',linestyle='-.')

Output:

[Figure: the same line drawn as a thick magenta dash-dot line]

 

Bar Plot

A bar plot uses bars to compare data among different categories. The height of each bar represents the value for its category. You can draw a bar plot as

import matplotlib.pyplot as plt
%matplotlib inline
x = [2.5,3,4.6,5,6.8,7.7]
y = [56,60,67,74,82,91]
plt.bar(x,y,color='cyan')
plt.title('Bar Plot')

Output:

[Figure: bar plot titled 'Bar Plot' with cyan bars]
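Because bar plots compare categories, the x values can also be category names instead of numbers. The sketch below is an addition to the article: it reuses the subject marks from the pie chart example, and the names subjects, marks and bars are assumptions.

```python
import matplotlib.pyplot as plt

subjects = ['Physics', 'Chemistry', 'Maths', 'Biology', 'English']
marks = [65, 77, 72, 84, 69]

fig, ax = plt.subplots()

# One bar per subject; the bar height is the mark for that subject
bars = ax.bar(subjects, marks, color='cyan')
ax.set_xlabel('Subject')
ax.set_ylabel('Marks Obtained')
ax.set_title('Bar Plot')
```

With string labels, Matplotlib spaces the bars evenly and uses the labels as tick marks on the x-axis.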

 

Box Plot

The box plot shows the quartile values of a distribution. The box spans the first to the third quartile, with a line at the median, and the whiskers extend to the rest of the data, except for outliers, which are drawn as individual points. It is often used in exploratory data analysis. You can draw a box plot as

import numpy as np
import matplotlib.pyplot as plt
values = [np.random.normal(0, std, 50) for std in range(1, 4)]
plt.boxplot(values, patch_artist=True);

Output:

[Figure: three filled box plots, one for each sample]
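To make the quartiles and outliers concrete, the sketch below computes them directly with NumPy. It is an illustration and not part of the original example: the sample is made up, and the 1.5 x IQR rule shown is the default that boxplot() uses for its whiskers.

```python
import numpy as np

# Small made-up sample with one obviously extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# First quartile, median and third quartile
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points more than 1.5 * IQR beyond the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3)  # 3.25 5.5 7.75
print(outliers)        # [50]
```

Passing this data to plt.boxplot() would draw the box from 3.25 to 7.75 and show 50 as a separate outlier point.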

 

Histogram

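Histograms are used to show the distribution of data. A histogram creates a frequency distribution of a continuous variable by grouping the values into bins and counting how many fall into each bin. It can be created by using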

marks=[22,55,62,45,21,22,34,42,64,2,95,85,55,70,65,55,80,75,65,54,69,63,42,48,19,8,33,29]
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(marks,bins,rwidth=0.7)
plt.xlabel('Marks Range')
plt.ylabel('Number of Students')
plt.title('Histogram')

Output:

[Figure: histogram titled 'Histogram'; x-axis 'Marks Range', y-axis 'Number of Students']
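The bar heights in the histogram are simply counts per bin. As a rough check, the sketch below computes the same counts with np.histogram() on the marks and bins from the example above. The names counts and edges are assumptions.

```python
import numpy as np

marks = [22, 55, 62, 45, 21, 22, 34, 42, 64, 2, 95, 85, 55, 70,
         65, 55, 80, 75, 65, 54, 69, 63, 42, 48, 19, 8, 33, 29]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Each bin includes its left edge and excludes its right edge,
# except the last bin, which includes both
counts, edges = np.histogram(marks, bins=bins)

print(counts.tolist())  # [2, 1, 4, 2, 4, 4, 6, 2, 2, 1]
```

The tallest bar is the 60 to 70 range with six students, and the counts add up to the 28 marks in the list.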

 

Pie Chart

A pie chart is a circular chart that shows how a total is divided among categories. It is used to show the percentage or proportion of the whole that each category represents. It can be plotted as

labels = 'Physics', 'Chemistry', 'Maths', 'Biology', 'English'
marks = [65, 77, 72, 84, 69]
colors = ['c','m','r','b','g']
plt.pie(marks, labels=labels, colors=colors, explode=(0.1,0,0,0,0), autopct='%1.1f%%');

Output:

[Figure: pie chart of marks by subject with percentage labels and the Physics slice pulled out]
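The percentages printed on the slices come from dividing each value by the total. The sketch below reproduces that arithmetic in plain Python for the marks used above, formatted to one decimal place as autopct='%1.1f%%' does. The names total and percentages are assumptions.

```python
labels = ['Physics', 'Chemistry', 'Maths', 'Biology', 'English']
marks = [65, 77, 72, 84, 69]

total = sum(marks)  # 367

# Share of the whole represented by each subject, in percent
percentages = [100 * mark / total for mark in marks]

for label, pct in zip(labels, percentages):
    print(f'{label}: {pct:.1f}%')
# Physics: 17.7%
# Chemistry: 21.0%
# Maths: 19.6%
# Biology: 22.9%
# English: 18.8%
```

The values passed to pie() do not need to add up to 100, because the shares are computed relative to their sum.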

 

Subplots

Subplots are multiple plots drawn within a single figure. The function we use to create a subplot is subplot(), which requires three basic parameters: the number of rows, the number of columns, and the position of the plot in the grid, counted from 1. In this example we use two rows and two columns.

import matplotlib.pyplot as plt
%matplotlib inline
x = [2.5,3,4.6,5,6.8,7.7]
y = [56,60,67,74,82,91]
plt.subplot(2,2,1)
plt.plot(x,y,'r')
plt.subplot(2,2,2)
plt.plot(x,y,'g')
plt.subplot(2,2,3)
plt.plot(x,y,'b')
plt.subplot(2,2,4)
plt.plot(x,y,'y')

Output:

[Figure: 2 x 2 grid of line plots in red, green, blue and yellow]
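The same grid can also be built in one call with plt.subplots(), which returns the figure together with an array of axes. This is an alternative sketch and not the approach used in the example above; the names fig, axes and colors are assumptions.

```python
import matplotlib.pyplot as plt

x = [2.5, 3, 4.6, 5, 6.8, 7.7]
y = [56, 60, 67, 74, 82, 91]
colors = ['r', 'g', 'b', 'y']

# Create a 2 x 2 grid; axes is a 2 x 2 NumPy array of Axes objects
fig, axes = plt.subplots(2, 2)

# axes.flat walks the grid row by row, matching positions 1 to 4
for ax, color in zip(axes.flat, colors):
    ax.plot(x, y, color)

# Adjust spacing so tick labels of neighbouring plots do not overlap
fig.tight_layout()
```

Looping over the axes avoids repeating the subplot() and plot() calls once for every position.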

So far we have discussed various data visualization techniques using matplotlib. These techniques should help in your data science projects. We hope this article has helped you understand data visualization using matplotlib in Python.
