Data Visualization Using Python and Plotly
Data Visualization
- Data Visualization allows us to quickly interpret the data and adjust different variables to see their effect
- Technology is increasingly making it easier for us to do so.
Why Visualize Data?
- To observe patterns
- To identify extreme values that could be anomalies
- To make the data easy to interpret
Popular plotting libraries in Python
Python offers multiple graphing libraries that offer diverse features.
Library | Description |
Matplotlib | Creates 2D graphs and plots |
Pandas visualization | Easy-to-use interface, built on Matplotlib |
Seaborn | Provides a high-level interface for drawing attractive and informative statistical graphics |
Ggplot | Based on R’s ggplot2, uses the Grammar of Graphics |
Plotly | Can create interactive plots |
Here we are going to focus on just one of these libraries: Plotly.
Plotly
Plotly provides online graphing, analytics, and statistics tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST. It is used to create interactive plots.
The dataset we are going to use for this topic is a cars dataset which you can download from Kaggle.
The subtopics we are going to cover here are as follows:
- Bar charts
- Scatter plots
- Histograms
- Box and whiskers plot
- Line plots
- Pie charts
- Donut charts
- Heat Map
- Contour plots
- Quiver plots
So, let’s get started.
1) Bar Plot
What is a bar plot?
A bar plot presents categorical data as rectangular bars whose lengths are proportional to the values they represent.
When to use a bar plot?
To represent the frequency distribution of categorical variables.
A bar diagram makes it easy to compare sets of data between different groups
Let’s see an example of how to plot a bar diagram
First import the necessary libraries
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
Then read the CSV file into the Jupyter Notebook. We are going to plot a bar chart of the price column against the num_of_cylinders column.
Syntax
# data is the DataFrame read from the cars CSV file
fig = go.Figure([go.Bar(x=data.num_of_cylinders, y=data.price)])
fig.show()
Output:
When you hover the cursor over any point on the graph, you will be able to see the values at that point.
2) Scatter plot
What is a scatter plot?
A scatter plot is a set of points that represents the values obtained for two different variables, plotted on horizontal and vertical axes.
When to use scatter plots?
Scatter plots are used to convey the relationship between two numerical variables
Scatter plots are sometimes called correlation plots because they show how two variables are correlated
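To put a number on what the scatter plot shows, one common measure is the Pearson correlation coefficient. The sketch below computes it in plain Python. The mileage values are made up for illustration and are not taken from the cars dataset, and the helper name pearson is hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up mileage values, for illustration only.
city_mpg = [19, 21, 24, 27, 31]
highway_mpg = [25, 27, 30, 34, 38]

r = pearson(city_mpg, highway_mpg)
print(round(r, 3))
```

A value close to 1 corresponds to points that rise together along a line, close to -1 to points that fall along a line, and close to 0 to no linear pattern.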
Let’s see an example
First import the necessary libraries
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
Then read the CSV file into the Jupyter Notebook. We are going to plot a scatter plot of the city_mpg column against the highway_mpg column.
Syntax
# data is the DataFrame read from the cars CSV file
fig = go.Figure([
    go.Scatter(
        x=data.city_mpg,
        y=data.highway_mpg,
        mode='markers',
        marker=dict(size=10, color='red', symbol='square'),
    )
])
fig.show()
The X axis shows the city_mpg variable and the Y axis shows the highway_mpg variable. Setting mode to 'markers' draws individual points instead of a continuous line. You can customize the markers through the dict passed to marker, which exposes properties such as size, color, and symbol.
Output:
To make a line graph you just have to change the mode parameter from markers to lines.
3) Histogram
What is a histogram?
It is a graphical representation of data using bars of different heights
Histogram groups numbers into ranges and the height of each bar depicts the frequency of each range or bin
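The binning idea can be sketched in plain Python, without Plotly. The values and the bin width below are made up for illustration.

```python
from collections import Counter

# Made-up mileage values, for illustration only.
city_mpg = [19, 21, 24, 24, 25, 27, 30, 31, 38]
bin_width = 5  # hypothetical bin size

# Map each value to the lower edge of its bin, then count per bin.
counts = Counter((value // bin_width) * bin_width for value in city_mpg)

# Print a text histogram: one row per bin, one '#' per value in it.
for lower in sorted(counts):
    print(f"{lower}-{lower + bin_width - 1}: {'#' * counts[lower]}")
```

Each printed row plays the role of one bar: its length is the number of values that fell into that range.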
When to use histograms?
To represent the frequency distribution of numerical variables.
Let’s see an example
First import the necessary libraries
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
Then read the CSV file into the Jupyter Notebook. We are going to plot histograms of the city_mpg and highway_mpg columns in the same figure.
Syntax
# cars_data is the DataFrame read from the cars CSV file
trace0 = go.Histogram(x=cars_data.city_mpg, name='City_mpg', opacity=0.5)
trace1 = go.Histogram(x=cars_data.highway_mpg, name='Highway_mpg', opacity=0.5)
data = [trace0, trace1]
layout = go.Layout(title='city vs highway mpg')
fig = go.Figure(data=data, layout=layout)
pyo.plot(fig)
Output:
4) Box and Whiskers plot
A box and whisker plot is a diagram that shows the statistical distribution of a set of data. This makes it easy to see how data is distributed along a number line.
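The distribution a box plot summarizes is usually described by five numbers: minimum, first quartile, median, third quartile, and maximum. The sketch below computes them with Python's statistics module on made-up values. Plotly's own quartile method and whisker rule may differ in detail, so treat this as an illustration of the idea only.

```python
from statistics import quantiles

# Made-up mileage values, for illustration only.
city_mpg = sorted([19, 21, 24, 24, 25, 27, 30, 31, 38])

# n=4 returns the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(city_mpg, n=4)

summary = {
    "min": city_mpg[0],
    "q1": q1,
    "median": q2,
    "q3": q3,
    "max": city_mpg[-1],
}
print(summary)
```

The box typically spans q1 to q3 with a line at the median, and the whiskers reach out toward the smallest and largest values.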
Let’s see an example
First import the necessary libraries
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
Syntax
# cars_data is the DataFrame read from the cars CSV file
trace2 = go.Box(y=cars_data.price, name='Price')
trace3 = go.Box(y=cars_data.city_mpg, name='City_mpg')
data = [trace2, trace3]
layout = go.Layout(title='Box and Whisker plot')
fig = go.Figure(data=data, layout=layout)
pyo.plot(fig)
Output:
5) Pie chart
Pie charts are generally used to show percentage or proportional data and usually the percentage represented by each category is provided next to the corresponding slice of pie. Pie charts are good for displaying data for around 6 categories or fewer.
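The percentage for each slice is simply that category's share of the total. The sketch below works this out by hand for the same labels and values used in the Plotly example in this section.

```python
labels = ["price", "city_mpg", "highway_mpg", "horsepower"]
values = [1200, 345, 666, 1000]

total = sum(values)

# Each slice's share of the whole, as a percentage.
shares = {label: 100 * value / total for label, value in zip(labels, values)}

for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```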
Let’s see an example
First import the necessary libraries
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
Syntax
labels = ['price', 'city_mpg', 'highway_mpg', 'horsepower']
values = [1200, 345, 666, 1000]
fig = go.Figure(data=[go.Pie(labels=labels, values=values)])
fig.show()
Output:
6) Donut Chart
A donut chart is a pie chart with a hole in the center. To create one, add one more parameter, hole, which sets the fraction of the radius to cut out of the pie.
Let’s see an example
Syntax
labels = ['price', 'city_mpg', 'highway_mpg', 'horsepower']
values = [1200, 345, 666, 1000]
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.show()
Output:
7) Heat map
Heat Maps are graphical representations of data that utilize color-coded systems. The primary purpose of Heat Maps is to better visualize the volume of locations within a dataset and assist in directing viewers towards areas on data visualizations that matter most.
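A heat map is built from a grid of values in which each row belongs to one Y label and each column to one X label. The sketch below uses the same grid as the Plotly example in this section and looks up the cell with the largest value, which is the cell the color scale would highlight most strongly.

```python
x = ["Jan", "Feb", "March", "April", "May"]  # column labels
y = ["Summer", "Winter", "Monsoon"]          # row labels
z = [
    [10, 2, 30, 5, 1],
    [20, 10, 6, 80, 30],
    [35, 60, 1, -10, 45],
]

# z[i][j] is the value for row label y[i] and column label x[j].
value, row, col = max(
    (z[i][j], i, j) for i in range(len(y)) for j in range(len(x))
)
hottest = (y[row], x[col], value)
print(hottest)  # ('Winter', 'April', 80)
```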
Let’s see an example
Syntax
fig = go.Figure(data=go.Heatmap(
    z=[[10, 2, 30, 5, 1],
       [20, 10, 6, 80, 30],
       [35, 60, 1, -10, 45]],
    x=['Jan', 'Feb', 'March', 'April', 'May'],
    y=['Summer', 'Winter', 'Monsoon'],
))
fig.show()
Output:
8) Contour plot
A contour plot is a graphical technique for representing a 3-dimensional surface by plotting constant z slices, called contours, on a 2-dimensional format. That is, given a value for z, lines are drawn for connecting the (x, y) coordinates where that z value occurs.
Let’s see an example
Syntax
fig = go.Figure(data=go.Contour(
    z=[[10, 10.625, 12.5, 15.65, 30],
       [5.625, 6.25, 8.125, 10.25, 15.625],
       [2.5, 3.125, 5., 18.125, 1.5],
       [3.625, 1.25, 3.125, 0.25, 10.625],
       [1, 2.625, 2.545, 7.625, 10]],
    colorscale='Electric',
))
fig.show()
Output:
9) Quiver plot
A quiver plot displays velocity vectors as arrows with components (u, v) at the points (x, y). Each arrow is drawn at the coordinates given by the corresponding pair of elements in x and y. In Plotly, ff.create_quiver(x, y, u, v) builds such a figure.
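The geometry of the arrows can be sketched in plain Python. The example below uses the same coordinates as the Plotly example in this section and computes where each arrow starts and ends. The scale factor is a hypothetical value that shortens the arrows for display, and the exact arrow lengths Plotly draws may differ.

```python
x = [12, 23, 45, 62.66]
y = [91.3, 45, 66, 9]
u = [12, 3, 45, 6.678]
v = [12, 33, 44, 54]
scale = 0.1  # hypothetical factor that shortens the arrows for display

# Each arrow starts at (x, y) and points along the direction (u, v).
arrows = [
    ((xi, yi), (xi + scale * ui, yi + scale * vi))
    for xi, yi, ui, vi in zip(x, y, u, v)
]

for start, end in arrows:
    print(start, "->", end)
```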
Syntax
import plotly.figure_factory as ff

x = [12, 23, 45, 62.66]
y = [91.3, 45, 66, 9]
u = [12, 3, 45, 6.678]
v = [12, 33, 44, 54]
fig = ff.create_quiver(x, y, u, v)
fig.show()
Output:
In this way, we have seen how to create interactive graphs using Plotly.