Seaborn

Seaborn is a Python data visualization library built on top of Matplotlib. It provides a high-level interface for drawing informative and attractive statistical graphics.

Importing libraries and dataset.

Let’s start by importing Pandas, which is a great library for managing relational (i.e. table-format) datasets:

# Pandas for managing datasets
import pandas as pd

Next, we’ll import Matplotlib, which will help us customize our plots further.

Tip:

In Jupyter Notebook, you can also include %matplotlib inline to display your plots inside your notebook.

# Matplotlib for additional customization
from matplotlib import pyplot as plt
%matplotlib inline

Then, we’ll import the Seaborn library, which is the star of today’s show.

# Seaborn for plotting and styling
import seaborn as sns

Now we’re ready to import our dataset.

Tip:

We gave each of our imported libraries an alias. Later, we can invoke Pandas with pd, Matplotlib with plt, and Seaborn with sns.

Today, we’ll be using a Pokémon dataset, which you can download here:

https://www.kaggle.com/abcsds/pokemon

Once you’ve downloaded the CSV file, you can import it with Pandas.

Tip:

The argument index_col=0 simply means we’ll treat the first column of the dataset as the ID column.

Import dataset

# Read dataset
df = pd.read_csv('Pokemon.csv', index_col=0)
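To see what index_col=0 does without downloading anything, here is a minimal sketch that reads a three-row CSV from an in-memory string. The rows, and the names csv_text, default_df and indexed_df, are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the real Pokemon.csv has more rows and columns.
csv_text = """#,Name,Type 1,Attack,Defense
1,Bulbasaur,Grass,49,49
4,Charmander,Fire,52,43
7,Squirtle,Water,48,65
"""

# Without index_col, pandas adds its own 0..n-1 index and keeps "#" as a column.
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column ("#") becomes the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_df.columns.tolist())
print(indexed_df.columns.tolist())
print(indexed_df.index.tolist())
```

Keeping the ID in the index means it will not show up later as an extra numeric column in plots that use every column of the DataFrame.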

Here’s what the dataset looks like:

# Display first 5 observations
df.head()


As you can see, we have combat stats data for the original 151 (a.k.a. best 151) Pokémon.

 

Step 3: Seaborn’s plotting functions.

One of Seaborn’s greatest strengths is its diversity of plotting functions. For instance, making a scatter plot is just one line of code using the lmplot() function.

There are two ways you can do so.

1. The first way (recommended) is to pass your DataFrame to the data= argument, while passing column names to the axes arguments, x= and y=.

2. The second way is to pass Series of data directly to the axes arguments. Note that lmplot() itself requires data=, so this form applies to axes-level functions such as regplot().

For example, let’s compare the Attack and Defense stats for our Pokémon:

 

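# Recommended way
sns.lmplot(x='Attack', y='Defense', data=df)

# Alternative way: pass Series directly. lmplot() requires data=,
# so this form needs an axes-level function such as regplot().
# sns.regplot(x=df.Attack, y=df.Defense)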

 


 

By the way, the diagonal line is there because lmplot() is Seaborn’s function for fitting and plotting a regression line, not a dedicated scatter plot function.
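Seaborn 0.9 and later also include a dedicated scatterplot() function that draws the points without any regression line. A minimal sketch on made-up rows (sample_df is a hypothetical stand-in for the tutorial's df):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up sample rows that reuse the tutorial's column names.
sample_df = pd.DataFrame({
    "Attack": [49, 62, 82, 52, 64, 84],
    "Defense": [49, 63, 83, 43, 58, 78],
    "Stage": [1, 2, 3, 1, 2, 3],
})

# scatterplot() draws points only and returns a Matplotlib Axes.
ax = sns.scatterplot(x="Attack", y="Defense", hue="Stage", data=sample_df)
print(ax.get_xlabel(), ax.get_ylabel())
```

Unlike lmplot(), which builds a whole figure, scatterplot() draws onto a single Axes, which makes it easier to combine with other Matplotlib calls.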

Thankfully, each plotting function has several useful options that you can set. Here’s how we can tweak the lmplot():

1. First, we’ll set fit_reg=False to remove the regression line, since we only want a scatter plot.

2. Then, we’ll set hue='Stage' to color our points by the Pokémon’s evolution stage. This hue argument is very useful because it allows you to express a third dimension of information using color.

 

# Scatterplot arguments
sns.lmplot(x='Attack', y='Defense', data=df,
           fit_reg=False, # No regression line
           hue='Stage')   # Color by evolution stage


 

Looking better, but we can improve this scatter plot further. For example, all of our Pokémon have positive Attack and Defense values, yet our axes limits fall below zero. Let’s see how we can fix that…

 

Step 4: Customizing with Matplotlib

Remember, Seaborn is a high-level interface to Matplotlib. From our experience, Seaborn will get you most of the way there, but you’ll sometimes need to bring in Matplotlib.

Setting your axes limits is one of those times, but the process is pretty simple:

1. First, invoke your Seaborn plotting function as normal.

2. Then, invoke Matplotlib’s customization functions. In this case, we’ll use its ylim() and xlim() functions.

Here’s our new scatter plot with sensible axes limits:

 

# Plot using Seaborn
sns.lmplot(x='Attack', y='Defense', data=df,
           fit_reg=False,
           hue='Stage')

# Tweak using Matplotlib
plt.ylim(0, None)
plt.xlim(0, None)
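Because lmplot() returns a FacetGrid, the same limits can also be applied through the returned object instead of through pyplot's current axes. A minimal sketch on made-up rows (sample_df and g are illustrative names):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up sample rows that reuse the tutorial's column names.
sample_df = pd.DataFrame({
    "Attack": [49, 62, 82, 52, 64, 84],
    "Defense": [49, 63, 83, 43, 58, 78],
    "Stage": [1, 2, 3, 1, 2, 3],
})

g = sns.lmplot(x="Attack", y="Defense", data=sample_df,
               fit_reg=False, hue="Stage")

# None leaves the upper limit at whatever Matplotlib chose automatically.
g.set(xlim=(0, None), ylim=(0, None))

ax = g.axes.flat[0]
print(ax.get_xlim(), ax.get_ylim())
```

FacetGrid.set() applies the settings to every facet, which matters once a plot is split into several panels with col= or row=.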


For more information on Matplotlib’s customization functions, check out its documentation.

 

Step 5: The role of Pandas.

Even though this is a Seaborn tutorial, Pandas actually plays a very important role. You see, Seaborn’s plotting functions benefit from a base DataFrame that’s reasonably formatted.

 

For example, let’s say we wanted to make a box plot for our Pokémon’s combat stats:

 

# Boxplot
sns.boxplot(data=df)


Well, that’s a reasonable start, but there are some columns we’d probably like to remove:

1. We can remove the Total since we have individual stats.

2. We can remove the Stage and Legendary columns because they aren’t combat stats.

It turns out that this isn’t easy to do within Seaborn alone. Instead, it’s much simpler to pre-format your DataFrame.

Let’s create a new DataFrame called stats_df that only keeps the stats columns:

Pre-format DataFrame

 

# Pre-format DataFrame
stats_df = df.drop(['Total', 'Stage', 'Legendary'], axis=1)

# New boxplot using stats_df
sns.boxplot(data=stats_df)
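A small sketch of what drop() does, using two made-up rows. The columns= keyword used here is equivalent to passing the list with axis=1; sample_df and stats_sample are illustrative names.

```python
import pandas as pd

# Made-up rows with a subset of the tutorial's columns.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Total": [318, 309],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Stage": [1, 1],
    "Legendary": [False, False],
})

# drop() returns a new DataFrame; sample_df itself is left unchanged.
stats_sample = sample_df.drop(columns=["Total", "Stage", "Legendary"])

print(stats_sample.columns.tolist())
print(sample_df.columns.tolist())
```

Since the original DataFrame is untouched, columns such as Stage remain available for later plots that need them.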


 

It’s outside the scope of this tutorial to dive into Pandas, but here’s a handy cheat sheet.

 

Step 6: Seaborn themes.

Another advantage of Seaborn is that it comes with decent style themes right out of the box. The default theme is called ‘darkgrid’.
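Seaborn ships with five built-in style names. As a rough sketch, axes_style() can be used to inspect what each one would change without applying it; the variable names below are illustrative.

```python
import seaborn as sns

# The five built-in Seaborn style names.
styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# axes_style(name) returns the style's parameters as a dict
# without changing the global Matplotlib settings.
style_params = {name: sns.axes_style(name) for name in styles}

for name, params in style_params.items():
    print(name, params["axes.grid"], params["axes.facecolor"])
```

The two grid styles differ from the others mainly in that they switch the axes grid on.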

Next, we’ll change the theme to ‘whitegrid’ while making a violin plot.

1. Violin plots are useful alternatives to box plots.

2. They show the distribution (through the thickness of the violin) instead of only the summary statistics.

For example, we can visualize the distribution of Attack by Pokémon’s primary type:

Set theme, then plot violin plot

# Set theme
sns.set_style('whitegrid')

# Violin plot
sns.violinplot(x='Type 1', y='Attack', data=df)


As you can see, Dragon types tend to have higher Attack stats than Ghost types, but they also have greater variance.

Now, Pokémon fans might find something quite jarring about that plot: The colors are nonsensical. Why is the Grass type colored pink or the Water type colored orange? We must fix this!

 

Step 7: Color palettes.

Fortunately, Seaborn allows us to set custom color palettes. We can simply create an ordered Python list of color hex values.

Let’s use Bulbapedia to help us create a new color palette:

Pokemon color palette

 

pkmn_type_colors = ['#78C850',  # Grass
                    '#F08030',  # Fire
                    '#6890F0',  # Water
                    '#A8B820',  # Bug
                    '#A8A878',  # Normal
                    '#A040A0',  # Poison
                    '#F8D030',  # Electric
                    '#E0C068',  # Ground
                    '#EE99AC',  # Fairy
                    '#C03028',  # Fighting
                    '#F85888',  # Psychic
                    '#B8A038',  # Rock
                    '#705898',  # Ghost
                    '#98D8D8',  # Ice
                    '#7038F8',  # Dragon
                   ]

 

Wonderful. Now we can simply use the palette= argument to recolor our chart.

Custom color palette

 

# Violin plot with Pokemon color palette
sns.violinplot(x='Type 1', y='Attack', data=df,
               palette=pkmn_type_colors) # Set color palette


 

Much better!
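One caveat with an ordered list is that colors are matched to categories purely by position. A possible variation, sketched here on made-up rows, keys the colors by type name in a dict and assigns hue explicitly, which recent Seaborn releases recommend whenever palette= is given. The names type_colors and sample_df are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up rows: three types with three Attack values each.
sample_df = pd.DataFrame({
    "Type 1": ["Water"] * 3 + ["Grass"] * 3 + ["Fire"] * 3,
    "Attack": [48, 63, 83, 49, 62, 82, 52, 64, 84],
})

# Colors keyed by type name, so they no longer depend on category order.
type_colors = {"Grass": "#78C850", "Fire": "#F08030", "Water": "#6890F0"}

ax = sns.violinplot(x="Type 1", y="Attack", data=sample_df,
                    hue="Type 1",         # color by the same column as x
                    palette=type_colors,  # dict: hue level -> color
                    dodge=False)          # keep one violin per category
print(len(ax.collections))
```

With a dict, adding or reordering types cannot silently shift every color by one position.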

Violin plots are great for visualizing distributions. However, since we only have 151 Pokémon in our dataset, we may want to simply display each point.

That’s where the swarm plot comes in. This visualization will show each point, while “stacking” those with similar values:

Swarm plot

# Swarm plot with Pokemon color palette
sns.swarmplot(x='Type 1', y='Attack', data=df,
              palette=pkmn_type_colors)


 

That’s handy, but can’t we combine our swarm plot and the violin plot? After all, they display similar information, right?

 

Step 8: Overlaying plots.

The answer is yes.

It’s pretty straightforward to overlay plots using Seaborn, and it works the same way as with Matplotlib. Here’s what we’ll do:

1. First, we’ll make our figure larger using Matplotlib.

2. Then, we’ll plot the violin plot. However, we’ll set inner=None to remove the bars inside the violins.

3. Next, we’ll plot the swarm plot. This time, we’ll make the points black so they pop out more.

4. Finally, we’ll set a title using Matplotlib.

Overlaying swarm and violin plots

 

# Set figure size with matplotlib
plt.figure(figsize=(10,6))

# Create plot
sns.violinplot(x='Type 1',
               y='Attack',
               data=df,
               inner=None, # Remove the bars inside the violins
               palette=pkmn_type_colors)

sns.swarmplot(x='Type 1',
              y='Attack',
              data=df,
              color='k', # Make points black
              alpha=0.7) # and slightly transparent

# Set title with matplotlib
plt.title('Attack by Type')
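The overlay works because both calls draw onto the same Matplotlib Axes. A sketch that makes this explicit by creating the Axes first and passing it via ax=, again on made-up rows (sample_df is a hypothetical stand-in for df):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows: three types with three Attack values each.
sample_df = pd.DataFrame({
    "Type 1": ["Water"] * 3 + ["Grass"] * 3 + ["Fire"] * 3,
    "Attack": [48, 63, 83, 49, 62, 82, 52, 64, 84],
})

# Create the figure and a single Axes up front.
fig, ax = plt.subplots(figsize=(10, 6))

# Both plots receive the same Axes, so the second is drawn on top of the first.
sns.violinplot(x="Type 1", y="Attack", data=sample_df, inner=None, ax=ax)
sns.swarmplot(x="Type 1", y="Attack", data=sample_df,
              color="k", alpha=0.7, ax=ax)

ax.set_title("Attack by Type")
```

Passing ax= explicitly is optional here, but it avoids surprises when several figures are open at once.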


 

Awesome, now we have a pretty chart that tells us how Attack values are distributed across different Pokémon types. But what if we want to see all of the other stats as well?

 

Step 9: Putting it all together.

Well, we could certainly repeat that chart for each stat. But we can also combine the information into one chart… we just have to do some data wrangling with Pandas beforehand.

First, here’s a reminder of our data format:

First 5 rows of stats_df

stats_df.head()


 

As you can see, all of our stats are in separate columns. Instead, we want to “melt” them into one column.

To do so, we’ll use Pandas’s melt() function. It takes 3 arguments:

1. First, the DataFrame to melt.

2. Second, ID variables to keep (Pandas will melt all of the other ones).

3. Finally, a name for the new, melted variable.

Here’s the output:

Melt DataFrame

# Melt DataFrame
melted_df = pd.melt(stats_df,
                    id_vars=["Name", "Type 1", "Type 2"], # Variables to keep
                    var_name="Stat") # Name of melted variable
melted_df.head()


All 6 of the stat columns have been “melted” into one, and the new Stat column indicates the original stat (HP, Attack, Defense, Sp. Attack, Sp. Defense, or Speed). For example, it’s hard to see here, but Bulbasaur now has 6 rows of data.

In fact, if you print the shape of these two DataFrames…

Shape comparison

print( stats_df.shape )
print( melted_df.shape )
# (151, 9)
# (906, 5)

…you’ll find that melted_df has 6 times the number of rows as stats_df.
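The shape arithmetic is easier to follow on a tiny example. In the tutorial's case, 151 rows times 6 melted columns gives 906 rows, and the 3 ID columns plus Stat and value give 5 columns. The sketch below uses two made-up rows and three stat columns (wide_df and long_df are illustrative names):

```python
import pandas as pd

# Made-up wide-format data: one row per Pokemon, one column per stat.
wide_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

# Keep "Name" as an ID column and melt the three stat columns.
# The values column gets pandas' default name, "value".
long_df = pd.melt(wide_df, id_vars=["Name"], var_name="Stat")

print(wide_df.shape)   # 2 rows, 4 columns
print(long_df.shape)   # 2 * 3 = 6 rows; Name, Stat, value = 3 columns
print(long_df[long_df["Name"] == "Bulbasaur"])
```

Each Pokémon now appears once per stat, which is the long format that the hue= and x= arguments expect.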

Now we can make a swarm plot with melted_df.

– But this time, we’re going to set x='Stat' and y='value' so our swarms are separated by stat.

– Then, we’ll set hue='Type 1' to color our points by the Pokémon type.

Swarmplot with melted_df


# Swarmplot with melted_df
sns.swarmplot(x='Stat', y='value', data=melted_df,
              hue='Type 1')


Finally, let’s make a few final tweaks for a more readable chart:

1. Enlarge the plot.

2. Separate points by hue using the argument dodge=True (called split=True in older Seaborn releases).

3. Use our custom Pokemon color palette.

4. Adjust the y-axis limits to start at 0.

5. Place the legend to the right.

 

Customizations:

# 1. Enlarge the plot
plt.figure(figsize=(10,6))

sns.swarmplot(x='Stat',
              y='value',
              data=melted_df,
              hue='Type 1',
              dodge=True, # 2. Separate points by hue (split=True in older Seaborn)
              palette=pkmn_type_colors) # 3. Use Pokemon palette

# 4. Adjust the y-axis
plt.ylim(0, 260)

# 5. Place legend to the right
plt.legend(bbox_to_anchor=(1, 1), loc=2)


There we go!

 

Step 10: Pokédex (mini-gallery).

We’re going to conclude this tutorial with a few quick-fire data visualizations, just to give you a sense of what’s possible with Seaborn.

10.1 – Heatmap

Heatmaps help you visualize matrix-like data.

Heatmap

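# Calculate correlations between the numeric columns
# (numeric_only needs pandas 1.5 or later; it skips Name, Type 1 and Type 2)
corr = stats_df.corr(numeric_only=True)

# Heatmap
sns.heatmap(corr)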
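A slightly more readable variant, sketched on made-up rows: annot=True writes each coefficient into its cell, and vmin/vmax pin the color scale to the full correlation range of -1 to 1. The name sample_stats is illustrative, and corr(numeric_only=True) assumes pandas 1.5 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up rows: one text column and three numeric stat columns.
sample_stats = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "HP": [45, 60, 80, 39, 58, 78],
    "Attack": [49, 62, 82, 52, 64, 84],
    "Defense": [49, 63, 83, 43, 58, 78],
})

# Correlate only the numeric columns; "Name" is skipped.
corr = sample_stats.corr(numeric_only=True)

# Annotated heatmap with a diverging colormap centered on the -1..1 range.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
print(corr.round(2))
```

Fixing vmin and vmax keeps the colors comparable between heatmaps drawn from different subsets of the data.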


10.2 – Histogram

Histograms allow you to plot the distributions of numeric variables.

Histogram

# Distribution plot (a.k.a. histogram); histplot() replaces the deprecated distplot()
sns.histplot(df.Attack, kde=True)


10.3 – Bar Plot

Bar plots help you visualize the distributions of categorical variables.

Bar Plot

# Count Plot (a.k.a. Bar Plot)
sns.countplot(x='Type 1', data=df, palette=pkmn_type_colors)

# Rotate x-labels
plt.xticks(rotation=-45)
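A count plot is essentially a bar chart of value_counts(). The sketch below, on six made-up rows, computes the counts with Pandas and reuses their order for the bars; sample_df and type_counts are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up rows: 3 Water, 2 Grass, 1 Fire.
sample_df = pd.DataFrame({
    "Type 1": ["Water", "Grass", "Water", "Fire", "Water", "Grass"],
})

# Count occurrences per type, most frequent first.
type_counts = sample_df["Type 1"].value_counts()

# Draw the bars in that same most-frequent-first order.
ax = sns.countplot(x="Type 1", data=sample_df,
                   order=type_counts.index.tolist())

# Read the bar heights back from the plot for comparison.
bar_heights = sorted((int(p.get_height()) for p in ax.patches), reverse=True)
print(type_counts.to_dict())
print(bar_heights)
```

Passing order= is optional, but sorting the bars by frequency usually makes the chart easier to read.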


10.4 – Factor Plot

Factor plots make it easy to separate plots by categorical classes. Since Seaborn 0.9 the function is named catplot(); factorplot() is its older name.

Factor Plot

# Factor plot (catplot() in Seaborn 0.9 and later)
g = sns.catplot(x='Type 1',
                y='Attack',
                data=df,
                hue='Stage',  # Color by stage
                col='Stage',  # Separate by stage
                kind='swarm') # Swarmplot

# Rotate x-axis labels
g.set_xticklabels(rotation=-45)

# Doesn't work because it only rotates the last plot
# plt.xticks(rotation=-45)


 

10.5 – Density Plot

Density plots display the joint distribution of two variables.

Tip: 

Consider overlaying this with a scatter plot.

Density Plot

# Density plot (x= and y= keywords, Seaborn 0.11 and later)
sns.kdeplot(x=df.Attack, y=df.Defense)


 

10.6 – Joint Distribution Plot

Joint distribution plots combine information from scatter plots and histograms to give you detailed information for bi-variate distributions.

Joint Distribution Plot

# Joint Distribution Plot
sns.jointplot(x='Attack', y='Defense', data=df)


 

Congratulations… you’ve made it to the end of this Python Seaborn tutorial.