Seaborn

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Install Seaborn

Get Data Repository:

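Seaborn gives you access to a set of example datasets to practice on. They are hosted online in its Data Repository. Seaborn downloads and caches a dataset the first time you load it, so that first load needs an internet connection.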

Import All Libraries
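The original import cell is not reproduced here. A typical set of imports for this walkthrough might look like the sketch below. The exact list is an assumption, although sns, plt and pd are the conventional aliases.

```python
# Conventional aliases; the exact set the notebook imported is assumed.
import matplotlib.pyplot as plt  # lower-level control over figures (titles, saving)
import pandas as pd              # seaborn datasets come back as pandas DataFrames
import seaborn as sns            # the high-level statistical plotting API
```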

Get Dataset names:

Load Dataset

   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4

How to create your first graph

Load Dataset

We use the load_dataset() function to load a dataset from the data repository by name.

     total_bill   tip     sex  smoker   day    time  size
0         16.99  1.01  Female      No   Sun  Dinner     2
1         10.34  1.66    Male      No   Sun  Dinner     3
2         21.01  3.50    Male      No   Sun  Dinner     3
3         23.68  3.31    Male      No   Sun  Dinner     2
4         24.59  3.61  Female      No   Sun  Dinner     4
...         ...   ...     ...     ...   ...     ...   ...
239       29.03  5.92    Male      No   Sat  Dinner     3
240       27.18  2.00  Female     Yes   Sat  Dinner     2
241       22.67  2.00    Male     Yes   Sat  Dinner     2
242       17.82  1.75    Male      No   Sat  Dinner     2
243       18.78  3.00  Female      No  Thur  Dinner     2

244 rows × 7 columns

Creating a graph with seaborn is as easy as with matplotlib. Suppose we want to create a scatter plot of the relationship between total_bill and tip. All you need to do is use the scatterplot() function.

Note: In seaborn we use the scatterplot() function to create a scatter plot, instead of scatter() as we did in matplotlib.
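Here is a minimal sketch of that call. The ten-row DataFrame is an offline stand-in built from the tips rows shown in the preview above. It is assumed here so the example runs without downloading anything. With the real data you would use tips = sns.load_dataset("tips") instead.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
})

# One point per row: total bill on the x-axis, tip on the y-axis.
ax = sns.scatterplot(data=tips, x="total_bill", y="tip")
```

In a notebook the plot is displayed automatically. In a script, call matplotlib.pyplot.show() afterwards.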

hue:

hue (optional): takes the name of a column whose values are used for colour encoding.

Suppose we also want to know which customers were smokers, along with total_bill and tip. We can show this third piece of information with hue.
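Here is a sketch of the hue version. It uses a ten-row offline stand-in for the tips data, copied from the preview above and assumed only so the example runs without a download.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "smoker": ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# hue colours each point by its smoker value and adds a legend for the levels.
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker")
```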

You can also change the order of the hue levels with the hue_order parameter.
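For example, the sketch below lists smokers first. It runs on a ten-row offline stand-in for the tips data, copied from the preview above.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "smoker": ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# hue_order fixes the order of the levels (and so the legend order and colour assignment).
ax = sns.scatterplot(data=tips, x="total_bill", y="tip",
                     hue="smoker", hue_order=["Yes", "No"])
```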

Palette:

What if you want to change the colours used for smokers and non-smokers?

For this you can use the palette parameter.
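The sketch below passes palette as a dictionary that maps each hue level to a colour. The colours red and blue are arbitrary choices. The ten-row DataFrame is an offline stand-in for the tips data, copied from the preview above.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "smoker": ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# Explicit level -> colour mapping; any matplotlib colour names or hex codes work.
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker",
                     palette={"Yes": "red", "No": "blue"})
```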

smoker is a categorical column, but hue also accepts a numerical column. Let's say I want to colour-encode size.
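The sketch below uses hue="size" on a ten-row offline stand-in for the tips data, copied from the preview above. Because size is numeric, seaborn picks a sequential colour map rather than a set of distinct colours.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "size": [2, 3, 3, 2, 4, 3, 2, 2, 2, 2],
})

# A numeric hue column: each party size gets a shade from a sequential colour map.
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size")
```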

relplot:

relplot() can be used for many purposes, but the main reason to use it is subplotting. Suppose we want to split our graph into separate subplots for smokers and non-smokers.

We can use the row and col parameters to achieve this.

Let's see this with an example.
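Here is a sketch of the col version, using a ten-row offline stand-in for the tips data copied from the preview above. relplot() returns a FacetGrid whose axes attribute holds one subplot per smoker value.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "smoker": ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# col splits the data into side-by-side subplots, one per smoker value.
g = sns.relplot(data=tips, x="total_bill", y="tip", col="smoker")
```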

What if I use the row parameter instead?

The subplots are then stacked in rows instead of side by side.
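Here is the same sketch with row instead of col. It uses a ten-row offline stand-in for the tips data, copied from the preview above.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "smoker": ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# row stacks the subplots vertically, one row per smoker value.
g = sns.relplot(data=tips, x="total_bill", y="tip", row="smoker")
```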

We can do the same with size by setting col to "size".
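The sketch below uses col="size". The ten-row offline stand-in, copied from the tips preview above, only contains party sizes 2, 3 and 4, so it produces three panels. The full dataset has more party sizes and therefore more panels.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "size": [2, 3, 3, 2, 4, 3, 2, 2, 2, 2],
})

# One column of the grid per distinct party size, all in a single row.
g = sns.relplot(data=tips, x="total_bill", y="tip", col="size")
```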

As you can see, this puts all the graphs side by side in a single row. We can use col_wrap to limit the number of graphs per row, so the remaining graphs wrap onto new rows.
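The sketch below uses col_wrap=2 on a ten-row offline stand-in copied from the tips preview above. Its three size panels are laid out two per row, so the third wraps onto a second row. With col_wrap set, the grid's axes attribute is a flat array rather than a 2-D one.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "size": [2, 3, 3, 2, 4, 3, 2, 2, 2, 2],
})

# At most two panels per row; further panels continue on the next row.
g = sns.relplot(data=tips, x="total_bill", y="tip", col="size", col_wrap=2)
```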

Applying the row and col parameters in a single plot

What if I want to see smokers across the columns and time across the rows? We can use the col and row parameters in the same plot.
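The sketch below combines both parameters on a ten-row offline stand-in copied from the tips preview above. All ten stand-in rows are Dinner rows, so it produces a single row of two panels. The full dataset also contains Lunch and would give a 2 × 2 grid.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "smoker": ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
    "time": ["Dinner"] * 10,
})

# Grid rows come from time, grid columns from smoker.
g = sns.relplot(data=tips, x="total_bill", y="tip", row="time", col="smoker")
```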

Using a categorical column

What if one of my axes is a categorical column?
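The sketch below puts day on the x-axis, using a ten-row offline stand-in for the tips data copied from the preview above. relplot() still draws a scatter plot, with the days as discrete positions along the x-axis.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "day": ["Sun"] * 5 + ["Sat"] * 4 + ["Thur"],
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
})

# A categorical x variable: each day becomes one position on the x-axis.
g = sns.relplot(data=tips, x="day", y="total_bill")
```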

size

relplot() also offers a size parameter, which maps a column to the size of the markers. Let's see this with an example.
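The sketch below uses size="size" on a ten-row offline stand-in for the tips data, copied from the preview above. Larger parties get larger markers.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "size": [2, 3, 3, 2, 4, 3, 2, 2, 2, 2],
})

# Marker area scales with the party size column.
g = sns.relplot(data=tips, x="total_bill", y="tip", size="size")
```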

A Useful Example

Seaborn offers another dataset named flights. Let's see this data first.

     year  month  passengers
0    1949    Jan         112
1    1949    Feb         118
2    1949    Mar         132
3    1949    Apr         129
4    1949    May         121
...   ...    ...         ...
139  1960    Aug         606
140  1960    Sep         508
141  1960    Oct         461
142  1960    Nov         390
143  1960    Dec         432

144 rows × 3 columns

Suppose we want to know how many passengers were on board in a particular month and year.
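The sketch below shows one way to do this with relplot(), and is not necessarily what the original cell did. It puts year on the x-axis and colour-encodes month, so each point's height is the passenger count for that month and year. The DataFrame is an offline stand-in built from the ten flights rows shown above.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("flights"): the ten rows shown above.
flights = pd.DataFrame({
    "year": [1949] * 5 + [1960] * 5,
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Aug", "Sep", "Oct", "Nov", "Dec"],
    "passengers": [112, 118, 132, 129, 121, 606, 508, 461, 390, 432],
})

# Year on the x-axis, passenger count on the y-axis, one colour per month.
g = sns.relplot(data=flights, x="year", y="passengers", hue="month")
```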

Categorical column and relplot

When one variable is categorical and the other is numerical, we generally don't use relplot(). Let me show you why.

This is clearly not a readable graph. So whenever we deal with a categorical column, we use catplot() (categorical plot) instead.

Catplot

A categorical plot is usually the best option when one variable is categorical and the other is numerical.

A Useful Example

Suppose we want to see total_bill day by day. We can use catplot(), since day is a categorical column while total_bill is numerical.
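Here is a sketch of that catplot() call on a ten-row offline stand-in for the tips data, copied from the preview above. Without a kind argument, catplot() draws a strip plot: one dot per bill, grouped by day.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for sns.load_dataset("tips"): ten rows copied from the preview above.
tips = pd.DataFrame({
    "day": ["Sun"] * 5 + ["Sat"] * 4 + ["Thur"],
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
})

# Default kind="strip": every bill is a dot in its day's column.
g = sns.catplot(data=tips, x="day", y="total_bill")
```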

You can change the graph to a bar plot by setting kind='bar'.

You can change the graph to a point plot by setting kind='point'.

You can change the graph to a violin plot by setting kind='violin'.

From this violin chart it is easy to see how the bills were distributed on each day. For example, on Thursday most total bills were between 10 and 20 dollars.

You can change the graph to a boxen plot by setting kind='boxen'.

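A boxen chart serves a similar purpose to a violin chart. Instead of a smooth density curve, it draws a series of nested boxes for successive quantiles of the data. The boxes covering the middle of the data, where most of it lies, get the darkest colour.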

The horizontal line inside the boxes marks the median of the data.

The dots you see in the boxen chart are outliers.

You can change the graph to a box plot by setting kind='box'.

Countplot

The seaborn.countplot() function shows the counts of observations in each categorical bin.

A Useful Example

Suppose we want to count the Pokémon in our Pokémon dataset by their Type 1 value. We can use countplot() here.
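The Pokémon dataset is not one of seaborn's example datasets and is not shown here. This sketch therefore swaps in a ten-row offline stand-in for the tips data (copied from the preview above) and counts rows by day instead. With the Pokémon data in a DataFrame, the call would be the same with x set to its Type 1 column.

```python
import pandas as pd
import seaborn as sns

# Offline stand-in for the tips data: the day column of the ten rows shown earlier.
tips = pd.DataFrame({"day": ["Sun"] * 5 + ["Sat"] * 4 + ["Thur"]})

# One bar per category, with height equal to the number of rows in that category.
ax = sns.countplot(data=tips, x="day")
```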

Introduction to pairplot:

Plot pairwise relationships in a dataset.

By default, this function creates a grid of Axes such that each numeric variable in data is shared across the y-axes of a single row and the x-axes of a single column.

You can also add the hue parameter to colour the points by the sex of the customers.

A Useful Example

     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
...           ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

150 rows × 5 columns

Check the range of sepal length for each species:

As we discussed earlier in the note, when we are dealing with ranges a histogram is usually the better option. So let us create a histogram of sepal length for each species.
