Import Seaborn as SNS
On this page, we are going to learn about Seaborn.
Simply put, Seaborn is a Python library with functions for plotting data as statistical graphics. Its plotting functions take different arguments depending on whether they are univariate, bivariate or non-variate.
- Univariate – A univariate plotting function takes just one column as input and plots the information in it.
- Bivariate – A bivariate function, as the name says, takes “bi” (two) columns as input to plot information.
- Non-variate – Seaborn also provides functions that don’t need a column name as an argument and work on the whole dataset, such as sns.pairplot() and sns.PairGrid(). We’ll discuss these plotting functions, along with other methods, in more detail as we go.
“Note: Whenever you refer to columns by name, you must also pass the dataset through the data argument.
Syntax: sns.plotFunc(data=dataset), where plotFunc stands for any Seaborn plotting function”
Let’s also add the types of values to your vocabulary, since they will help you when you work with statistical plots.
There are 2 types of values:
Categorical values:
Categorical values are those values where a column has a fixed number of options.
For example: Week (7 days), Sex (male, female, other), Month (12 months), Smoker (yes or no), etc.
In other words, these values show that the column is limited to a finite set of options.
Numerical values:
Numerical values are those values where a column doesn’t have a fixed number of options. Instead, each value is a measured quantity that can fall anywhere within a range.
For example: Weight, Height, Bank balance, etc.
We call these values numerical because they are numbers whose possible values can’t be listed in advance.
Example: We don’t know exactly how much you weigh. Today you may be 150 lbs, tomorrow it could be 149 or 152 lbs. It is a variable value that can take any possible value, and it doesn’t have a fixed number of options like a week (which has 7).
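As a rough illustration of the two types of values, the sketch below marks the fixed-option columns of a small pandas DataFrame as categorical and then separates them from the numerical one. The DataFrame and its column names are made up for this example.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun", "Sat", "Sun"],
    "smoker": ["No", "Yes", "No", "No", "Yes", "Yes"],
    "weight_lbs": [150.0, 149.0, 152.0, 171.5, 160.2, 148.8],
})

# Columns with a fixed set of options can be stored as pandas categoricals
df["day"] = df["day"].astype("category")
df["smoker"] = df["smoker"].astype("category")

# Split the column names by type
categorical_cols = df.select_dtypes(include="category").columns.tolist()
numerical_cols = df.select_dtypes(include="number").columns.tolist()

print("Categorical:", categorical_cols)
print("Numerical:", numerical_cols)
print("Options for day:", df["day"].cat.categories.tolist())
```

The categorical columns expose their limited set of options through .cat.categories, while the numerical column has no such list.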
Now, in Seaborn, we have 5 types of plots which we are going to discuss today:
- Distribution plots
- Categorical plots
- Matrix plots
- Grids
- Regression plots
Here is a brief introduction to each type of plot mentioned above:
Distribution Plots: Distribution plots mainly show how numerical values are distributed.
Types:
- distplot() (deprecated since Seaborn 0.11; newer code uses displot() or histplot())
- jointplot()
- pairplot()
- rugplot()
- kdeplot()
Categorical Plots: These plots are mainly used when we need to analyze numerical values across categorical values. This means one axis of the plot holds a categorical value and the other holds a numerical value.
Types:
- barplot()
- boxplot()
- countplot()
- factorplot() (renamed to catplot() in Seaborn 0.9)
- stripplot()
- swarmplot()
- violinplot()
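The idea of one categorical axis and one numerical axis can be sketched with boxplot() and countplot() from the list above. The tips DataFrame and its column names are made-up toy data for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for this example
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sat", "Thur"],
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
})

fig, (ax_box, ax_count) = plt.subplots(1, 2, figsize=(8, 3))

# Categorical column on x, numerical column on y
sns.boxplot(data=tips, x="day", y="total_bill", ax=ax_box)

# countplot() needs only the categorical column; it counts the rows per option
sns.countplot(data=tips, x="day", ax=ax_count)

fig.savefig("categorical.png")
```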
Matrix Plots: These plots show the relation of every column that holds meaningful values with itself and with the other columns.
For a matrix plot, you use the dataset_name.pivot_table() function to arrange the data in matrix form using the arguments “values”, “index” and “columns”.
Syntax:
dataset_name.pivot_table(values='numerical column the data is based on', index='column that you want to show on the left side', columns='column that you want to show along the bottom')
Types:
- heatmap()
- clustermap()
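The pivot_table() step and a heatmap() call can be sketched together as follows. The flights DataFrame and its column names are made-up toy data; annot and fmt are optional heatmap() arguments used here only to print the cell values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data in "long" form, invented for this example
flights = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Arrange the data in matrix form: months on the left side, years along the bottom
matrix = flights.pivot_table(values="passengers", index="month", columns="year")
print(matrix)

# Draw the matrix as a grid of colored cells, with each value written in its cell
fig, ax = plt.subplots()
sns.heatmap(matrix, annot=True, fmt=".0f", ax=ax)
fig.savefig("heatmap.png")
```

Note that pivot_table() averages duplicate entries by default, so each cell here is simply the single passenger count for that month and year.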
Grids: Grids are general types of plots that allow you to map plot types to the rows and columns of a grid. This helps you create similar plots separated by features.
Types:
- PairGrid()
- pairplot()
- FacetGrid()
- JointGrid()
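As a minimal sketch of a grid, the example below uses FacetGrid() to draw the same scatter plot once per value of a categorical column. The tips DataFrame and its column names are made-up toy data, and scatterplot() is just one plot type that could be mapped onto the grid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for this example
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "smoker": ["No", "No", "Yes", "No", "Yes", "Yes"],
})

# One grid column per value of "smoker"
g = sns.FacetGrid(tips, col="smoker")

# Draw the same kind of plot in every cell, using that cell's subset of rows
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
g.set_axis_labels("total_bill", "tip")

g.savefig("grid.png")
```

Here the feature that separates the plots is smoker, so the grid has one row and two columns.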
Regression plots: Regression plots fit a regression model to the data and draw the regression line on the plot. They show the overall trend in the relationship between the columns you selected.
Types:
- lmplot()
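A regression plot can be sketched with a single lmplot() call. The tips DataFrame and its column names are made-up toy data for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for this example
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
})

# Scatter the two columns and draw a fitted regression line through them
g = sns.lmplot(data=tips, x="total_bill", y="tip")

g.savefig("regression.png")
```

lmplot() returns a FacetGrid, so the result can be customized in the same way as the other grids.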