Sales Analysis Project
Project:
In this project, we are going to analyze 12 months of data of sales. We will learn how to clean, manage and analyze the dataset to find some meaningful information.
We start by cleaning our data. Tasks during this section include:
Once we have cleaned up our data a bit, we move the data exploration section. In this section we explore 5 high level business questions related to our data:
Download Dataset:
Once you have downloaded the dataset , extract and save it somewhere.
Import Necessary Libraries
Assign Path Variable to Path of Data
Merging 12 months of sales data in a single file:
To Calculate and analyze Company Sales Data, We must merge all 12 CSV files into one.
Let us go through one-month data first , to know things better.
176558
USB-C Charging Cable
2
11.95
04/19/19 08:46
917 1st St, Dallas, TX 75001
176559
Bose SoundSport Headphones
1
99.99
04/07/19 22:30
682 Chestnut St, Boston, MA 02215
176560
Google Phone
1
600
04/12/19 14:38
669 Spruce St, Los Angeles, CA 90001
This is how our one month data looks like. Can we get all files from the directory ?
List all files from the directory
To get all files from a specific directory , we can use listdir() method.
Output :
Now we need to concat all these file in a single file, for that we need to create an empty DataFrame and concat all sales file.
Concatenate all files :
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
176558
USB-C Charging Cable
2
11.95
04/19/19 08:46
917 1st St, Dallas, TX 75001
176559
Bose SoundSport Headphones
1
99.99
04/07/19 22:30
682 Chestnut St, Boston, MA 02215
176560
Google Phone
1
600
04/12/19 14:38
669 Spruce St, Los Angeles, CA 90001
176560
Wired Headphones
1
11.99
04/12/19 14:38
669 Spruce St, Los Angeles, CA 90001
176561
Wired Headphones
1
11.99
04/30/19 09:27
333 8th St, Los Angeles, CA 90001
...
...
...
...
...
...
259353
AAA Batteries (4-pack)
3
2.99
09/17/19 20:56
840 Highland St, Los Angeles, CA 90001
259354
iPhone
1
700
09/01/19 16:00
216 Dogwood St, San Francisco, CA 94016
259355
iPhone
1
700
09/23/19 07:39
220 12th St, San Francisco, CA 94016
259356
34in Ultrawide Monitor
1
379.99
09/19/19 17:30
511 Forest St, San Francisco, CA 94016
259357
USB-C Charging Cable
1
11.95
09/30/19 00:18
250 Meadow St, San Francisco, CA 94016
186305 rows × 6 columns
Now that we have a DataFrame with all sales Files , let us convert it into a CSV File.
Convert To CSV
Let us read out Data From Updated DataFrame.
Read From Updated DataFrame
176558
USB-C Charging Cable
2
11.95
04/19/19 08:46
917 1st St, Dallas, TX 75001
176559
Bose SoundSport Headphones
1
99.99
04-07-2019 22:30
682 Chestnut St, Boston, MA 02215
176560
Google Phone
1
600
04-12-2019 14:38
669 Spruce St, Los Angeles, CA 90001
176560
Wired Headphones
1
11.99
04-12-2019 14:38
669 Spruce St, Los Angeles, CA 90001
176561
Wired Headphones
1
11.99
04/30/19 09:27
333 8th St, Los Angeles, CA 90001
Augment Data With Additional Columns
Clean up the data :
The first step in this is figuring out what we need to clean. I have found in practice, that you find things you need to clean as you perform operations and get errors. Based on the error, you decide how you should go about cleaning the data.
Display All rows of NAN:
Convert Order Date in Datetime
To convert Order Date in Datetime, first thing that needs to be done is to bring all date in a single format.
Output :
Reason Behind Error :
---> There is some unknown string named as 'Order Date' in our Dataset.
Let us see where this 'Order Date'
is by using conditional formatting.
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
517
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
1146
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
1152
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
2869
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
2884
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
...
...
...
...
...
...
...
2234519
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
2234906
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
2235918
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
2235987
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
2236093
Order ID
Product
Quantity Ordered
Price Each
Order Date
Purchase Address
4260 rows × 6 columns
Let us remove all these 4260 rows from our Dataframe.
Now let us try to convert Order Date again.
Add Month Column:
To add month column , all we need to do is to use date time methods.
Output:
What was the best month for sales? How much was earned that month?
In Our DataFrame , There are Two columns Price Each and Quantity Ordered. But we do not have Total Sale Value (TSV), We should also add a calculated column Total Sale Value. Let's do it first.
Now Price Each , Quantity Ordered is object data type , so we need to convert it.
Convert Price Each and Quantity Ordered in Float and Calculate TSV.
Now we can easily use groupby method to find maximum sales value.
Best Month For Sales : December
Total Sale Value In December : 9226886.68
Plot the Data:
December is a Festival Month , the biggest thing is December is Christmas, So people spend money and maybe that is the reason for december is best month for sales.
What city sold the most product?
There is no City Column in our Dataset, But you can see Purchase Address ,where City is mentioned in middle, we can use Purchase address and create a City Column.
Now we can groupby City and Calculate Total Quantity Ordered.
Output :
City Sold Most Product : San Francisco
Plot the Data:
What time should we display advertisements to maximize the likelihood of customer’s buying product?
Because we need to work on time, let's create Hour and Minute Column.
Let us see what time most people bought from this Company .
Plot the Data :
What Products are most often sold together ?
This is going to be a tricky business. Before diving into the problem, let us understand the problem.
What Products are most often sold together gives a very important insight that helps company lure customers into spending more and generate more profit.
You must have seen Amazon's Customer Who Bought this item also bought this Item. We are going to do something like this.
Let's try to solve this problem Step by step :
Duplicate Order ID :
Same Order ID for different products suggests customer bought different products Together in Our Data. Let us look for Duplicate OrderId Products:
Group Different Products With Same Order ID :
Let's create a new column that contains different Products with the same Order ID. We can do this with the help of transform method.
Output :
What product sold the most? Why do you think it sold the most?
This is pretty simple , we will group all product and calculate quantity ordered.
Last updated