Interview Assessment

Download the above zip file and extract it Somewhere. After extracting, you must have found two excel files and a pdf. Refer PDF for your assessment.

Task1 :

Import the Excel File (Solar energy.xlsx) :

Solution
solar_data = pd.read_excel(fr'{path}\Solar_Energy.xlsx')
solar_data 

Clean the Data :

Let us clean empty rows and duplicate values if any.

Solution
solar_data.dropna(inplace=True)   #drop NA Values
solar_data.drop_duplicates(inplace=True)  #drop Duplicates
solar_data.reset_index(drop=True,inplace=True)  #Reset Index
solar_data

Find Total Volume of the Articles:

We can use shape attribute to find total volume of the articles.

Solution
solar_data.shape[0]

Output : 

270

Find the total number of unique authors in the articles:

Solution
len(solar_data['author_name'].unique())

Output : 

60

In how many articles is the word "solar" (case insensitive) mentioned?

Solution

Convert articles in lowercase format :

Convert all articles in lowercase so that "solar" keyword does not miss.

articles = solar_data['body'].str.lower()
articles

Conditional Formatting :

We can use str methods to find "solar" keyword.

len(articles.loc[articles.str.contains('solar')==True])

Output : 
218

Find the total number of articles each author has written:

We can groupby author names and count their body of the article.

Solution
solar_data.groupby('author_name')['body'].count().sort_values(ascending=False)

Output  :  

author_name
Jules Scully                                              44
Sean RaiRoche                                             39
Liam Stoker                                               38
Andy Colthorpe                                            18
Charlie Duffield                                          14
Kelsey Misbrener                                          11
John Engel                                                 7
Heather Clancy                                             7
Uma Gupta                                                  6
........

Add 2 columns and extract the month and year from date :

Solution
solar_data['month'] = solar_data['date'].dt.month_name() 
solar_data['year'] = solar_data['date'].dt.year

Add another column and extract Domain from the URL:

Solution
solar_data['domain'] = solar_data['url'].str.split('.',n=1,expand=True)[0]

Task 2:

Hint : Approach to Solve the Problem

Combine Multiple Excel Worksheets In a Single DataFrame

import pandas as pd

xls = pd.ExcelFile(r'C:\Users\abhis\Downloads\Email Address.xlsx')
df1 = pd.read_excel(xls, 'Names')
df2 = pd.read_excel(xls, 'Emails')
df = pd.concat([df1, df2], axis=1)

Last updated