Download the above zip file and extract it Somewhere. After extracting, you must have found two excel files and a pdf. Refer PDF for your assessment.
Import the Excel File (Solar energy.xlsx) :
Let us clean empty rows and duplicate values if any.
Find Total Volume of the Articles:
Find the total number of unique authors in the articles:
In how many articles is the word "solar" (case insensitive) mentioned?
Solution
Convert articles in lowercase format :
Convert all articles in lowercase so that "solar" keyword does not miss.
articles = solar_data['body'].str.lower()
articles
Conditional Formatting :
We can use str methods to find "solar" keyword.
len(articles.loc[articles.str.contains('solar')==True])
Output :
218
Find the total number of articles each author has written:
We can groupby author names and count their body of the article.
Add 2 columns and extract the month and year from date :
Add another column and extract Domain from the URL:
Combine Multiple Excel Worksheets In a Single DataFrame
import pandas as pd
xls = pd.ExcelFile(r'C:\Users\abhis\Downloads\Email Address.xlsx')
df1 = pd.read_excel(xls, 'Names')
df2 = pd.read_excel(xls, 'Emails')
df = pd.concat([df1, df2], axis=1)
If you see, Emails do not have spaces and are written in lower case, so we need to convert names to in same format.
df['Name'] = df['Name'].str.lower().replace(' ','')