Pandas Practice Problems
Download DataSet :
Given a dataframe containing the columns "Name", "Age", "City", and "Salary", how do you select all rows where the age is greater than 30 and the salary is greater than 50000?
How do you calculate the mean, median, and standard deviation of the column "Salary" in the dataframe?
Given a dataframe, how do you sort its values by the "Age" column in descending order?
How do you group the dataframe by the "City" column and calculate the mean of the "Salary" column for each city?
Given a dataframe containing the columns "Name", "Age", "City", and "Salary", how do you select all rows where the name starts with 'J'?
How do you drop all duplicate rows in a dataframe?
How do you rename the columns in a dataframe?
How do you count the number of unique values in a column?
How do you merge two dataframes on a specific column?
How do you create a pivot table from a dataframe?
How do you select the first 5 rows of a dataframe?
How do you select the last 5 rows of a dataframe?
How do you select a specific column from a dataframe and make it into a new dataframe?
How do you concatenate two dataframes along the rows?
How do you find the maximum and minimum values of a column in a dataframe?
How do you calculate the cumulative sum of a column in a dataframe?
How do you find the unique values of a column in a dataframe?
How do you drop a specific column from a dataframe?
How do you find the number of missing values in each column of a dataframe?
How do you fill missing values in a specific column with a specific value?
How do you drop all rows with more than two missing values in the dataframe?
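Here is one possible set of answers to the questions above, sketched on a small made-up dataframe that stands in for the practice dataset. The sample values and the second dataframe (countries) are assumptions for illustration only, and most questions have more than one valid answer.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "John", "Maria"],
    "Age": [35, 28, 42, 31, 35, np.nan],
    "City": ["Paris", "London", "Paris", "Berlin", "Paris", "London"],
    "Salary": [60000, 48000, 75000, 52000, 60000, np.nan],
})

# Rows where Age > 30 and Salary > 50000 (each condition needs its own parentheses).
older_high_earners = df[(df["Age"] > 30) & (df["Salary"] > 50000)]

# Mean, median, and standard deviation of Salary.
salary_stats = df["Salary"].agg(["mean", "median", "std"])

# Sort by Age in descending order (missing ages end up last).
by_age_desc = df.sort_values("Age", ascending=False)

# Mean Salary for each City.
mean_salary_by_city = df.groupby("City")["Salary"].mean()

# Rows where Name starts with 'J'.
j_names = df[df["Name"].str.startswith("J")]

# Drop duplicate rows; rename a column.
deduped = df.drop_duplicates()
renamed = df.rename(columns={"Name": "Full Name"})

# Number of unique values in a column, and the values themselves.
n_cities = df["City"].nunique()
cities = df["City"].unique()

# Merge two dataframes on a shared column (countries is hypothetical).
countries = pd.DataFrame({"City": ["Paris", "London", "Berlin"],
                          "Country": ["France", "UK", "Germany"]})
merged = df.merge(countries, on="City", how="left")

# Pivot table: mean Salary per City.
pivot = pd.pivot_table(df, values="Salary", index="City", aggfunc="mean")

# First and last 5 rows.
first_five = df.head(5)
last_five = df.tail(5)

# One column as a new dataframe (double brackets keep it a DataFrame).
salary_df = df[["Salary"]].copy()

# Concatenate two dataframes along the rows.
stacked = pd.concat([df, df], ignore_index=True)

# Maximum, minimum, and cumulative sum of a column.
salary_max, salary_min = df["Salary"].max(), df["Salary"].min()
salary_cumsum = df["Salary"].cumsum()

# Drop a specific column.
no_city = df.drop(columns=["City"])

# Missing values per column; fill one column; keep rows with at most two missing values.
missing_per_column = df.isna().sum()
filled = df.fillna({"Salary": 0})
few_missing = df[df.isna().sum(axis=1) <= 2]
```

None of these calls modify df in place; each returns a new object, so assign the result back to df if you want to keep the change.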
Let's do some data cleaning and analysis on the practice data.
The code in this section assumes the following imports and a dataframe named df that holds the practice data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Data Cleaning :
There are many steps in data cleaning, but let's start by dropping duplicate rows.
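A minimal sketch of this step, using a small made-up dataframe in place of the practice data. It counts the duplicates first so you can see how many rows will be removed.

```python
import pandas as pd

# Made-up sample data; the last row repeats the first one exactly.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "John"],
    "Age": [35, 28, 42, 31, 35],
    "City": ["Paris", "London", "Paris", "Berlin", "Paris"],
    "Salary": [60000, 48000, 75000, 52000, 60000],
})

# duplicated() marks every repeat of an earlier row as True.
n_duplicates = df.duplicated().sum()
print("Duplicate rows:", n_duplicates)

# Keep the first occurrence of each row and renumber the index.
df_clean = df.drop_duplicates().reset_index(drop=True)
print(df_clean)
```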
Fill missing values :
Find a way to fill the missing values in each column with reasonable values.
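What counts as "reasonable" depends on the data, so treat the following as one possible approach rather than the answer: fill numeric columns with their median and the categorical column with its most frequent value. The sample dataframe is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value in each of Age, City, and Salary.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob"],
    "Age": [35, np.nan, 42, 31],
    "City": ["Paris", "London", None, "Paris"],
    "Salary": [60000, 48000, np.nan, 52000],
})

# Numeric columns: the median is less sensitive to outliers than the mean.
for col in ["Age", "Salary"]:
    df[col] = df[col].fillna(df[col].median())

# Categorical column: use the most frequent value (the mode).
df["City"] = df["City"].fillna(df["City"].mode()[0])

print(df)
print(df.isna().sum())
```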
Exploratory Data Analysis (EDA) :
Generate a Statistical Summary.
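A short sketch using describe(), which by default summarizes only the numeric columns. The sample dataframe is made up and stands in for the practice data.

```python
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# count, mean, std, min, quartiles, and max for Age and Salary.
summary = df.describe()
print(summary)

# include="all" also covers the text columns (count, unique, top, freq).
print(df.describe(include="all"))
```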
Create histograms for each numeric column :
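One way to do this is to select the numeric columns and call hist() on the result, which draws one histogram per column. The sample dataframe, the number of bins, and the figure size are arbitrary choices for this sketch.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# Keep only the numeric columns, then draw one histogram per column.
numeric_cols = df.select_dtypes(include="number")
axes = numeric_cols.hist(bins=5, figsize=(8, 3))
plt.tight_layout()
plt.show()
```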
Create a bar chart to compare the distribution of salary across different cities :
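A single bar per city cannot show a whole distribution, so this sketch takes one possible reading of the task: plot the minimum, median, and maximum salary for each city as grouped bars. The sample dataframe and the choice of statistics are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# One row per city, one column per statistic.
salary_summary = df.groupby("City")["Salary"].agg(["min", "median", "max"])

# Grouped bars: three bars (min, median, max) for each city.
ax = salary_summary.plot(kind="bar", rot=0)
ax.set_xlabel("City")
ax.set_ylabel("Salary")
ax.set_title("Salary Range by City")
plt.show()
```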
Group the data by city and calculate the mean salary for each city :
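A minimal sketch of the grouping on made-up sample data. The variable name mean_salary_by_city is chosen here for illustration.

```python
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# One mean salary per city; reset_index() turns the result back into a dataframe.
mean_salary_by_city = df.groupby("City")["Salary"].mean().reset_index()
print("Mean Salary by City:")
print(mean_salary_by_city)
```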
Plot the mean salary for each city using a bar plot :
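The plot can be drawn straight from the grouped result with pandas' own plot method. This sketch recomputes the means on made-up sample data so that it runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# A Series indexed by city plots as one bar per city.
mean_salary = df.groupby("City")["Salary"].mean()
ax = mean_salary.plot(kind="bar", rot=0)
ax.set_xlabel("City")
ax.set_ylabel("Mean Salary")
ax.set_title("Mean Salary by City")
plt.show()
```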
Plot a scatter plot to visualize the relationship between Age and Salary :
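A short sketch on made-up sample data. The correlation coefficient is not part of the task but is printed as a numeric companion to the plot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# Each point is one person: Age on the x-axis, Salary on the y-axis.
ax = df.plot(kind="scatter", x="Age", y="Salary")
ax.set_title("Age vs. Salary")
plt.show()

# Pearson correlation between the two columns.
age_salary_corr = df["Age"].corr(df["Salary"])
print("Correlation between Age and Salary:", age_salary_corr)
```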
Find the highest-paid person in each city :
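One way to do this is to find the row label of the maximum salary within each city using idxmax() and then look those rows up. The sketch assumes missing salaries have already been filled; if two people tie, only the first is returned.

```python
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# Row label of the highest salary within each city.
top_idx = df.groupby("City")["Salary"].idxmax()

# Look up those rows in the original dataframe.
highest_paid = df.loc[top_idx, ["City", "Name", "Salary"]]
print("Highest-paid person in each city:")
print(highest_paid)
```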
Find the youngest and oldest person in the data :
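A minimal sketch using idxmin() and idxmax() on the Age column, run on made-up sample data. If several people share the same age, only the first match is returned.

```python
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# idxmin()/idxmax() return the row label of the smallest/largest Age.
youngest = df.loc[df["Age"].idxmin()]
oldest = df.loc[df["Age"].idxmax()]

print("Youngest person:")
print(youngest)
print("Oldest person:")
print(oldest)
```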
Group the data by age group and city and calculate the mean salary for each group and city
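The grouping code for this step expects a column named 'Age Group', which the dataframe does not have out of the box. One way to build it is with pd.cut; the bin edges and labels below are assumptions chosen for illustration, so adjust them to your data.

```python
import pandas as pd

# Made-up sample data standing in for the practice dataset.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Alice", "Bob", "Maria", "Chen"],
    "Age": [35, 28, 42, 31, 55, 24],
    "City": ["Paris", "London", "Paris", "Berlin", "London", "Berlin"],
    "Salary": [60000, 48000, 75000, 52000, 90000, 41000],
})

# Hypothetical bin edges; right=False makes each bin include its lower edge,
# so the bins are [0, 30), [30, 40), [40, 50), and [50, 120).
bins = [0, 30, 40, 50, 120]
labels = ["Under 30", "30-39", "40-49", "50+"]
df["Age Group"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)

print(df[["Name", "Age", "Age Group"]])
```

Because pd.cut returns a categorical column, a groupby on it may, depending on your pandas version, also list age group and city combinations that have no rows; passing observed=True to groupby keeps only the combinations that actually occur.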
age_group = df.groupby(['Age Group', 'City'])
mean_salary_by_age_group_and_city = age_group['Salary'].mean().reset_index()
Pivot the table to display the mean salary for each age group and city
pivot_table = pd.pivot_table(mean_salary_by_age_group_and_city, values='Salary', index='Age Group', columns='City')
print("Mean Salary by Age Group and City:")
print(pivot_table)
Plot the mean salary for each age group and city using a heatmap
sns.heatmap(pivot_table, annot=True, fmt='.0f', cmap='Blues')
plt.xlabel("City")
plt.ylabel("Age Group")
plt.title("Mean Salary by Age Group and City")
plt.show()
Calculate the top 5 most common names in the data
top_names = df['Name'].value_counts().head()
print("The top 5 most common names in the data:")
print(top_names)
Create a new column to indicate whether a person's age is at or above the average age (1) or below it (0)
mean_age = df['Age'].mean()
df['Age Above Average'] = np.where(df['Age']>=mean_age, 1, 0)
Group the data by age above average and city and calculate the median salary for each group and city
age_above_average_group = df.groupby(['Age Above Average', 'City'])
median_salary_by_age_above_average_and_city = age_above_average_group['Salary'].median().reset_index()
Pivot the table to display the median salary for each age above average group and city
pivot_table2 = pd.pivot_table(median_salary_by_age_above_average_and_city, values='Salary', index='Age Above Average', columns='City')
print("Median Salary by Age Above Average and City:")
print(pivot_table2)
Plot the median salary for each age above average group and city using a stacked bar chart
pivot_table2.plot(kind='bar', stacked=True)
plt.xlabel("Age Above Average")
plt.ylabel("Median Salary")
plt.title("Median Salary by Age Above Average and City")
plt.show()