Pandas Practice Problems

Download Dataset:

  1. Given a dataframe with the columns "Name", "Age", "City", and "Salary", how do you select all rows where the age is greater than 30 and the salary is greater than 50000?

df[(df['Age'] > 30) & (df['Salary'] > 50000)]
  1. How do you calculate the mean, median, and standard deviation of the column "Salary" in the dataframe?

mean = df['Salary'].mean()
median = df['Salary'].median()
standard_deviation = df['Salary'].std()
  1. Given a dataframe, how do you sort its values by the "Age" column in descending order?

df.sort_values(by='Age', ascending=False)
  1. How do you group the dataframe by the "City" column and calculate the mean of the "Salary" column for each city?

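A minimal sketch, using a small hypothetical dataframe in place of the practice dataset: `groupby` splits the rows by city and `mean` aggregates the salaries inside each group.

```python
import pandas as pd

# Hypothetical sample data with the practice columns (not the real dataset)
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "Age": [34, 28, 45, 31],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

# One mean salary per city; the result is a Series indexed by City
mean_salary_by_city = df.groupby("City")["Salary"].mean()
print(mean_salary_by_city)
```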
  1. Given a dataframe with the columns "Name", "Age", "City", and "Salary", how do you select all rows where the name starts with 'J'?

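One way to do this is the `.str.startswith` string accessor combined with boolean indexing. The sample rows below are hypothetical, and one name is deliberately missing to show why `na=False` is useful.

```python
import pandas as pd

# Hypothetical sample data (not the real dataset); the last name is missing
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", None],
    "Age": [34, 28, 45, 31],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

# na=False treats missing names as non-matches instead of producing NaN in the mask
starts_with_j = df[df["Name"].str.startswith("J", na=False)]
print(starts_with_j)
```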
  1. How do you drop all duplicate rows in a dataframe?

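A minimal sketch with `drop_duplicates`, on hypothetical rows where the first and third rows are identical. The `subset=` variant is shown for the case where only some columns should decide what counts as a duplicate.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are exact duplicates
df = pd.DataFrame({
    "Name": ["John", "Anna", "John"],
    "Age": [34, 28, 34],
    "City": ["Boston", "Chicago", "Boston"],
    "Salary": [60000, 52000, 60000],
})

# Keeps the first occurrence of each fully duplicated row (keep="first" is the default)
deduplicated = df.drop_duplicates()

# subset= compares only the listed columns, e.g. one row per Name
deduplicated_by_name = df.drop_duplicates(subset=["Name"])
print(deduplicated)
```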

  1. How do you rename the columns in a dataframe?

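Two common approaches, sketched on hypothetical data: `rename` with an old-to-new mapping for selected columns, or assigning a full list to `df.columns`. The new column names here are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [34, 28],
    "City": ["Boston", "Chicago"],
    "Salary": [60000, 52000],
})

# Rename selected columns with an old -> new mapping (hypothetical new names)
renamed = df.rename(columns={"Name": "Full Name", "Salary": "Annual Salary"})

# Or replace every column name at once; the list length must match the column count
df.columns = ["name", "age", "city", "salary"]
print(renamed.columns.tolist())
```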
  1. How do you count the number of unique values in a column?

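`nunique` counts distinct values; by default it ignores missing values. A short sketch on hypothetical data with one missing city:

```python
import pandas as pd

# Hypothetical sample data with one missing city
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "City": ["Boston", "Chicago", "Boston", None],
})

# Distinct non-missing values (dropna=True is the default)
n_cities = df["City"].nunique()
# Count a missing value as one more distinct value
n_cities_with_missing = df["City"].nunique(dropna=False)
# Distinct counts for every column at once
unique_counts = df.nunique()
print(n_cities, n_cities_with_missing)
```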
  1. How do you merge two dataframes on a specific column?

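A minimal sketch with `pd.merge`, assuming two hypothetical tables that share a "Name" column (the "Department" column is invented for the example). The `how` argument decides what happens to names found in only one table.

```python
import pandas as pd

# Two hypothetical tables that share a "Name" column
salaries = pd.DataFrame({
    "Name": ["John", "Anna", "James"],
    "Salary": [60000, 52000, 75000],
})
departments = pd.DataFrame({
    "Name": ["John", "Anna", "Maria"],
    "Department": ["Sales", "IT", "HR"],
})

# how="inner" keeps only names present in both tables
merged = pd.merge(salaries, departments, on="Name", how="inner")

# how="outer" keeps every name and fills the gaps with NaN
merged_outer = pd.merge(salaries, departments, on="Name", how="outer")
print(merged)
```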
  1. How do you create a pivot table from a dataframe?

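A sketch with `pd.pivot_table`: `index` becomes the rows, `columns` is spread across the top, and `aggfunc` summarizes `values` in each cell. The "Level" column is hypothetical, added only so there is something to pivot on.

```python
import pandas as pd

# Hypothetical sample data; "Level" is an invented column for illustration
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "Level": ["Senior", "Junior", "Senior", "Senior"],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

# Rows = cities, columns = levels, cells = mean salary (NaN where no rows exist)
pivot = pd.pivot_table(df, values="Salary", index="City", columns="Level", aggfunc="mean")
print(pivot)
```

Passing `fill_value=0` would replace the empty cells with 0 instead of NaN.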
  1. How do you select the first 5 rows of a dataframe?

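`head()` returns the first 5 rows by default. A sketch on a hypothetical 8-row dataframe, with the equivalent positional slice:

```python
import pandas as pd

# Hypothetical 8-row dataframe
df = pd.DataFrame({
    "Name": [f"Person {i}" for i in range(8)],
    "Age": range(25, 33),
})

first_five = df.head()   # head(n) returns the first n rows; n defaults to 5
same_rows = df.iloc[:5]  # equivalent positional slice
print(first_five)
```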
  1. How do you select the last 5 rows of a dataframe?

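`tail()` mirrors `head()` from the other end. Sketched on the same kind of hypothetical 8-row dataframe:

```python
import pandas as pd

# Hypothetical 8-row dataframe
df = pd.DataFrame({
    "Name": [f"Person {i}" for i in range(8)],
    "Age": range(25, 33),
})

last_five = df.tail()     # tail(n) returns the last n rows; n defaults to 5
same_rows = df.iloc[-5:]  # equivalent positional slice
print(last_five)
```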
  1. How do you select a specific column from a dataframe and make it into a new dataframe?

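The number of brackets decides the result type: single brackets give a Series, double brackets give a one-column DataFrame. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James"],
    "Salary": [60000, 52000, 75000],
})

salary_series = df["Salary"]       # single brackets -> Series
salary_df = df[["Salary"]].copy()  # double brackets -> DataFrame; .copy() detaches it from df
alt_df = df["Salary"].to_frame()   # another way to turn one column into a DataFrame
print(salary_df)
```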
  1. How do you concatenate two dataframes along the rows?

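A minimal sketch with `pd.concat`, assuming two hypothetical dataframes with the same columns. `axis=0` (the default) stacks rows, and `ignore_index=True` renumbers the result so the original indexes do not collide.

```python
import pandas as pd

# Two hypothetical dataframes with the same columns
df1 = pd.DataFrame({"Name": ["John", "Anna"], "Salary": [60000, 52000]})
df2 = pd.DataFrame({"Name": ["James"], "Salary": [75000]})

# Stack df2's rows under df1's and renumber the index 0..n-1
combined = pd.concat([df1, df2], axis=0, ignore_index=True)
print(combined)
```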
  1. How do you find the maximum and minimum values of a column in a dataframe?

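`max` and `min` return the extreme values; `idxmax`/`idxmin` return the index label where they occur, which helps look up the matching row. Sketched on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "Salary": [60000, 52000, 75000, 48000],
})

highest = df["Salary"].max()
lowest = df["Salary"].min()
# Name on the row with the largest salary
top_earner = df.loc[df["Salary"].idxmax(), "Name"]
# Both extremes in one call, as a Series indexed by "min" and "max"
extremes = df["Salary"].agg(["min", "max"])
print(highest, lowest, top_earner)
```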
  1. How do you calculate the cumulative sum of a column in a dataframe?

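`cumsum` returns a running total in row order. A sketch on hypothetical data that stores it as a new column:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "Salary": [60000, 52000, 75000, 48000],
})

# Each value is the sum of all salaries up to and including that row
df["Cumulative Salary"] = df["Salary"].cumsum()
print(df)
```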
  1. How do you find the unique values of a column in a dataframe?

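`unique` returns the distinct values themselves, whereas `nunique` only counts them. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "City": ["Boston", "Chicago", "Boston", "Denver"],
})

# NumPy array of distinct values in order of first appearance (not sorted)
cities = df["City"].unique()
print(cities)
```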
  1. How do you drop a specific column from a dataframe?

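`drop(columns=...)` returns a new dataframe and leaves the original unchanged unless the result is assigned back. Sketched on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [34, 28],
    "City": ["Boston", "Chicago"],
    "Salary": [60000, 52000],
})

# Equivalent older form: df.drop("City", axis=1)
without_city = df.drop(columns=["City"])
print(without_city.columns.tolist())
```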
  1. How do you find the number of missing values in each column of a dataframe?

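`isna()` marks each missing cell as True, and `sum()` then counts the True values per column. A sketch on hypothetical data with a few gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with some missing values
df = pd.DataFrame({
    "Name": ["John", "Anna", "James"],
    "Age": [34, np.nan, 45],
    "City": ["Boston", None, "Boston"],
    "Salary": [60000, np.nan, np.nan],
})

# Series indexed by column name: number of missing cells in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```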
  1. How do you fill missing values in a specific column with a specific value?

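A minimal sketch, assuming "Unknown" and 0 are acceptable fill values for this hypothetical data: call `fillna` on the column and assign the result back, or pass a column-to-value dict to fill several columns at once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps
df = pd.DataFrame({
    "Name": ["John", "Anna", "James"],
    "City": ["Boston", None, "Boston"],
    "Salary": [60000, np.nan, 75000],
})

# Fill one column and assign it back; other columns are untouched
df["City"] = df["City"].fillna("Unknown")
# A dict maps column names to their own fill values
df = df.fillna({"Salary": 0})
print(df)
```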
  1. How do you drop all rows with more than two missing values in the dataframe?

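`dropna(thresh=k)` keeps rows that have at least `k` non-missing values. To drop rows with more than two missing values, `k` is the column count minus 2. A sketch on hypothetical data, with an equivalent mask that counts missing cells per row:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 0 has 0 gaps, row 1 has 2, row 2 has 3
df = pd.DataFrame({
    "Name": ["John", "Anna", "James"],
    "Age": [34, np.nan, np.nan],
    "City": ["Boston", None, None],
    "Salary": [60000, 52000, np.nan],
})

# Keep rows with at least (n_columns - 2) non-missing values, i.e. at most 2 missing
cleaned = df.dropna(thresh=df.shape[1] - 2)

# Same result with an explicit count of missing values per row
cleaned_mask = df[df.isna().sum(axis=1) <= 2]
print(cleaned)
```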

Let's do some data cleaning and analysis on the practice data.

Data Cleaning:

Data cleaning involves many steps, but let's start by dropping duplicate rows.

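A sketch of this cleaning step on a hypothetical stand-in for the practice data: count the duplicates first so the step can be reported, then drop them and reset the index so it has no gaps.

```python
import pandas as pd

# Hypothetical stand-in for the practice data; rows 0 and 2 are identical
df = pd.DataFrame({
    "Name": ["John", "Anna", "John", "Maria"],
    "Age": [34, 28, 34, 31],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 60000, 48000],
})

# duplicated() flags every repeat after the first occurrence
n_duplicates = df.duplicated().sum()
# Drop the repeats and renumber the index 0..n-1
df = df.drop_duplicates().reset_index(drop=True)
print(f"Removed {n_duplicates} duplicate row(s); {len(df)} rows remain")
```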

Fill missing values:

Find a way to fill the missing values in each column with reasonable values.

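What counts as "reasonable" is a judgment call. One common choice, sketched here on hypothetical data: the median for numeric columns, the most frequent value for a categorical column such as City, and a placeholder for names, which cannot be inferred.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the practice data with some gaps
df = pd.DataFrame({
    "Name": ["John", None, "James", "Maria"],
    "Age": [34, np.nan, 45, 31],
    "City": ["Boston", "Chicago", None, "Boston"],
    "Salary": [60000, 52000, np.nan, 48000],
})

# Numeric columns: the median is less sensitive to outliers than the mean
for col in df.select_dtypes(include="number").columns:
    df[col] = df[col].fillna(df[col].median())

# Categorical column: the most frequent city
df["City"] = df["City"].fillna(df["City"].mode()[0])
# Names cannot be inferred, so use a placeholder
df["Name"] = df["Name"].fillna("Unknown")
print(df)
```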

Exploratory Data Analysis (EDA):

Generate a statistical summary.

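`describe()` summarizes the numeric columns; `include="all"` also reports count, unique, top, and freq for text columns. Sketched on hypothetical data:

```python
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "Age": [34, 28, 45, 31],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

summary = df.describe()                    # count, mean, std, min, quartiles, max
full_summary = df.describe(include="all")  # adds unique/top/freq for text columns
print(summary)
```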

Create histograms for each numeric column:

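`DataFrame.hist` draws one histogram per numeric column and skips text columns. A sketch on hypothetical data; the `bins` and `figsize` values are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "Age": [34, 28, 45, 31],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

# One subplot per numeric column (Age and Salary here); returns an array of Axes
axes = df.hist(bins=5, figsize=(10, 4))
plt.tight_layout()
plt.show()
```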

Create a bar chart to compare the distribution of salary across different cities:

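A bar chart cannot show a full distribution, so this sketch approximates one with three bars per city (minimum, mean, and maximum salary). The data is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

# One row per city, one column per statistic
salary_stats = df.groupby("City")["Salary"].agg(["min", "mean", "max"])

# Grouped bars: each city gets a bar for min, mean and max
ax = salary_stats.plot(kind="bar")
ax.set_ylabel("Salary")
ax.set_title("Salary range by city")
plt.tight_layout()
plt.show()
```

If the goal is the full spread rather than a bar chart, `df.boxplot(column="Salary", by="City")` draws one box plot per city.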

Group the data by city and calculate the mean salary for each city:

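This sketch uses named aggregation, which gives each result column an explicit name. It also adds a headcount per city and sorts by mean salary. The data is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

# new_column=(source_column, aggregation) pairs
city_summary = (
    df.groupby("City")
    .agg(mean_salary=("Salary", "mean"), headcount=("Name", "count"))
    .reset_index()
    .sort_values("mean_salary", ascending=False)
)
print(city_summary)
```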

Plot the mean salary for each city using a bar plot:

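A minimal sketch on hypothetical data: compute the per-city means, sort them so the tallest bar comes first, and call the Series' `plot(kind="bar")`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

mean_salary_by_city = df.groupby("City")["Salary"].mean().sort_values(ascending=False)

# One bar per city; bar heights are the mean salaries
ax = mean_salary_by_city.plot(kind="bar", color="steelblue")
ax.set_xlabel("City")
ax.set_ylabel("Mean Salary")
ax.set_title("Mean Salary by City")
plt.tight_layout()
plt.show()
```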

Create a scatter plot to visualize the relationship between Age and Salary:

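A sketch on hypothetical data using `DataFrame.plot(kind="scatter")`. It also prints the Pearson correlation as a one-number summary of the linear relationship shown in the plot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "Age": [34, 28, 45, 31],
    "Salary": [60000, 52000, 75000, 48000],
})

# One point per person; axis labels default to the column names
ax = df.plot(kind="scatter", x="Age", y="Salary")
ax.set_title("Age vs Salary")

# Pearson correlation (the default method), between -1 and 1
corr = df["Age"].corr(df["Salary"])
print(f"Correlation between Age and Salary: {corr:.2f}")
plt.show()
```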

Find the highest-paid person in each city:

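One approach, sketched on hypothetical data: `idxmax` gives each city's row label with the largest salary, and `df.loc` pulls those rows. `idxmax` keeps only the first row when salaries tie, so a `transform("max")` mask is included for keeping all ties.

```python
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

# Row label of the top salary in each city (cities in sorted order)
top_idx = df.groupby("City")["Salary"].idxmax()
highest_paid = df.loc[top_idx]

# Alternative that keeps every person tied for their city's maximum
highest_paid_with_ties = df[df["Salary"] == df.groupby("City")["Salary"].transform("max")]
print(highest_paid)
```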

Find the youngest and oldest person in the data:

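A sketch on hypothetical data: `idxmin`/`idxmax` on Age locate the youngest and oldest rows. `nsmallest(..., keep="all")` is an option for returning everyone tied for the minimum.

```python
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria"],
    "Age": [34, 28, 45, 31],
    "City": ["Boston", "Chicago", "Boston", "Chicago"],
    "Salary": [60000, 52000, 75000, 48000],
})

youngest = df.loc[df["Age"].idxmin()]
oldest = df.loc[df["Age"].idxmax()]
print(f"Youngest: {youngest['Name']} ({youngest['Age']})")
print(f"Oldest: {oldest['Name']} ({oldest['Age']})")

# keep="all" returns every row tied for the smallest age
all_youngest = df.nsmallest(1, "Age", keep="all")
```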

Group the data by age group and city and calculate the mean salary for each group and city

# Assumes df already has an 'Age Group' column (e.g. derived by binning 'Age')
age_group = df.groupby(['Age Group', 'City'])
mean_salary_by_age_group_and_city = age_group['Salary'].mean().reset_index()
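The grouping step relies on an 'Age Group' column, which is not one of the listed columns (Name, Age, City, Salary). One way to create it is to bin 'Age' with `pd.cut`. The bin edges and labels below are hypothetical, and so is the sample data. `observed=True` keeps only the age-group/city combinations that actually occur.

```python
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Maria", "Omar"],
    "Age": [34, 28, 45, 31, 52],
    "City": ["Boston", "Chicago", "Boston", "Chicago", "Boston"],
    "Salary": [60000, 52000, 75000, 48000, 90000],
})

# Hypothetical bins; right=False makes each bin [left, right), e.g. 30 falls in "30-39"
bins = [0, 30, 40, 50, 120]
labels = ["<30", "30-39", "40-49", "50+"]
df["Age Group"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)

# Mean salary per (age group, city) pair that appears in the data
mean_salary_by_age_group_and_city = (
    df.groupby(["Age Group", "City"], observed=True)["Salary"]
    .mean()
    .reset_index()
)
print(mean_salary_by_age_group_and_city)
```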

Pivot the table to display the mean salary for each age group and city

pivot_table = pd.pivot_table(mean_salary_by_age_group_and_city, values='Salary', index='Age Group', columns='City')
print("Mean Salary by Age Group and City:")
print(pivot_table)

Plot the mean salary for each age group and city using a heatmap

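import matplotlib.pyplot as plt
import seaborn as sns

# fmt='.0f' prints whole numbers instead of the default '.2g' (e.g. 6.8e+04)
sns.heatmap(pivot_table, annot=True, fmt='.0f', cmap='Blues')
plt.xlabel("City")
plt.ylabel("Age Group")
plt.title("Mean Salary by Age Group and City")
plt.show()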

Calculate the top 5 most common names in the data

top_names = df['Name'].value_counts().head(5)
print("The top 5 most common names in the data:")
print(top_names)

Create a new column to indicate whether a person's age is below or above the average age

import numpy as np

mean_age = df['Age'].mean()
# 1 if the age is at or above the mean age, otherwise 0
df['Age Above Average'] = np.where(df['Age'] >= mean_age, 1, 0)

Group the data by age above average and city and calculate the median salary for each group and city

age_above_average_group = df.groupby(['Age Above Average', 'City'])
median_salary_by_age_above_average_and_city = age_above_average_group['Salary'].median().reset_index()

Pivot the table to display the median salary for each age above average group and city

pivot_table2 = pd.pivot_table(median_salary_by_age_above_average_and_city, values='Salary', index='Age Above Average', columns='City')
print("Median Salary by Age Above Average and City:")
print(pivot_table2)

Plot the median salary for each age above average group and city using a stacked bar chart

pivot_table2.plot(kind='bar', stacked=True)
plt.xlabel("Age Above Average")
plt.ylabel("Median Salary")
plt.title("Median Salary by Age Above Average and City")
plt.show()
