Given a dataframe with the columns "Name", "Age", "City", and "Salary", how do you select all rows where the age is greater than 30 and the salary is greater than 50000?
Solution
df[(df['Age'] > 30) & (df['Salary'] > 50000)]
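A self-contained sketch of the same filter, run on a small hypothetical dataframe (the names and numbers are made up for illustration). It also shows `DataFrame.query`, which expresses the same condition as a string.

```python
import pandas as pd

# Hypothetical sample data with the columns named in the question
df = pd.DataFrame({
    "Name": ["John", "Alice", "Jane", "Bob"],
    "Age": [35, 28, 42, 31],
    "City": ["Boston", "Denver", "Boston", "Austin"],
    "Salary": [62000, 71000, 48000, 55000],
})

# Each comparison needs its own parentheses because & binds more tightly
# than > in Python; & then combines the two boolean Series element-wise.
mask = (df["Age"] > 30) & (df["Salary"] > 50000)
selected = df[mask]

# The same filter written as a query string
selected_q = df.query("Age > 30 and Salary > 50000")
print(selected)
```

Writing `and` instead of `&` in the bracketed version raises a `ValueError`, because Python cannot reduce a whole Series to a single True/False value.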
How do you calculate the mean, median, and standard deviation of the column "Salary" in the dataframe?
Solution
mean = df['Salary'].mean()
median = df['Salary'].median()
standard_deviation = df['Salary'].std()
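Note that `Series.std()` computes the sample standard deviation (`ddof=1`) by default. The sketch below, using hypothetical salary values, contrasts it with the population version and shows `agg` computing all three statistics in one call.

```python
import pandas as pd

# Hypothetical salaries for illustration
salaries = pd.Series([40000, 50000, 60000, 70000], name="Salary")

# Default ddof=1 divides by n - 1 (sample standard deviation);
# ddof=0 divides by n (population standard deviation)
sample_std = salaries.std()
population_std = salaries.std(ddof=0)

# agg returns a Series indexed by the statistic names
summary = salaries.agg(["mean", "median", "std"])
print(summary)
```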
Given a dataframe, how do you sort its values by the "Age" column in descending order?
Solution
df.sort_values(by='Age', ascending=False)
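A small sketch of sorting by "Age" in descending order on hypothetical data, including how to break ties with a second column. `sort_values` returns a new dataframe and leaves the original unchanged unless you reassign it.

```python
import pandas as pd

# Hypothetical sample data; two people share Age 35
df = pd.DataFrame({
    "Name": ["John", "Alice", "Jane", "Bob"],
    "Age": [35, 28, 42, 35],
})

# Largest Age first
by_age = df.sort_values(by="Age", ascending=False)

# Break Age ties by Name (A to Z): one sort direction per key
by_age_then_name = df.sort_values(by=["Age", "Name"], ascending=[False, True])
print(by_age_then_name)
```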
How do you group the dataframe by the "City" column and calculate the mean of the "Salary" column for each city?
Solution
df.groupby('City')['Salary'].mean()
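The groupby above returns a Series indexed by city. The sketch below, on hypothetical data, shows that result and the `as_index=False` variant, which keeps "City" as an ordinary column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "City": ["Boston", "Denver", "Boston", "Austin"],
    "Salary": [60000, 70000, 50000, 55000],
})

# Series indexed by City; group keys are sorted by default
mean_by_city = df.groupby("City")["Salary"].mean()

# DataFrame with City and Salary columns, convenient for merging or export
mean_by_city_df = df.groupby("City", as_index=False)["Salary"].mean()
print(mean_by_city_df)
```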
Given a dataframe with the columns "Name", "Age", "City", and "Salary", how do you select all rows where the name starts with 'J'?
Solution
df[df['Name'].str.startswith('J')]
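If the "Name" column can contain missing values, the mask from `str.startswith` contains NaN and cannot be used for indexing directly. Passing `na=False` treats missing names as non-matches. The sketch uses hypothetical names and also shows a case-insensitive variant, since `startswith` is case-sensitive.

```python
import pandas as pd

# Hypothetical names, including a missing value and lower-case entries
df = pd.DataFrame({"Name": ["John", "alice", "Jane", None, "joe"]})

# Missing names count as "does not start with J"
starts_with_j = df[df["Name"].str.startswith("J", na=False)]

# Lower-case first for a case-insensitive match
starts_with_j_any_case = df[df["Name"].str.lower().str.startswith("j", na=False)]
print(starts_with_j_any_case)
```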
How do you drop all duplicate rows in a dataframe?
Solution
df.drop_duplicates()
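A sketch of removing duplicate rows with `DataFrame.drop_duplicates` on hypothetical data. It also shows the `subset`, `keep`, and `ignore_index` parameters, which control which columns define a duplicate, which copy survives, and whether the index is renumbered.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are exact duplicates
df = pd.DataFrame({
    "Name": ["John", "John", "Jane", "John"],
    "City": ["Boston", "Boston", "Denver", "Austin"],
})

# Rows identical in every column are dropped; the first copy is kept
unique_rows = df.drop_duplicates()

# Only the Name column decides what counts as a duplicate
unique_names = df.drop_duplicates(subset=["Name"], keep="first")

# Renumber the remaining rows 0..n-1
unique_rows_reindexed = df.drop_duplicates(ignore_index=True)
print(unique_rows)
```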
Draw a scatter plot to visualize the relationship between Age and Salary:
Solution
import matplotlib.pyplot as plt

plt.scatter(df['Age'], df['Salary'])
plt.xlabel("Age")
plt.ylabel("Salary")
plt.title("Relationship between Age and Salary")
plt.show()
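Find the highest-paid person in each city:
Solution
city_group = df.groupby('City')
# idxmax returns the row label of the highest salary within each city
top_paid_by_city = df.loc[city_group['Salary'].idxmax(), ['City', 'Name', 'Salary']]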
print("The highest-paid person in each city:")
print(top_paid_by_city)
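When two people in the same city share the top salary, `idxmax` returns only the first of them. The sketch below uses hypothetical data with such a tie. It compares each row against its city's maximum, obtained with `transform("max")`, so every tied person is kept.

```python
import pandas as pd

# Hypothetical data: Alice and Bob tie for the top salary in Denver
df = pd.DataFrame({
    "Name": ["John", "Alice", "Jane", "Bob"],
    "City": ["Boston", "Denver", "Boston", "Denver"],
    "Salary": [62000, 71000, 58000, 71000],
})

# One person per city: the first row holding the maximum
top_idx = df.groupby("City")["Salary"].idxmax()
top_paid_by_city = df.loc[top_idx, ["City", "Name", "Salary"]]

# Every person tied for the top: transform broadcasts each city's
# maximum back onto its rows so it can be compared row by row
city_max = df.groupby("City")["Salary"].transform("max")
all_top_paid = df[df["Salary"] == city_max]
print(all_top_paid)
```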
Find the youngest and oldest person in the data:
Solution
min_age = df['Age'].min()
max_age = df['Age'].max()
print("The youngest person in the data is", df.loc[df['Age'] == min_age, 'Name'].iloc[0], "with an age of", min_age)
print("The oldest person in the data is", df.loc[df['Age'] == max_age, 'Name'].iloc[0], "with an age of", max_age)
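An alternative sketch on hypothetical data. `idxmin` and `idxmax` return the row label of the first minimum or maximum, which replaces the min-then-filter pattern with a single lookup. `nsmallest` with `keep="all"` returns everyone sharing the lowest age when there is a tie.

```python
import pandas as pd

# Hypothetical data: Alice and Bob are both 28
df = pd.DataFrame({
    "Name": ["John", "Alice", "Jane", "Bob"],
    "Age": [35, 28, 42, 28],
})

# Row of the first minimum / maximum Age
youngest = df.loc[df["Age"].idxmin()]
oldest = df.loc[df["Age"].idxmax()]
print("The youngest person in the data is", youngest["Name"], "with an age of", youngest["Age"])
print("The oldest person in the data is", oldest["Name"], "with an age of", oldest["Age"])

# Everyone tied for the lowest age
all_youngest = df.nsmallest(1, "Age", keep="all")
```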
Group the data by age group and city and calculate the mean salary for each age group and city: