Given a dataframe containing the columns "Name", "Age", "City", and "Salary", how do you select all rows where the age is greater than 30 and the salary is greater than 50000?
Solution
df[(df['Age']>30)&(df['Salary']>50000)]
How do you calculate the mean, median, and standard deviation of the column "Salary" in the dataframe?
Solution
mean = df['Salary'].mean()
median = df['Salary'].median()
# std() returns the sample standard deviation (ddof=1) by default
standard_deviation = df['Salary'].std()
Given a dataframe, how do you sort its values by the "Age" column in descending order?
Solution
df.sort_values(by='Age', ascending=False)
How do you group the dataframe by the "City" column and calculate the mean of the "Salary" column for each city?
Solution
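A minimal sketch using `groupby`; the small DataFrame below is made-up sample data standing in for `df`.

```python
import pandas as pd

# Hypothetical sample data using the columns from the exercises
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "Age": [28, 34, 45, 31],
    "City": ["London", "Paris", "London", "Paris"],
    "Salary": [48000, 62000, 75000, 58000],
})

# Split the rows by City, then average Salary within each group
mean_salary_by_city = df.groupby("City")["Salary"].mean()
print(mean_salary_by_city)
```

The result is a Series indexed by city; add `.reset_index()` if you want a regular DataFrame.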
Given a dataframe containing the columns "Name", "Age", "City", and "Salary", how do you select all rows where the name starts with 'J'?
Solution
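One way to do this is a boolean mask built with the `.str.startswith` string accessor. The sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "Age": [28, 34, 45, 31],
    "City": ["London", "Paris", "London", "Paris"],
    "Salary": [48000, 62000, 75000, 58000],
})

# True where Name begins with "J"; na=False treats missing names as non-matches
starts_with_j = df[df["Name"].str.startswith("J", na=False)]
print(starts_with_j)
```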
How do you drop all duplicate rows in a dataframe?
Solution
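A minimal sketch with `drop_duplicates`, assuming a small made-up frame in which two rows are identical.

```python
import pandas as pd

# Hypothetical data in which the first and third rows are identical
df = pd.DataFrame({
    "Name": ["John", "Anna", "John"],
    "Age": [28, 34, 28],
    "City": ["London", "Paris", "London"],
    "Salary": [48000, 62000, 48000],
})

# Keeps the first occurrence of each fully identical row
deduped = df.drop_duplicates()
# To compare only some columns, pass subset, e.g. df.drop_duplicates(subset=["Name"])
print(deduped)
```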
How do you rename the columns in a dataframe?
Solution
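Two common approaches are sketched below: `rename` with a mapping for selected columns, or assigning a new list to `.columns` to replace every name. The new column names are only examples.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [28, 34],
    "City": ["London", "Paris"],
    "Salary": [48000, 62000],
})

# Rename selected columns via an {old: new} mapping; returns a new DataFrame
renamed = df.rename(columns={"Name": "Full Name", "Salary": "Annual Salary"})

# Replace all names at once (the list length must match the number of columns)
lowercase = df.copy()
lowercase.columns = [c.lower() for c in lowercase.columns]
```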
How do you count the number of unique values in a column?
Solution
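A short sketch on made-up data: `nunique` gives the number of distinct values (ignoring missing values by default), while `value_counts` shows how often each one occurs.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "City": ["London", "Paris", "London", "Paris"],
})

n_unique_cities = df["City"].nunique()   # number of distinct cities
city_counts = df["City"].value_counts()  # number of rows per city
print(n_unique_cities)
print(city_counts)
```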
How do you merge two dataframes on a specific column?
Solution
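A minimal sketch with `pd.merge`, assuming two hypothetical frames (`employees` and `bonuses`) that share a "Name" column.

```python
import pandas as pd

# Hypothetical frames that share the "Name" column
employees = pd.DataFrame({
    "Name": ["John", "Anna", "Jack"],
    "City": ["London", "Paris", "London"],
})
bonuses = pd.DataFrame({
    "Name": ["John", "Jack", "Maria"],
    "Bonus": [5000, 7000, 3000],
})

# how="inner" keeps only names present in both frames;
# "left", "right" or "outer" keep unmatched rows as well
merged = pd.merge(employees, bonuses, on="Name", how="inner")
print(merged)
```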
How do you create a pivot table from a dataframe?
Solution
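A sketch with `pd.pivot_table`. The "Department" column is a hypothetical addition so there is something to spread across the columns; combinations with no data come out as NaN.

```python
import pandas as pd

# Hypothetical sample data; "Department" is invented for this example
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "City": ["London", "Paris", "London", "Paris"],
    "Department": ["IT", "HR", "IT", "IT"],
    "Salary": [48000, 62000, 75000, 58000],
})

# Rows = City, columns = Department, cells = mean Salary
pivot = pd.pivot_table(df, values="Salary", index="City",
                       columns="Department", aggfunc="mean")
print(pivot)
```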
How do you select the first 5 rows of a dataframe?
Solution
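`head` returns the first `n` rows and defaults to `n=5`; the 8-row frame below is made up so the cut-off is visible.

```python
import pandas as pd

# Hypothetical 8-row frame
df = pd.DataFrame({
    "Name": [f"Person{i}" for i in range(8)],
    "Age": list(range(20, 28)),
})

first_five = df.head()       # same as df.head(5)
same_rows = df.iloc[:5]      # positional slicing gives the same rows
print(first_five)
```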
How do you select the last 5 rows of a dataframe?
Solution
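`tail` is the counterpart of `head`; again the frame is made-up sample data.

```python
import pandas as pd

# Hypothetical 8-row frame
df = pd.DataFrame({
    "Name": [f"Person{i}" for i in range(8)],
    "Age": list(range(20, 28)),
})

last_five = df.tail()        # same as df.tail(5)
same_rows = df.iloc[-5:]     # positional slicing from the end
print(last_five)
```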
How do you select a specific column from a dataframe and make it into a new dataframe?
Solution
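The difference between single and double brackets is the key point: a list of column names returns a DataFrame, a single name returns a Series. Sample data is invented.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Salary": [48000, 62000],
})

salary_df = df[["Salary"]].copy()   # double brackets -> DataFrame (copy avoids chained-assignment surprises)
salary_series = df["Salary"]        # single brackets -> Series
also_df = df["Salary"].to_frame()   # converting a Series back to a DataFrame
```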
How do you concatenate two dataframes along the rows?
Solution
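A minimal sketch with `pd.concat`, assuming two hypothetical frames with the same columns.

```python
import pandas as pd

# Hypothetical frames with matching columns
df1 = pd.DataFrame({"Name": ["John", "Anna"], "Age": [28, 34]})
df2 = pd.DataFrame({"Name": ["Jack", "Maria"], "Age": [45, 31]})

# axis=0 stacks rows; ignore_index=True renumbers the result 0..n-1
combined = pd.concat([df1, df2], axis=0, ignore_index=True)
print(combined)
```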
How do you find the maximum and minimum values of a column in a dataframe?
Solution
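A short sketch on made-up data, using `max`/`min` directly or both at once through `agg`.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "Salary": [48000, 62000, 75000, 58000],
})

max_salary = df["Salary"].max()
min_salary = df["Salary"].min()
both = df["Salary"].agg(["min", "max"])  # Series indexed by "min" and "max"
print(max_salary, min_salary)
```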
How do you calculate the cumulative sum of a column in a dataframe?
Solution
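`cumsum` returns a running total in row order; the salaries below are sample values.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "Salary": [48000, 62000, 75000, 58000],
})

# Each entry is the sum of all salaries up to and including that row
df["CumulativeSalary"] = df["Salary"].cumsum()
print(df)
```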
How do you find the unique values of a column in a dataframe?
Solution
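Unlike `nunique`, which only counts, `unique` returns the distinct values themselves, in order of first appearance. Sample data is invented.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "City": ["London", "Paris", "London", "Paris"],
})

unique_cities = df["City"].unique()
print(unique_cities)
```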
How do you drop a specific column from a dataframe?
Solution
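A minimal sketch with `drop`; the `columns=` keyword is equivalent to passing the name with `axis=1`.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [28, 34],
    "City": ["London", "Paris"],
    "Salary": [48000, 62000],
})

remaining = df.drop(columns=["Salary"])   # returns a new DataFrame
same = df.drop("Salary", axis=1)          # equivalent older-style call
print(remaining)
```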
How do you find the number of missing values in each column of a dataframe?
Solution
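`isna()` marks each missing cell as True, and summing those booleans counts them per column. The gaps in the sample data are invented.

```python
import pandas as pd

# Hypothetical data with some missing values (None becomes NaN/None in pandas)
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack"],
    "Age": [28, None, 45],
    "City": ["London", "Paris", None],
    "Salary": [None, None, 75000],
})

missing_per_column = df.isna().sum()
print(missing_per_column)
```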
How do you fill missing values in a specific column with a specific value?
Solution
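A minimal sketch, assuming 0 is the chosen fill value for "Salary". Assigning the result back is used here rather than `inplace=True` on a single column.

```python
import pandas as pd

# Hypothetical data with missing salaries
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack"],
    "Salary": [None, 62000, None],
})

# Fill only the Salary column and assign the result back
df["Salary"] = df["Salary"].fillna(0)
# Equivalent: df = df.fillna({"Salary": 0})
print(df)
```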
How do you drop all rows with more than two missing values in the dataframe?
Solution
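Two equivalent sketches: count missing values per row and keep rows with at most two, or use `dropna(thresh=...)`, where `thresh` is the minimum number of non-missing values a row needs to be kept. The sample data is made up.

```python
import pandas as pd

# Hypothetical data: row 1 has three missing values, row 2 has two
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack"],
    "Age": [28, None, None],
    "City": ["London", None, "Paris"],
    "Salary": [48000, None, None],
})

# Keep rows with at most two missing values
cleaned = df[df.isna().sum(axis=1) <= 2]

# Same result: a row survives if it has at least (number of columns - 2) non-missing values
cleaned_thresh = df.dropna(thresh=len(df.columns) - 2)
print(cleaned)
```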
Let's do some data cleaning and analysis on the practice data.
Data Cleaning :
There are many steps to cleaning data, but let's start by dropping duplicate rows.
Solution
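A sketch that first counts the duplicates (useful to report how much was removed) and then drops them, renumbering the index. The frame below is a hypothetical stand-in for the practice data.

```python
import pandas as pd

# Hypothetical stand-in for the practice data; row 2 repeats row 0
df = pd.DataFrame({
    "Name": ["John", "Anna", "John", "Jack"],
    "Age": [28, 34, 28, 45],
    "City": ["London", "Paris", "London", "London"],
    "Salary": [48000, 62000, 48000, 75000],
})

n_duplicates = df.duplicated().sum()   # rows that repeat an earlier row
print("Duplicate rows:", n_duplicates)

# Drop them and renumber the index 0..n-1
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```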
Fill missing values :
Find a way to fill the missing values in each column with reasonable values.
Solution
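What counts as "reasonable" is a judgment call. One common choice, sketched here, is the median for numeric columns (robust to outliers) and the most frequent value (mode) for text columns. The sample data is invented, and the sketch assumes no column is entirely empty (otherwise `mode()` would return nothing).

```python
import pandas as pd

# Hypothetical stand-in for the practice data with a few gaps
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "Age": [28, None, 45, 31],
    "City": ["London", "Paris", None, "Paris"],
    "Salary": [48000, 62000, None, 58000],
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric column: fill with the median of the observed values
        df[col] = df[col].fillna(df[col].median())
    else:
        # Text column: fill with the most frequent observed value
        df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```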
Exploratory Data Analysis (EDA) :
Generate a Statistical Summary.
Solution
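`describe` summarizes numeric columns by default (count, mean, std, min, quartiles, max); `include="all"` adds text columns too. The data is a made-up sample.

```python
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "Age": [28, 34, 45, 31],
    "City": ["London", "Paris", "London", "Paris"],
    "Salary": [48000, 62000, 75000, 58000],
})

summary = df.describe()                  # numeric columns only
summary_all = df.describe(include="all") # every column
print(summary)
```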
Create histograms for each numeric column :
Solution
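`DataFrame.hist` draws one histogram per numeric column and skips text columns such as "Name" and "City". The `Agg` backend line is only there so the sketch also runs without a display; leave it out in a notebook. Sample data and the bin count are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless runs; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "Age": [28, 34, 45, 31],
    "City": ["London", "Paris", "London", "Paris"],
    "Salary": [48000, 62000, 75000, 58000],
})

# One subplot per numeric column; returns an array of Axes
axes = df.hist(bins=5, figsize=(8, 3))
plt.tight_layout()
plt.show()
```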
Create a bar chart to compare the distribution of salary across different cities :
Solution
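A bar shows one number, so one way to compare salary distributions with a bar chart is to plot the minimum, mean, and maximum salary for each city side by side. This is one interpretation of the task; a box plot (`df.boxplot(column="Salary", by="City")`) is a common alternative for a fuller view of the spread. Sample data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless runs; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "City": ["London", "Paris", "London", "Paris"],
    "Salary": [48000, 62000, 75000, 58000],
})

# One row per city, one column per statistic
salary_stats = df.groupby("City")["Salary"].agg(["min", "mean", "max"])

# Grouped bars: three bars (min, mean, max) for each city
ax = salary_stats.plot(kind="bar", figsize=(6, 4))
ax.set_ylabel("Salary")
ax.set_title("Salary range by city")
plt.tight_layout()
plt.show()
```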
Group the data by city and calculate the mean salary for each city :
Solution
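A sketch that keeps the grouped object in a variable (`city_group`, a name chosen here for reuse) and turns the result into a two-column DataFrame with `reset_index`. Sample data is invented.

```python
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "City": ["London", "Paris", "London", "Paris"],
    "Salary": [48000, 62000, 75000, 58000],
})

city_group = df.groupby("City")
# Series of means -> DataFrame with columns City and MeanSalary
mean_salary = city_group["Salary"].mean().reset_index(name="MeanSalary")
print(mean_salary)
```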
Plot the mean salary for each city using a bar plot :
Solution
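The grouped means can be plotted directly with the pandas `plot` method, one bar per city. Sample data and labels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless runs; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "City": ["London", "Paris", "London", "Paris"],
    "Salary": [48000, 62000, 75000, 58000],
})

mean_salary = df.groupby("City")["Salary"].mean()

ax = mean_salary.plot(kind="bar", figsize=(6, 4))
ax.set_xlabel("City")
ax.set_ylabel("Mean salary")
ax.set_title("Mean salary by city")
plt.tight_layout()
plt.show()
```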
Plot a scatter plot to visualize the relationship between Age and Salary :
Solution
import matplotlib.pyplot as plt
plt.scatter(df['Age'], df['Salary'])
plt.xlabel("Age")
plt.ylabel("Salary")
plt.title("Relationship between Age and Salary")
plt.show()
Find the highest-paid person in each city :
Solution
city_group = df.groupby('City')
# idxmax gives the row label of the top salary in each city, so whole rows (including Name) can be selected
top_paid_by_city = df.loc[city_group['Salary'].idxmax(), ['City', 'Name', 'Salary']]
print("The highest-paid person in each city:")
print(top_paid_by_city)
Find the youngest and oldest person in the data :
Solution
min_age = df['Age'].min()
max_age = df['Age'].max()
print("The youngest person in the data is", df.loc[df['Age'] == min_age, 'Name'].iloc[0], "with an age of", min_age)
print("The oldest person in the data is", df.loc[df['Age'] == max_age, 'Name'].iloc[0], "with an age of", max_age)
Group the data by age group and city and calculate the mean salary for each group and city :
Solution
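For grouping by age group and city, a minimal sketch: `pd.cut` first buckets ages into groups, then `groupby` on both keys averages salaries. The bin edges and labels are hypothetical choices, and `observed=True` keeps only age-group/city combinations that actually occur.

```python
import pandas as pd

# Hypothetical stand-in for the practice data
df = pd.DataFrame({
    "Name": ["John", "Anna", "Jack", "Maria"],
    "Age": [28, 34, 45, 31],
    "City": ["London", "Paris", "London", "Paris"],
    "Salary": [48000, 62000, 75000, 58000],
})

# right=False makes the bins [0, 30), [30, 40), [40, 120)
df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 30, 40, 120],
                        labels=["Under 30", "30-39", "40+"], right=False)

# Mean salary per (age group, city) pair; result has a two-level index
mean_by_group_city = df.groupby(["AgeGroup", "City"], observed=True)["Salary"].mean()
print(mean_by_group_city)
```

Call `.unstack()` on the result if you prefer a table with age groups as rows and cities as columns.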