1.Read in the comma-separated file "client_list.csv". Assign as variable df1
.
df1 = pd.read_csv("client_list.csv")
- Read in the delimted file "client_list.table". Assign as variable as
df2
.
df2 = pd.read_csv("client_list.table", sep=';')
- Read in the fixed-width file "client_list.txt". Assign as variable
df3
df3 = pd.read_fwf("client_list.txt")
- Read in the comma-separated file "client_list.csv", skip the first 3 rows, and ignore the header. Do not assign to variable (just return a view).
pd.read_csv("client_list.csv", skiprows=3, header=None)
- Read in the comma-separated file "client_list.csv". Set the column headers in all caps. Assign as variable
df
.
df = pd.read_csv("client_list.csv")
df.columns = [x.upper() for x in df.columns]
- Read in the comma-separated file "client_list_practice.csv" and only extract the columns
["FIRST_NAME","AGE","EYE_COLOR"]
. Do not assign to a variable.
pd.read_csv("client_list_practice.csv", usecols=["FIRST_NAME","AGE","EYE_COLOR"])
- Slice rows 5 through 11 of
df
. Can you provide two ways of doing this?
df[4:11]
df.loc[4:10, :]
- Return only the columns ['LAST_NAME','AGE','HAIR_COLOR'] for
df
. Can you provide two ways of doing this?
df[['LAST_NAME','AGE','HAIR_COLOR']]
df.loc[:, ['LAST_NAME','AGE','HAIR_COLOR']]
- Combine problems 7 and 8: return rows 5 though 11 and columns ['LAST_NAME','AGE','HAIR_COLOR'] for
df
. Can you provide two ways of doing this?
df[4:11][['LAST_NAME','AGE','HAIR_COLOR']]
df.loc[4:10, ['LAST_NAME','AGE','HAIR_COLOR']]
- Find the subset of
df
where the client's last name is "Smith".
df[df.LAST_NAME=='Smith']
df.loc[df.LAST_NAME=='Smith', :]
- Find the subset of
df
where the client's hair color is not black.
df.loc[df.HAIR_COLOR!='black',:]
df.loc[~(df.HAIR_COLOR=='black'), :]
df[df.HAIR_COLOR!='black']
df[~(df.HAIR_COLOR=='black')]
- Find the subset of
df
where the client's hair color is red and reset the values to "ginger".
df.loc[df.HAIR_COLOR=='red', 'HAIR_COLOR'] = "ginger"
- Find the subset of
df
where the clients are females older than 30 years.
df[(df.AGE>30) & (df.GENDER=='F')]
df.loc[(df.AGE>30) & (df.GENDER=='F'), :]
- Repeat problem 13, but return only the hair color and eye color.
df[(df.AGE>30) & (df.GENDER=='F')][['HAIR_COLOR','EYE_COLOR']]
df.loc[(df.AGE>30) & (df.GENDER=='F'), ['HAIR_COLOR','EYE_COLOR']]
- Find the unique combination of hair and eye color for women older than 25 years.
df.loc[(df.GENDER=='F') & (df.AGE>25), ['HAIR_COLOR','EYE_COLOR']].drop_duplicates()
df[(df.GENDER=='F') & (df.AGE>25)][['HAIR_COLOR','EYE_COLOR']].drop_duplicates()
- Perform a
merge
using "client_list.csv" and "customer_id_list.csv". Assign the resulting dataframe asclients
.
df = pd.read_csv("client_list.csv")
df.columns = [x.upper() for x in df.columns]
ids = pd.read_csv("customer_id_list.csv")
clients = pd.merge(left=df, right=ids, how='left', on=['LAST_NAME','FIRST_NAME','GENDER','AGE'])
- Perform a
merge
usingclients
and "purchase_log.csv" and limit the subset to only clients who made purchases. Assign the resulting dataframe asdetailed_sales
.
sales = pd.read_csv("purchase_log.csv")
detailed_sales = pd.merge(left=clients, right=sales, how='inner', on=['CUSTOMER_ID'])
- Use
groupby
to find the client who spent the most money on purchases. Determine how much he/she spent. HINT: save the intermediate dataframe from usinggroupby
asspenders
before applying slicing to determine the client who spent the most money on purchases.
spenders = detailed_sales.groupby(['CUSTOMER_ID','FIRST_NAME','LAST_NAME','GENDER','AGE'], as_index=False)['PRICE'].sum()
spenders[spenders.PRICE==spenders.PRICE.max()]
- (BONUS) Modify the answer to problem 18 slightly to determine exactly what items where purchased by the top spending client.
top_spender_id = spenders[spenders.PRICE==spenders.PRICE.max()].reset_index(drop=True).loc[0,'CUSTOMER_ID']
sales.loc[sales.CUSTOMER_ID==top_spender_id, 'ITEM_DESCRIPTION']
- Save
detailed_sales
as a csv file named "df_out.csv" with no indices.
detailed_sales.to_csv("df_out.csv", index=False)
- Save
detailed_sales
to a pickle file named "df_out.p"
detailed_sales.to_pickle("df_out.p")