I am using Python with pandas to manipulate some data from a CSV file. I'm just playing around to try to learn something new.
I have the following data frame:
I would like to group the data by Col1 so that I get the following result, which is a
groupby on Col1 with Col3 and Col4 multiplied together.
I have been watching some YouTube videos and reading similar questions on Stack Overflow, but I am having trouble. So far I have the following, which creates a new column, Col5, to hold the result of Col3 x Col4:
df['Col5'] = df.Col3 * df.Col4
gf = df.groupby(['Col1', 'Col5'])
Almost, but you are grouping by too many columns in the end. Try:
gf = df.groupby('Col1')['Col5'].sum()
Or, to get the result as a DataFrame rather than a Series with
Col1 as the index (I'm judging that this is what you want from your image), include
as_index=False in your groupby:
gf = df.groupby('Col1', as_index=False)['Col5'].sum()
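To make the difference between these calls concrete, here is a minimal sketch on a small made-up frame. The values are hypothetical and are not the data from the question; only the column names Col1, Col3, Col4 and Col5 come from the thread.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Col1": [1, 1, 2, 2, 3],
    "Col3": [2, 3, 4, 5, 6],
    "Col4": [10, 10, 1, 2, 3],
})

# Row-wise product stored in a helper column.
df["Col5"] = df["Col3"] * df["Col4"]

# Grouping by both columns makes one group per distinct (Col1, Col5) pair,
# so rows that share a Col1 value are not combined.
too_many = df.groupby(["Col1", "Col5"]).ngroups

# Grouping by Col1 only and summing Col5 returns a Series indexed by Col1.
as_series = df.groupby("Col1")["Col5"].sum()

# With as_index=False, Col1 stays a regular column and the result is a DataFrame.
as_frame = df.groupby("Col1", as_index=False)["Col5"].sum()

print(too_many)
print(as_series)
print(as_frame)
```

On this sample, grouping by both columns yields as many groups as there are rows, which is why the original attempt did not collapse anything.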
You can also do this without creating a new column: multiply the two columns, group the product by
df['Col1'], and aggregate with sum:
gf = (df.Col3 * df.Col4).groupby(df['Col1']).sum().reset_index(name='Col2')
print (gf)
    Col1     Col2
0  12345    38.64
1  23456  2635.10
2  45678   419.88
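Another option is to move Col1 into the index, take the row-wise product, and sum within each index value (sum(level=0) did the same job in older pandas but was removed in pandas 2.0):
gf = df.set_index('Col1')[['Col3','Col4']].prod(axis=1).groupby(level=0).sum().reset_index(name='Col2')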
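As a quick check that the two one-liners agree, here is a sketch on hypothetical data. The values are made up, and the index-based variant is written with groupby(level=0).sum(), an assumption made so that it runs on pandas 2.x.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "Col1": [1, 1, 2, 2, 3],
    "Col3": [2, 3, 4, 5, 6],
    "Col4": [10, 10, 1, 2, 3],
})

# Variant 1: multiply the columns, then group the product by the Col1 values.
by_series = (
    (df["Col3"] * df["Col4"])
    .groupby(df["Col1"])
    .sum()
    .reset_index(name="Col2")
)

# Variant 2: move Col1 into the index, take the row-wise product,
# then sum within each index value.
by_index = (
    df.set_index("Col1")[["Col3", "Col4"]]
    .prod(axis=1)
    .groupby(level=0)
    .sum()
    .reset_index(name="Col2")
)

print(by_series)
print(by_index.equals(by_series))
```

Neither variant adds a column to df, which can be convenient when the product is only needed for the aggregate.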