Pandas Data frame group by one column whilst multiplying others


I am using Python with pandas to manipulate some data from a CSV file I have. I'm just playing around to try to learn something new.

I have the following data frame:

[Image: the input data frame]

I would like to group the data by Col1 to get the result below: for each value of Col1, Col3 and Col4 are multiplied together row by row and the products are summed.

[Image: the result I would like]

I have been watching some YouTube videos and reading similar questions on Stack Overflow, but I am having trouble. So far I have the following, which creates a new column, Col5, to hold the result of Col3 * Col4:

df['Col5'] = df.Col3 * df.Col4
gf = df.groupby(['col1', 'Col5'])
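To see why this attempt doesn't give the desired result, here is a minimal sketch on a small hypothetical frame (the real data was only shown as an image, and the grouping column is assumed to be named Col1, as in the answers). Grouping by both Col1 and Col5 creates one group per distinct (Col1, Col5) pair, and a GroupBy object aggregates nothing until a method such as sum() is called on it.

```python
import pandas as pd

# Hypothetical stand-in data; not the values from the question's image.
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B", "C"],
    "Col3": [2, 3, 1, 4, 5],
    "Col4": [1.5, 2.0, 10.0, 0.5, 2.0],
})
df["Col5"] = df.Col3 * df.Col4

# The attempt: grouping by Col1 AND Col5 yields one group per distinct pair.
gf = df.groupby(["Col1", "Col5"])

print(gf.ngroups)            # 5: every row here forms its own group
print(df["Col1"].nunique())  # 3: the number of groups actually wanted
```

With this data no two rows share both Col1 and Col5, so the "grouping" does not combine anything.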


2 Answers


You're almost there, but you are grouping by too many columns: group by Col1 alone and sum Col5. Try:

gf = df.groupby('Col1')['Col5'].sum()

Or, to get the result as a DataFrame with Col1 as an ordinary column rather than as the index (I'm judging from your image that this is what you want), pass as_index=False to groupby:

gf = df.groupby('Col1', as_index=False)['Col5'].sum()
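Both forms can be compared side by side in a short sketch, again on hypothetical data rather than the frame from the question's image. Without as_index=False the result is a Series indexed by Col1; with it, Col1 stays a regular column of a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in data; not the values from the question's image.
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B", "C"],
    "Col3": [2, 3, 1, 4, 5],
    "Col4": [1.5, 2.0, 10.0, 0.5, 2.0],
})
df["Col5"] = df.Col3 * df.Col4

# Group by Col1 only: a Series of per-group totals, indexed by Col1.
s = df.groupby("Col1")["Col5"].sum()

# as_index=False keeps Col1 as a column, so the result is a DataFrame.
gf = df.groupby("Col1", as_index=False)["Col5"].sum()
print(gf)
```

With this data the per-group totals are 9.0 for A, 12.0 for B and 10.0 for C in both forms; only the shape of the result differs.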


You can do this without creating a new column: multiply the two columns and aggregate the product by df['Col1'] with sum:

gf = (df.Col3 * df.Col4).groupby(df['Col1']).sum().reset_index(name='Col2')
print(gf)
    Col1     Col2
0  12345    38.64
1  23456  2635.10
2  45678   419.88
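A self-contained version of this one-liner is sketched below on hypothetical data (the output above comes from the asker's real frame, which isn't reproduced here). Because the grouper is the Series df["Col1"], its name becomes the index name of the result, and reset_index(name="Col2") turns the summed Series into a two-column frame. No helper column is added to df.

```python
import pandas as pd

# Hypothetical stand-in data; not the values from the question's image.
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B", "C"],
    "Col3": [2, 3, 1, 4, 5],
    "Col4": [1.5, 2.0, 10.0, 0.5, 2.0],
})

# Row-wise product, grouped by the values of Col1, then summed per group.
# The result is a Series indexed by "Col1"; reset_index names the values "Col2".
gf = (df.Col3 * df.Col4).groupby(df["Col1"]).sum().reset_index(name="Col2")
print(gf)
```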

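Another solution is to move Col1 into the index with set_index, multiply the columns row-wise with prod(axis=1), and finally sum per index value with groupby(level=0) (older pandas versions also accepted .sum(level=0), but that keyword was removed in pandas 2.0):

gf = df.set_index('Col1')[['Col3','Col4']].prod(axis=1).groupby(level=0).sum().reset_index(name='Col2')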
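As a sanity check, the sketch below runs both approaches from this answer on the same hypothetical data and confirms they produce identical frames. It uses groupby(level=0).sum() for the index-based version, since Series.sum(level=0) was deprecated in pandas 1.3 and removed in 2.0.

```python
import pandas as pd

# Hypothetical stand-in data; not the values from the question's image.
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B", "C"],
    "Col3": [2, 3, 1, 4, 5],
    "Col4": [1.5, 2.0, 10.0, 0.5, 2.0],
})

# Approach 1: multiply, then group by the values of the Col1 column.
gf1 = (df.Col3 * df.Col4).groupby(df["Col1"]).sum().reset_index(name="Col2")

# Approach 2: put Col1 in the index, multiply across each row,
# then sum per index value.
row_products = df.set_index("Col1")[["Col3", "Col4"]].prod(axis=1)
gf2 = row_products.groupby(level=0).sum().reset_index(name="Col2")

# Raises AssertionError if the two results differ in values, dtypes or labels.
pd.testing.assert_frame_equal(gf1, gf2)
print(gf2)
```

The set_index route is handy when Col1 is already the index of your frame; otherwise the first approach is shorter.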
