Python Pandas: Generate a new column that calculates the subtotal of all the cells above that row in a specific column

1990 views python
6

Sorry for the seemingly confusing title. The problem shall be really simple but I'm stumped and need some help here.

The data frame that I have now:

New_ID  STATE   MEAN
0   1   Lagos   7166.101571
1   2   Rivers  2464.065846
2   3   Oyo     1974.699365
3   4   Akwa    1839.126698
4   5   Kano    1757.642462

I want to create a new column that in row i, it will calculate df[:i,'MEAN'].sum()/df['MEAN'].sum()

For example, for data frame:

    ID  MEAN
0   1.0 5
1   2.0 10
2   3.0 15
3   4.0 30
4   5.0 40

My desired output:

     ID MEAN SUBTOTAL
0   1.0 5   0.05
1   2.0 10  0.10
2   3.0 15  0.30
3   4.0 30  0.60
4   5.0 40  1.00

I tried

df1['SUbTotal'] = df1.loc[:df1['New_ID'], 'MEAN']/df1['MEAN'].sum()

but it says:

Name: New_ID, dtype: int32' is an invalid key 

Thanks for your time in advance

answered question

Don't say "It doesn't work." Say "It gives the wrong result and here's why" or "It raises an exception and here is the error message."

Allow me to edit. Thanks.

1 Answer

13

This should do it, it seems like you're looking for cumsum:

df['SUBTOTAL'] = df.MEAN.cumsum() / df.MEAN.sum()

>>> df
    ID  MEAN  SUBTOTAL
0  1.0     5      0.05
1  2.0    10      0.15
2  3.0    15      0.30
3  4.0    30      0.60
4  5.0    40      1.00

posted this

Have an answer?

JD

Please login first before posting an answer.