# Getting percentage for each column after groupby


I have a pandas DataFrame with two columns, `A` and `B`. Column `B` contains three categories: `X`, `Y` and `Z`. For each group in `A`, I need to find what percentage of that group's rows each value of `B` accounts for. Here is what the DataFrame looks like:

``````
A   B
AA  X
BB  Y
CC  Z
AA  Y
AA  Y
BB  Z
..  ..
``````

Now I want to draw a stacked bar plot, but one based on percentages rather than raw counts: for each group in `A`, each category of `B` should show its share of that group. Here is what I have done so far:

`df.groupby(['A'])['B'].value_counts().unstack()`, which gives me this:

``````
B   X    Y      Z
A
AA  65   666    5
BB  123  475    6
CC  267  1337   40
``````

Now I want to divide each value by the sum of its row, e.g. for the first row `(65/(65+666+5), 666/(65+666+5), 5/(65+666+5))`, and plot the result as a stacked bar plot. Can someone please help?
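
As a quick check of that arithmetic for the first row (`AA`), using the counts from the table: the row total is 736, and the three shares sum to 1 up to rounding.

$$
\begin{aligned}
65 + 666 + 5 &= 736 \\
\frac{65}{736} &\approx 0.0883 \\
\frac{666}{736} &\approx 0.9049 \\
\frac{5}{736} &\approx 0.0068
\end{aligned}
$$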

A similar question was asked yesterday. Just add `normalize=True` as an argument to `value_counts`.

Yeah, foolish me. It was that easy. Thanks, Alloz.
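
A minimal sketch of the `normalize=True` approach from the comment, run on a small made-up sample (the six rows below are hypothetical, not the asker's data). Passing `fill_value=0` to `unstack` is an addition of mine so that a category absent from a group shows up as 0 instead of NaN:

``````python
import pandas as pd

# Hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    "A": ["AA", "BB", "CC", "AA", "AA", "BB"],
    "B": ["X", "Y", "Z", "Y", "Y", "Z"],
})

# normalize=True makes value_counts return each category's share
# within its own group of A instead of a raw count
pct_df = (
    df.groupby("A")["B"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)  # rows = A, columns = B, missing combos -> 0
)
print(pct_df)
``````

Each row of `pct_df` now sums to 1, so it can be plotted directly as a stacked bar chart.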


You can compute the row-wise sums and divide each row by its own sum, aligning on the index (`axis=0`), something like this:

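``````
# Count each B category per A group: rows = A, columns = B
freq_df = df.groupby(['A'])['B'].value_counts().unstack()
# Divide every row by its own total; axis=0 aligns the sums with the row index
pct_df = freq_df.divide(freq_df.sum(axis=1), axis=0)
``````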
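
An alternative not mentioned in this answer: `pd.crosstab` can build the same row-normalized table in a single call. A sketch on hypothetical sample data (the rows are made up for illustration):

``````python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame
df = pd.DataFrame({
    "A": ["AA", "BB", "CC", "AA", "AA", "BB"],
    "B": ["X", "Y", "Z", "Y", "Y", "Z"],
})

# crosstab counts every (A, B) combination; normalize="index"
# then divides each row by its own total
pct_df = pd.crosstab(df["A"], df["B"], normalize="index")
print(pct_df)
``````

Unlike the `value_counts().unstack()` route, `crosstab` fills missing A/B combinations with 0 rather than NaN. Passing `normalize="columns"` would instead divide each column by its total.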

Then, to plot it, you should be able to use:

``````
pct_df.plot(kind="bar", stacked=True)
``````
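
If you want the y-axis to read as percentages, one option (my assumption, not part of the answer above) is matplotlib's `PercentFormatter`. This sketch uses hypothetical sample data, the non-interactive Agg backend so it runs without a display, and a made-up output filename:

``````python
import matplotlib
matplotlib.use("Agg")  # headless backend; must be set before importing pyplot
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame
df = pd.DataFrame({
    "A": ["AA", "BB", "CC", "AA", "AA", "BB"],
    "B": ["X", "Y", "Z", "Y", "Y", "Z"],
})
pct_df = df.groupby("A")["B"].value_counts(normalize=True).unstack(fill_value=0)

ax = pct_df.plot(kind="bar", stacked=True)
# Fractions in [0, 1] are shown as "0%".."100%" on the y-axis
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
ax.set_ylabel("Share of B within each A group")
ax.legend(title="B", bbox_to_anchor=(1.0, 1.0), loc="upper left")
plt.tight_layout()
plt.savefig("stacked_pct.png")  # hypothetical filename; use plt.show() interactively
``````

Each bar reaches 100% because every row of `pct_df` sums to 1.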
