efficiently flattening a large multiidex in pandas

1637 views python
5

I have a very large DataFrame that looks like this:

                  A B 
SPH2008 3/21/2008 1 2
        3/21/2008 1 2 
        3/21/2008 1 2
SPM2008 6/21/2008 1 2 
        6/21/2008 1 2
        6/21/2008 1 2

And I have the following code which is intended to flatten and acquire the unique pairs of the two indeces into a new DF:

indeces = [df.index.get_level_values(0), df.index.get_level_values(1)]
tmp = pd.DataFrame(data=indeces).T.drop_duplicates()
tmp.columns = ['ID', 'ExpirationDate']
tmp.sort_values('ExpirationDate', inplace=True)

However, this operation takes a remarkably long amount of time. Is there a more efficient way to do this?

answered question

2 Answers

1

pandas.DataFrame.index.drop_duplicates

pd.DataFrame([*df.index.drop_duplicates()], columns=['ID', 'ExpirationDate'])

        ID ExpirationDate
0  SPH2008      3/21/2008
1  SPM2008      6/21/2008

posted this
2

IIUC, You can also groupby the levels of your multiindex, then create a dataframe from that with your desired columns:

>>> pd.DataFrame(df.groupby(level=[0,1]).groups.keys(), columns=['ID', 'ExpirationDate'])
        ID ExpirationDate
0  SPH2008      3/21/2008
1  SPM2008      6/21/2008

posted this

Have an answer?

JD

Please login first before posting an answer.