Python: how to shorten the code that generates a pandas dataframe?

1086 views python
5

I want to create a pandas dataframe df1 with a specific column name from the unique values of a column (Name) of another dataframe df, and then merge it with another dataframe df2.

df
    Name   House
0   John   London
1   John   London
2   John   London
3   Tom    New York
4   Tom    New York

df2
     Col  Val
0    Tom    3
1    John   2
2    Alex   5
3    Sarah  2

This is what I am doing:

import pandas as pd
x = pd.unique(df['Name'])   # unique names from df
x = pd.DataFrame(x)
x.columns = ['Col']         # must match the key column of df2 exactly
df1 = pd.merge(x, df2, on='Col')

df1
    Col  Val 
0   Tom    3
1   John   2


You can pass the column names as a keyword argument (columns=[...]) when creating the dataframe; this is in the docs. That saves one line, but really, I think this is kinda superfluous.

df.drop_duplicates('Name')
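
To make the drop_duplicates suggestion concrete, here is a minimal sketch using the sample data from the question. Selecting only the Name column and renaming it to Col are my assumptions, added so that the key matches df2; the comment itself does not spell those steps out.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["John", "John", "John", "Tom", "Tom"],
    "House": ["London", "London", "London", "New York", "New York"],
})
df2 = pd.DataFrame({"Col": ["Tom", "John", "Alex", "Sarah"], "Val": [3, 2, 5, 2]})

# One row per distinct name, renamed to match df2's key column (assumed step).
x = df[["Name"]].drop_duplicates().rename(columns={"Name": "Col"})

# Inner merge keeps only the names that also appear in df2.
df1 = x.merge(df2, on="Col")
print(df1)
```

This avoids the round trip through pd.unique and pd.DataFrame, since the result of drop_duplicates is already a dataframe.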

I think you want df2[df2.Col.isin(df.Name.unique())]
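
For reference, a small runnable sketch of that isin filter, assuming the sample frames from the question (the construction of df and df2 below is mine, typed in from the tables above):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["John", "John", "John", "Tom", "Tom"],
    "House": ["London", "London", "London", "New York", "New York"],
})
df2 = pd.DataFrame({"Col": ["Tom", "John", "Alex", "Sarah"], "Val": [3, 2, 5, 2]})

# Keep only the rows of df2 whose Col value occurs somewhere in df["Name"].
df1 = df2[df2["Col"].isin(df["Name"].unique())]
print(df1)
```

No intermediate dataframe or merge is needed here. The filter keeps df2's own row order and index, so Tom comes before John, as in the output shown in the question.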

1 Answer

4

Wouldn't this work?

import pandas as pd
x = pd.unique(df['Name'])              # unique names from df
x = pd.DataFrame(x, columns=['Col'])   # name the column to match df2's key
df1 = x.merge(df2, on='Col')           # merge with df2, not with df itself
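
If the goal is just fewer lines, the same idea can probably be folded into a single expression. This is a sketch rather than the only way to do it, and it assumes the sample frames from the question, with Name as the source column in df and Col as the key in df2:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["John", "John", "John", "Tom", "Tom"],
    "House": ["London", "London", "London", "New York", "New York"],
})
df2 = pd.DataFrame({"Col": ["Tom", "John", "Alex", "Sarah"], "Val": [3, 2, 5, 2]})

# Build the one-column frame of unique names and merge it with df2 in one step.
df1 = pd.DataFrame({"Col": df["Name"].unique()}).merge(df2, on="Col")
print(df1)
```

Passing a dict to pd.DataFrame names the column at construction time, so the separate x.columns assignment is no longer needed.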

