Selecting one of two dataframe columns as input into a new column

2348 views python
1

I have a data-frame (df) which the head looks like:

             DATE   BBG  FLAG  DATE_DIFF   NEW_DATE    
0      2014-03-04  1333     1      -31.0 2014-04-03 
1      2014-04-04  1333     0      -91.0 2014-04-04 
2      2014-07-04  1333     1      -31.0 2014-04-07 
3      2014-08-04  1333     1      -31.0 2014-04-08 
4      2014-09-04  1333     1      -30.0 2014-04-09 
5      2014-10-04  1333     1      -31.0 2014-04-10 
6      2014-11-04  1333     1      204.0 2014-04-11
7      2014-04-14  1333     0       -1.0 2014-04-14 
8      2014-04-15  1333     0       -1.0 2014-04-15 
9      2014-04-16  1333     0       -1.0 2014-04-16 

I am trying to create a new column called PICKED, which selects a date from either the DATE or NEW_DATE column based on the following:

If FLAG = 1 then if NEW_DATE is a valid date then pick NEW_DATE otherwise pick DATE.

The expected output will look like:

             DATE   BBG   FLAG  DATE_DIFF   NEW_DATE     PICKED
0      2014-03-04  1333      1      -31.0 2014-04-03 2014-04-03
1      2014-04-04  1333      0      -91.0 2014-04-04 2014-04-04
2      2014-07-04  1333      1      -31.0 2014-04-07 2014-04-07
3      2014-08-04  1333      1      -31.0 2014-04-08 2014-04-08
4      2014-09-04  1333      1      -30.0 2014-04-09 2014-04-09
5      2014-10-04  1333      1      -31.0 2014-04-10 2014-04-10
6      2014-11-04  1333      1      204.0 2014-04-11 2014-04-11
7      2014-04-14  1333      0       -1.0 2014-04-14 2014-04-14
8      2014-04-15  1333      0       -1.0 2014-04-15 2014-04-15
9      2014-04-16  1333      0       -1.0 2014-04-16 2014-04-16

I am trying to use the following to achive this:

df['PICKED'] = np.where(df['FLAG'] == 1, df.NEW_DATE.fillna(df.DATE, inplace=True), df['DATE'])

But I get the following returned:

             DATE   BBG   FLAG  DATE_DIFF   NEW_DATE     PICKED
0      2014-03-04  1333      1      -31.0 2014-04-03        NaT
1      2014-04-04  1333      0      -91.0 2014-04-04 2014-04-04
2      2014-07-04  1333      1      -31.0 2014-04-07        NaT
3      2014-08-04  1333      1      -31.0 2014-04-08        NaT
4      2014-09-04  1333      1      -30.0 2014-04-09        NaT
5      2014-10-04  1333      1      -31.0 2014-04-10        NaT
6      2014-11-04  1333      1      204.0 2014-04-11        NaT
7      2014-04-14  1333      0       -1.0 2014-04-14 2014-04-14
8      2014-04-15  1333      0       -1.0 2014-04-15 2014-04-15
9      2014-04-16  1333      0       -1.0 2014-04-16 2014-04-16

Could someone let me know where I've done wrong please.

Thanks

answered question

1 Answer

11

Use -

df['PICKED'] = np.where(df['FLAG'] == 1, df.NEW_DATE.fillna(df.DATE), df['DATE'])

Output

    DATE    BBG FLAG    DATE_DIFF   NEW_DATE    PICKED
0   2014-03-04  1333    1   -31.0   2014-04-03  2014-04-03
1   2014-04-04  1333    0   -91.0   2014-04-04  2014-04-04
2   2014-07-04  1333    1   -31.0   2014-04-07  2014-04-07
3   2014-08-04  1333    1   -31.0   2014-04-08  2014-04-08
4   2014-09-04  1333    1   -30.0   2014-04-09  2014-04-09
5   2014-10-04  1333    1   -31.0   2014-04-10  2014-04-10
6   2014-11-04  1333    1   204.0   2014-04-11  2014-04-11
7   2014-04-14  1333    0   -1.0    2014-04-14  2014-04-14
8   2014-04-15  1333    0   -1.0    2014-04-15  2014-04-15
9   2014-04-16  1333    0   -1.0    2014-04-16  2014-04-16

posted this

Have an answer?

JD

Please login first before posting an answer.