pandas column operation on certain row in succession

3

I have a pandas DataFrame like this:

    second          block
0   1               a
1   2               b
2   3               c
3   4               a
4   5               c

This is sequential data, and I would like to add a new column holding the time difference between the current occurrence of a block and the next time that block repeats (0 if it does not repeat).

    second          block     freq
0   1               a         3 //(4-1)
1   2               b         0 //(not repeating)
2   3               c         2 //(5-3)
3   4               a         0 //(not repeating)
4   5               c         0 //(not repeating)

I tried getting the unique list of blocks and then running a for loop like this:

for i in unique_block:
    df['freq'] = df['timestamp'].shift(-1) - df['timestamp']

I do not know how to get 0 for row indexes 1, 3 and 4. This does not work, and since the DataFrame is very large, a loop would not be efficient anyway.
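For reference, here is a minimal sketch of what the loop body computes on the sample data, assuming the time column is `second` as in the table (the snippet above refers to `timestamp`). The loop variable is never used, so every iteration assigns the same ungrouped difference to the next row, whatever its block is.

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({"second": [1, 2, 3, 4, 5],
                   "block": ["a", "b", "c", "a", "c"]})

# Ungrouped shift: difference to the next ROW, regardless of its block
naive = df["second"].shift(-1) - df["second"]
print(naive.tolist())  # [1.0, 1.0, 1.0, 1.0, nan]
```

The result is 1 everywhere (and NaN on the last row) rather than the desired 3, 0, 2, 0, 0, which is why the shift has to happen per block.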

Thanks.


3 Answers

7

Use groupby + diff(periods=-1). Multiply by -1 to get your difference convention and fillna with 0.

df['freq'] = (df.groupby('block')['second'].diff(-1) * -1).fillna(0)

   second block  freq
0       1     a   3.0
1       2     b   0.0
2       3     c   2.0
3       4     a   0.0
4       5     c   0.0
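The one-liner can be unpacked step by step. This is only a sketch on the sample data from the question; the final cast to `int` is an extra assumption (integer seconds) and not part of the answer itself.

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({"second": [1, 2, 3, 4, 5],
                   "block": ["a", "b", "c", "a", "c"]})

# diff(-1) inside each block gives current - next occurrence:
# a: 1 - 4 = -3, then NaN; b: NaN; c: 3 - 5 = -2, then NaN
raw = df.groupby("block")["second"].diff(-1)

# Flip the sign to get next - current; NaN means "no later occurrence" -> 0
# astype(int) is optional and assumes whole-number seconds
df["freq"] = (-raw).fillna(0).astype(int)
print(df["freq"].tolist())  # [3, 0, 2, 0, 0]
```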

10

You can use shift and transform in your groupby:

df['freq'] = df.groupby('block').second.transform(lambda x: x.shift(-1) - x).fillna(0)
>>> df
   second block  freq
0       1     a   3.0
1       2     b   0.0
2       3     c   2.0
3       4     a   0.0
4       5     c   0.0
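Since the question mentions a large DataFrame, a possible variant avoids the Python-level lambda by calling the built-in grouped `shift` directly. This is a sketch on the sample data, not a benchmark; whether it is faster on real data would need to be measured.

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({"second": [1, 2, 3, 4, 5],
                   "block": ["a", "b", "c", "a", "c"]})

# For each row, the 'second' value of the next row with the same block
next_second = df.groupby("block")["second"].shift(-1)

# Next occurrence minus current; rows with no later occurrence become 0
df["freq"] = (next_second - df["second"]).fillna(0)
print(df["freq"].tolist())  # [3.0, 0.0, 2.0, 0.0, 0.0]
```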

10

Using apply with diff followed by shift(-1):

df.groupby('block', group_keys=False).second.apply(lambda x: x.diff().shift(-1)).fillna(0)
Out[242]: 
0    3.0
1    0.0
2    2.0
3    0.0
4    0.0
Name: second, dtype: float64
