# pandas column operation on certain row in succession

I have a pandas DataFrame like this:

``````
    second          block
0   1               a
1   2               b
2   3               c
3   4               a
4   5               c
``````

This is sequential data, and I would like to add a new column holding the time difference between the current row and the next row in which the same block appears.

``````
    second          block     freq
0   1               a         3 //(4-1)
1   2               b         0 //(not repeating)
2   3               c         2 //(5-3)
3   4               a         0 //(not repeating)
4   5               c         0 //(not repeating)
``````

I tried getting the unique list of blocks and then running a for loop like the one below.

``````
for i in unique_block:
    df['freq'] = df['timestamp'].shift(-1) - df['timestamp']
``````

I do not know how to get 0 for the rows at index 1, 3, and 4. The DataFrame is also very large, so this loop is inefficient, and it does not produce the right result anyway.

Thanks.

Use `groupby` + `diff(periods=-1)`. Multiply by `-1` to get your difference convention and fillna with 0.

``````
df['freq'] = (df.groupby('block')['second'].diff(-1) * -1).fillna(0)

   second block  freq
0       1     a   3.0
1       2     b   0.0
2       3     c   2.0
3       4     a   0.0
4       5     c   0.0
``````
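For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question and selects the `second` column explicitly. The `gap` variable name and the final cast to `int` are my own additions, there only to match the integer `freq` column in the desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "second": [1, 2, 3, 4, 5],
    "block": ["a", "b", "c", "a", "c"],
})

# diff(-1) computes current - next within each block, so negate it to get
# next - current. The last occurrence of each block has no next row (NaN).
gap = -df.groupby("block")["second"].diff(-1)

# Replace NaN with 0 and cast back to int (assumption: integer seconds).
df["freq"] = gap.fillna(0).astype(int)

print(df)
```

If the real column holds datetimes rather than integers, the difference would be a timedelta, and the `fillna(0).astype(int)` step would need adjusting.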

You can use `shift` and `transform` in your groupby:

``````
df['freq'] = df.groupby('block').second.transform(lambda x: x.shift(-1) - x).fillna(0)
>>> df
   second block  freq
0       1     a   3.0
1       2     b   0.0
2       3     c   2.0
3       4     a   0.0
4       5     c   0.0
``````
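Since the question mentions a large DataFrame, it may be worth avoiding the Python-level lambda. A possible variant, sketched here on the question's sample data, calls the grouped `shift` directly and subtracts the original column. The `next_second` name is just an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "second": [1, 2, 3, 4, 5],
    "block": ["a", "b", "c", "a", "c"],
})

# For each row, the 'second' value of the next row with the same block.
# The result is aligned to the original index; NaN where no next row exists.
next_second = df.groupby("block")["second"].shift(-1)

# Difference to the next occurrence, with 0 where the block does not repeat.
df["freq"] = (next_second - df["second"]).fillna(0)

print(df)
```

This should give the same values as the `transform` version. Whether it is noticeably faster depends on the data, so it is worth timing on the real DataFrame.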

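Using `apply` with `diff` followed by `shift(-1)`. On pandas 2.0 and later, pass `group_keys=False` to `groupby` so the result keeps the original index:

``````
df.groupby('block', group_keys=False).second.apply(lambda x: x.diff().shift(-1)).fillna(0)
Out:
0    3.0
1    0.0
2    2.0
3    0.0
4    0.0
Name: second, dtype: float64
``````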