Pandas Groupby Dates, then Cumprod of Group?

3509 views python
7

I have a list of values with datetimes:

     Datetime         Val 
[[2017-01-01 15:00:00, 2],
 [2017-02-05 19:00:00, 3],
 [2018-04-22 15:00:00, 6],
 [2018-08-02 13:00:00, 3],
 [2018-10-03 12:00:00, 3]]

I want to group values into N number of equally spaced bins by datetime and then get a list of the cumprod of vals for each group, if a group bin is empty, the cumprod is 1.

My current approach is calculating the first and last timestamp, then using linspace to calculate the equally spaced datetime bins, this is where I'm stuck:

n = 5 # 5 equally sized bins
start = pd.Timestamp(df.iloc[0]['datetime'])
end = pd.Timestamp(df.iloc[-1]['datetime'])
bins = np.linspace(start.value, end.value, n+1) # n+1 as linspace is right bound including
groups = pd.to_datetime(bins).values

Returns:

 ['2017-01-01T15:00:00.000000000' '2017-05-09T14:24:00.000000000'
 '2017-09-14T13:48:00.000000000' '2018-01-20T13:12:00.000000000'
 '2018-05-28T12:36:00.000000000' '2018-10-03T12:00:00.000000000']

Output with 5 equally spaced bins and the above given example values could be for example:

 output = [2*3, 1, 1, 6, 3*3] # 1 if there is no "Val" for a bin

Is there any efficient/clean way to solve this? I have looked into pd.Grouper but I can't get the freq value to work to output equally spaced datetime groups. Another solution I tried is turning datetimes into epochs, and then using np.digitize to categorize by bins. But this also didn't work out. Appreciate any help, Numpy solutions also welcome.

answered question

I have a feeling this isn't quite what you want, but what about something like df.resample('10D').prod().replace(0, 1)?

1 Answer

4

You can use pd.cut to specify your bins easily. Then you need groupby + cumprod. There might be a smart way to avoid the apply:

df.groupby(pd.cut(df.Datetime, bins=5, right=False)).apply(lambda x: x.val.cumprod().max()).fillna(1)

### Output:

Datetime
[2017-01-01 15:00:00, 2017-05-09 14:24:00)           6.0
[2017-05-09 14:24:00, 2017-09-14 13:48:00)           1.0
[2017-09-14 13:48:00, 2018-01-20 13:12:00)           1.0
[2018-01-20 13:12:00, 2018-05-28 12:36:00)           6.0
[2018-05-28 12:36:00, 2018-10-04 03:21:25.200000)    9.0
dtype: float64

posted this

Have an answer?

JD

Please login first before posting an answer.