I have a list of values with datetimes:
```
Datetime             Val
2017-01-01 15:00:00    2
2017-02-05 19:00:00    3
2018-04-22 15:00:00    6
2018-08-02 13:00:00    3
2018-10-03 12:00:00    3
```
I want to group the values into N equally spaced bins by datetime, then get a list of the cumulative product (cumprod) of the vals in each group; if a bin is empty, its cumprod is 1.
My current approach is to take the first and last timestamp, then use linspace to compute the equally spaced datetime bin edges. This is where I'm stuck:

```python
n = 5  # 5 equally sized bins
start = pd.Timestamp(df.iloc[0]['Datetime'])
end = pd.Timestamp(df.iloc[-1]['Datetime'])
bins = np.linspace(start.value, end.value, n + 1)  # n+1 edges, as linspace includes the right bound
groups = pd.to_datetime(bins).values
```
```
['2017-01-01T15:00:00.000000000' '2017-05-09T14:24:00.000000000'
 '2017-09-14T13:48:00.000000000' '2018-01-20T13:12:00.000000000'
 '2018-05-28T12:36:00.000000000' '2018-10-03T12:00:00.000000000']
```
With 5 equally spaced bins and the example values above, the output could be, for example:
```python
output = [2*3, 1, 1, 6, 3*3]  # 1 if there is no "Val" for a bin
```
Is there an efficient/clean way to solve this? I have looked into pd.Grouper, but I can't get the freq value to produce equally spaced datetime groups. I also tried turning the datetimes into epochs and then using np.digitize to categorize them into bins, but that didn't work out either. Appreciate any help; NumPy solutions are also welcome.
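The np.digitize route mentioned above can be made to work. Here's a minimal sketch, assuming the column names `Datetime`/`Val` from the table in the question, using the interior linspace edges and an unbuffered per-bin product so empty bins stay at 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Datetime': pd.to_datetime(['2017-01-01 15:00:00', '2017-02-05 19:00:00',
                                '2018-04-22 15:00:00', '2018-08-02 13:00:00',
                                '2018-10-03 12:00:00']),
    'Val': [2, 3, 6, 3, 3],
})

n = 5
epochs = df['Datetime'].astype('int64').to_numpy()   # nanoseconds since epoch
edges = np.linspace(epochs.min(), epochs.max(), n + 1)

# Interior edges only, so digitize returns a bin index in 0..n-1 for every row
# (the max value then naturally falls into the last bin).
idx = np.digitize(epochs, edges[1:-1], right=False)

out = np.ones(n)                                     # empty bins keep the identity 1
np.multiply.at(out, idx, df['Val'].to_numpy())       # in-place product per bin
```

With the example data this yields `out = [6., 1., 1., 6., 9.]`, matching the expected output.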
You can use pd.cut to specify your bins easily. Then you need cumprod. There might be a smart way to avoid the apply:
```python
df.groupby(pd.cut(df.Datetime, bins=5, right=False)).apply(lambda x: x.Val.cumprod().max()).fillna(1)
```
```
Datetime
[2017-01-01 15:00:00, 2017-05-09 14:24:00)           6.0
[2017-05-09 14:24:00, 2017-09-14 13:48:00)           1.0
[2017-09-14 13:48:00, 2018-01-20 13:12:00)           1.0
[2018-01-20 13:12:00, 2018-05-28 12:36:00)           6.0
[2018-05-28 12:36:00, 2018-10-04 03:21:25.200000)    9.0
dtype: float64
```
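Since the desired output is just the product of each bin, one way to avoid the apply (a sketch, assuming the same column names as above) is to rely on the fact that GroupBy.prod returns the identity 1 for empty groups when min_count is 0 (the default):

```python
import pandas as pd

df = pd.DataFrame({
    'Datetime': pd.to_datetime(['2017-01-01 15:00:00', '2017-02-05 19:00:00',
                                '2018-04-22 15:00:00', '2018-08-02 13:00:00',
                                '2018-10-03 12:00:00']),
    'Val': [2, 3, 6, 3, 3],
})

# observed=False keeps the empty interval categories in the result;
# prod() over an empty group is 1, so no fillna(1) step is needed.
out = df.groupby(pd.cut(df.Datetime, bins=5, right=False), observed=False)['Val'].prod()
```

`out.tolist()` then gives `[6, 1, 1, 6, 9]`, the same values as the apply-based version.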