# Pandas Groupby Dates, then Cumprod of Group?

I have a list of values with datetimes:

```
 Datetime Val
[[2017-01-01 15:00:00, 2],
[2017-02-05 19:00:00, 3],
[2018-04-22 15:00:00, 6],
[2018-08-02 13:00:00, 3],
[2018-10-03 12:00:00, 3]]
```

I want to group the values into N equally spaced bins by datetime and then take the cumulative product of the vals in each group. If a bin is empty, its cumprod is 1.

My current approach is to calculate the first and last timestamp, then use linspace to compute the equally spaced datetime bin edges. This is where I'm stuck:

```
n = 5  # 5 equally sized bins
start = pd.Timestamp(df.iloc[0]['Datetime'])
end = pd.Timestamp(df.iloc[-1]['Datetime'])
# n + 1 edges are needed for n bins, since linspace includes both endpoints
bins = np.linspace(start.value, end.value, n + 1)
groups = pd.to_datetime(bins).values
```

Returns:

```
['2017-01-01T15:00:00.000000000' '2017-05-09T14:24:00.000000000'
'2017-09-14T13:48:00.000000000' '2018-01-20T13:12:00.000000000'
'2018-05-28T12:36:00.000000000' '2018-10-03T12:00:00.000000000']
```

With 5 equally spaced bins and the example values above, the expected output would be:

```
output = [2*3, 1, 1, 6, 3*3] # 1 if there is no "Val" for a bin
```

Is there an efficient/clean way to solve this? I have looked into `pd.Grouper`, but I can't get the `freq` value to produce equally spaced datetime groups. I also tried converting the datetimes to epochs and then using `np.digitize` to categorize them into bins, but that didn't work out either. I'd appreciate any help; NumPy solutions are also welcome.
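For reference, the epoch + `np.digitize` route described above can be made to work; a minimal sketch, assuming the sample data from the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2017-01-01 15:00:00", "2017-02-05 19:00:00",
        "2018-04-22 15:00:00", "2018-08-02 13:00:00",
        "2018-10-03 12:00:00",
    ]),
    "Val": [2, 3, 6, 3, 3],
})

n = 5
epochs = df["Datetime"].astype("int64")  # nanoseconds since the epoch
edges = np.linspace(epochs.min(), epochs.max(), n + 1)
# digitize returns 1..n for values inside the range; the maximum value
# lands exactly on the last edge and gets n + 1, so clip it back to n
bin_ids = np.clip(np.digitize(epochs, edges), 1, n)

# prod() of an empty selection is 1, the multiplicative identity
output = [int(df.loc[bin_ids == i, "Val"].prod()) for i in range(1, n + 1)]
print(output)  # [6, 1, 1, 6, 9]
```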

### 1 Answer

You can use `pd.cut` to specify your bins easily. Then you need `groupby` + `cumprod`. There might be a smarter way that avoids the `apply`:

```
df.groupby(pd.cut(df.Datetime, bins=5, right=False)).apply(lambda x: x.Val.cumprod().max()).fillna(1)
```

### Output:

```
Datetime
[2017-01-01 15:00:00, 2017-05-09 14:24:00) 6.0
[2017-05-09 14:24:00, 2017-09-14 13:48:00) 1.0
[2017-09-14 13:48:00, 2018-01-20 13:12:00) 1.0
[2018-01-20 13:12:00, 2018-05-28 12:36:00) 6.0
[2018-05-28 12:36:00, 2018-10-04 03:21:25.200000) 9.0
dtype: float64
```
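Since only the final cumulative product per bin is needed, and for these (positive) values that equals the plain product, one way to avoid the `apply` is a grouped `prod`; a sketch, assuming the same column names:

```python
import pandas as pd

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2017-01-01 15:00:00", "2017-02-05 19:00:00",
        "2018-04-22 15:00:00", "2018-08-02 13:00:00",
        "2018-10-03 12:00:00",
    ]),
    "Val": [2, 3, 6, 3, 3],
})

# observed=False keeps the empty categories produced by pd.cut in the
# result, and prod() over an empty group is 1, so no fillna is needed
out = df.groupby(pd.cut(df["Datetime"], bins=5, right=False),
                 observed=False)["Val"].prod()
print(out.tolist())  # [6, 1, 1, 6, 9]
```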

I have a feeling this isn't quite what you want, but what about something like `df.resample('10D').prod().replace(0, 1)`?
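For the resample idea, `Datetime` needs to be the index, and resample fixes the bin *width* rather than the bin *count*, so a frequency has to be chosen to approximate N bins. A hedged sketch with the question's sample data (the `'130D'` frequency is just an illustration picked to cover the range in about five bins):

```python
import pandas as pd

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2017-01-01 15:00:00", "2017-02-05 19:00:00",
        "2018-04-22 15:00:00", "2018-08-02 13:00:00",
        "2018-10-03 12:00:00",
    ]),
    "Val": [2, 3, 6, 3, 3],
}).set_index("Datetime")

# In recent pandas, Resampler.prod() defaults to min_count=0, so empty
# bins already come out as 1 and the replace(0, 1) is unnecessary
out = df["Val"].resample("130D").prod()
print(out)
```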