Let's say I have a dataset like the one below, with a timestamp and a user ID.
I want to create a "session" variable based on a time window that I specify (1 min, 2 min, and so on). For each user ID, if the next timestamp falls within that window of the previous one, both rows belong to the same session. Basically, I look at the first time, calculate the difference to the next time, and if it is within 1 min, it is the same session. Similarly, when the session changes, the time of that new session becomes the base time, and all subsequent visit times are compared against it.
I want this time_frame to be a variable that one can play with, not a hardcoded value.
I can do this in SQL with a window function. I was wondering how to do it in pandas.
    time                 company_id
    2018-10-23 00:01:23  113141P
    2018-10-23 00:01:29  113141P
    2018-10-23 00:07:37  113141P
    2018-10-23 00:22:23  113141P
    2018-10-23 00:23:10  113141P
You can use:

    df['session'] = (df.groupby('company_id')['time']
                       .transform(lambda x: (x.diff() > '00:02:00').cumsum()))

    >>> df
                     time company_id  session
    0 2018-10-23 00:01:23    113141P        0
    1 2018-10-23 00:01:29    113141P        0
    2 2018-10-23 00:07:37    113141P        1
    3 2018-10-23 00:22:23    113141P        2
    4 2018-10-23 00:23:10    113141P        2
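Since the question asks for the time frame to be a variable rather than a hardcoded value, the same idea can be wrapped in a small function. This is only a sketch: the function name `add_sessions` and its parameters are made up for illustration, and it assumes a session breaks when the gap to the previous row of the same user exceeds `time_frame` (the same rule as the one-liner above).

```python
import pandas as pd


def add_sessions(df, time_frame, user_col="company_id", time_col="time"):
    """Return a sorted copy of df with a per-user 'session' counter.

    Hypothetical helper: a new session starts whenever the gap to the
    previous row of the same user is larger than time_frame, which can be
    anything pd.Timedelta accepts (e.g. "1min", "2min", "00:02:00").
    """
    gap = pd.Timedelta(time_frame)
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.sort_values([user_col, time_col])

    # Gap to the previous row within each user; the first row is NaT,
    # and NaT > gap evaluates to False, so it never opens a new session.
    new_session = out.groupby(user_col)[time_col].diff() > gap

    # Running count of session breaks per user gives the session number.
    out["session"] = new_session.astype(int).groupby(out[user_col]).cumsum()
    return out


df = pd.DataFrame({
    "time": [
        "2018-10-23 00:01:23",
        "2018-10-23 00:01:29",
        "2018-10-23 00:07:37",
        "2018-10-23 00:22:23",
        "2018-10-23 00:23:10",
    ],
    "company_id": ["113141P"] * 5,
})

print(add_sessions(df, "2min")["session"].tolist())   # [0, 0, 1, 2, 2]
print(add_sessions(df, "10min")["session"].tolist())  # [0, 0, 0, 1, 1]
```

Changing the window is then just a matter of passing a different `time_frame`. Note that this compares each row with the one immediately before it, not with the first row of the session.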