How to create a session variable based on time variable in pandas


Let's say I have a dataset with a timestamp and a user id (the company_id column), like the one below.

I want to create a "session" variable where I can specify a time window (say 1 or 2 minutes): for each user id, if the next timestamp falls within this window, both visits are recorded as the same session. Basically, I take the first timestamp of a session as the base time and compute the difference to each subsequent timestamp; if it is within the window, the visit belongs to the same session. When a new session starts, its first timestamp becomes the new base time, and all later visits are measured against it.

I want this time_frame to be a parameter that I can adjust, not a hardcoded value.

I can do this in SQL with a window function, but I was wondering how to do it in pandas.
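The rule described above (measure each visit against the first visit of its session, not against the previous visit) can be sketched per user with a plain loop. This is a minimal sketch, not a definitive implementation: the helper sessions_from_start, the made-up timestamps and the second user id 999999X are all hypothetical, chosen so that the start-anchored rule and a previous-visit (diff/cumsum) rule give different results. Because each session's base time depends on where the previous session started, this rule does not reduce to a single vectorized diff, so a loop is used here; on large data it is likely slower.

```python
import pandas as pd


def sessions_from_start(times: pd.Series, time_frame: pd.Timedelta) -> pd.Series:
    """Hypothetical helper: label one user's time-sorted visits with session ids.

    A visit stays in the current session if it is within `time_frame` of the
    session's first visit (the base time); otherwise it opens a new session
    and becomes the new base time.
    """
    labels = []
    session = -1
    base = None
    for t in times:
        if base is None or t - base > time_frame:
            session += 1  # start a new session
            base = t      # this visit is the new base time
        labels.append(session)
    return pd.Series(labels, index=times.index)


# Made-up timestamps chosen so the two rules disagree for user 113141P;
# 999999X is an invented second user id to show the per-user grouping.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2018-10-23 00:00:00", "2018-10-23 00:01:30",
        "2018-10-23 00:03:00", "2018-10-23 00:04:00",
        "2018-10-23 00:00:10", "2018-10-23 00:05:00",
    ]),
    "company_id": ["113141P"] * 4 + ["999999X"] * 2,
})

time_frame = pd.Timedelta(minutes=2)  # adjustable, not hardcoded
df = df.sort_values(["company_id", "time"])

# Rule from the question: compare each visit with the session's base time.
df["session_from_start"] = df.groupby("company_id")["time"].transform(
    lambda s: sessions_from_start(s, time_frame)
)
# For contrast: compare each visit with the previous visit (diff/cumsum).
df["session_by_gap"] = df.groupby("company_id")["time"].transform(
    lambda s: (s.diff() > time_frame).cumsum()
)
print(df)
```

For 113141P the visits are 90 seconds apart, so the previous-visit rule keeps them all in one session, while the start-anchored rule opens a new session at 00:03:00 because that is 3 minutes after the base time 00:00:00.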

time                 company_id
2018-10-23 00:01:23  113141P
2018-10-23 00:01:29  113141P
2018-10-23 00:07:37  113141P
2018-10-23 00:22:23  113141P
2018-10-23 00:23:10  113141P
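To make the sample reproducible, it can be built as a DataFrame like this. This is a sketch that assumes the timestamps arrive as strings (for example from a CSV); parsing them with pd.to_datetime is what makes subtracting timestamps yield Timedeltas, and sorting within each company_id is needed for any "next visit" logic to be meaningful.

```python
import pandas as pd

# The sample from the question, with `time` parsed to datetime64.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2018-10-23 00:01:23",
        "2018-10-23 00:01:29",
        "2018-10-23 00:07:37",
        "2018-10-23 00:22:23",
        "2018-10-23 00:23:10",
    ]),
    "company_id": ["113141P"] * 5,
})

# Session logic relies on rows being in time order within each user.
df = df.sort_values(["company_id", "time"], ignore_index=True)
print(df.dtypes)
```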


1 Answer


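You can use transform with diff and cumsum. Within each company_id, diff gives the gap to the previous visit; flagging gaps larger than the threshold and taking the cumulative sum numbers the sessions. This assumes time is a datetime column sorted within each company_id. Note that it compares each visit with the previous one rather than with the first visit of the session; for the sample data both rules give the same result: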

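import pandas as pd

time_frame = pd.Timedelta(minutes=2)  # session threshold; change as needed
# A gap larger than time_frame since the previous visit starts a new session
df['session'] = (df.groupby('company_id')['time']
                 .transform(lambda x: (x.diff() > time_frame).cumsum()))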

>>> df
                 time company_id  session
0 2018-10-23 00:01:23    113141P        0
1 2018-10-23 00:01:29    113141P        0
2 2018-10-23 00:07:37    113141P        1
3 2018-10-23 00:22:23    113141P        2
4 2018-10-23 00:23:10    113141P        2
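Since the threshold is the only tunable part, the same diff/cumsum approach can be wrapped in a small function that takes time_frame as an argument. The name gap_sessions is hypothetical; the function accepts anything pd.Timedelta can parse (such as "2min" or "10min") and, like the code above, assumes time is datetime64 and sorted within each company_id.

```python
import pandas as pd


def gap_sessions(df: pd.DataFrame, time_frame) -> pd.Series:
    """Hypothetical helper wrapping the diff/cumsum approach.

    A new session starts when the gap since the same company_id's previous
    visit exceeds `time_frame` (anything pd.Timedelta accepts).
    """
    gap = pd.Timedelta(time_frame)
    return (df.groupby("company_id")["time"]
              .transform(lambda x: (x.diff() > gap).cumsum()))


# The sample data from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2018-10-23 00:01:23", "2018-10-23 00:01:29", "2018-10-23 00:07:37",
        "2018-10-23 00:22:23", "2018-10-23 00:23:10",
    ]),
    "company_id": ["113141P"] * 5,
})

# Try several thresholds without touching the session logic.
for tf in ["5s", "1min", "2min", "10min"]:
    print(tf, gap_sessions(df, tf).tolist())
```

With these five rows, "2min" reproduces the session column shown above, while "10min" merges the first three visits into one session and "5s" puts every visit in its own session.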
