Default value in column when using lag function in hive/sql

3223 views sql
6

I have a table like below in Hive.

I want to calculate the time difference in seconds for columns where id is same and get the value in time_diff column.

Table

+-----+---------+------------------------+
| id  |  event  |            eventdate   |
+-----+---------+------------------------+
| 1   | sent    | 2017-11-23 03:49:50.0  |
| 1   | sent    | 2017-11-23 03:49:59.0  |
| 2   | sent    | 2017-11-23 04:49:59.0  |
| 1   | click   | 2017-11-24 03:49:50.0  |
+-----+---------+------------------------+

I have done like below

SELECT *, coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)),0) time_diff FROM Table;

Result

+-----+---------+------------------------+-----------+
| id  |  event  |            eventdate   |time_diff  |
+-----+---------+------------------------+-----------+
| 1   | sent    | 2017-11-23 03:49:50.0  | 0         |
| 1   | sent    | 2017-11-23 03:49:59.0  | 9         |
| 2   | sent    | 2017-11-23 04:49:59.0  | 0         |
| 1   | click   | 2017-11-24 03:49:50.0  | 86391     |
+-----+---------+------------------------+-----------+

I am getting what I want but with a small exception. In the result where id is 1 and event is sent in the time_diff column there are two values 0 and 9. I want all the sent events to have 0 in the time_diff column after we apply the lag function.

Expected result:

+-----+---------+------------------------+-----------+
| id  |  event  |            eventdate   |time_diff  |
+-----+---------+------------------------+-----------+
| 1   | sent    | 2017-11-23 03:49:50.0  | 0         |
| 1   | sent    | 2017-11-23 03:49:59.0  | 0         |
| 2   | sent    | 2017-11-23 04:49:59.0  | 0         |
| 1   | click   | 2017-11-24 03:49:50.0  | 86391     |
+-----+---------+------------------------+-----------+

How can I get the expected result?

answered question

1 Answer

1

You can use a case expression:

SELECT *,
       (case when event = 'sent' then 0
             else coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)), 0)
        end) as time_diff 
FROM Table;

posted this

Have an answer?

JD

Please login first before posting an answer.