R Dplyr: calculate difference between dataframe rows, keep the first raw value by group

3617 views r
-1

I have a dataframe with cumulative values by group that I need to recalculate back to raw values. The function lag works pretty well here, but instead of the first number in a sequence, I get back either NA, either the lag between two groups.

How to instead of NA values or difference between group get the first number in group?

My dummy data:

# make example
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = sample(1:15))

Calculate first cumulative values, than convert it back to row values. I.e value should equal to valBack:

df %>%
  group_by(id) %>%
  dplyr::mutate(cumsum = cumsum(value)) %>% 
  mutate(valBack = cumsum - lag(cumsum))

Which results:

# A tibble: 15 x 5
# Groups:   id [3]
      id  hour value cumsum valBack
   <int> <int> <int>  <int>   <int>
 1     1     1    13     13      NA     # should be 13
 2     1     2    11     24      11
 3     1     3     5     29       5
 4     1     4     4     33       4
 5     1     5     2     35       2
 6     2     1    14     14     -21      # should be 14
 7     2     2     7     21       7
 8     2     3     1     22       1
 9     2     4    12     34      12
10     2     5     9     43       9
11     3     1     3      3     -40
12     3     2    15     18      15       # should be 15
13     3     3     8     26       8
14     3     4    10     36      10
15     3     5     6     42       6

I want to a safe calculation to make my valBack equal to value. (Of course, in real data I don't have value column, just cumsum column)

answered question
Add a Comment

1 Answer

8

Try:

df %>%
  group_by(id) %>%
  dplyr::mutate(cumsum = cumsum(value)) %>% 
  mutate(valBack = c(cumsum[1], (cumsum - lag(cumsum))[-1]))

posted this

Please login first before posting an answer.