Shorten a tibble/data frame by removing duplicate entries in the tidyverse

2945 views r
-1

I have a very large data frame from which I need the lossyear per Point:

# A tibble: 74,856 x 13
   Date       index    Mean   Sdev  Median pixel_used   doy Month Year_n  Year lossyear Point Scene      
   <date>     <chr>   <dbl>  <dbl>   <dbl>      <int> <int> <int>  <dbl> <int>    <int> <int> <chr>      
 1 2013-06-11 NBR    0.481  0.0832  0.496       92647   162     6   2013  2013     2017     1 LC08_125016
 2 2013-06-11 NDMI   0.175  0.0737  0.189       92647   162     6   2013  2013     2017     1 LC08_125016
 3 2013-06-11 NDVI   0.734  0.0517  0.741       92647   162     6   2013  2013     2017     1 LC08_125016
 4 2013-06-11 TCB    0.237  0.0159  0.235       92647   162     6   2013  2013     2017     1 LC08_125016
 5 2013-06-11 TCG    0.158  0.0174  0.158       92647   162     6   2013  2013     2017     1 LC08_125016
 6 2013-06-11 TCW   -0.0958 0.0195 -0.0903      92647   162     6   2013  2013     2017     1 LC08_125016
 7 2013-06-27 NBR    0.524  0.0503  0.525       39323   178     6   2013  2013     2017     1 LC08_125016
 8 2013-06-27 NDMI   0.234  0.0464  0.236       39323   178     6   2013  2013     2017     1 LC08_125016
 9 2013-06-27 NDVI   0.721  0.0351  0.725       39323   178     6   2013  2013     2017     1 LC08_125016
10 2013-06-27 TCB    0.249  0.0299  0.251       39323   178     6   2013  2013     2017     1 LC08_125016
# ... with 74,846 more rows

I was able to create a subset by column with df[, c("lossyear", "Point")]:

# A tibble: 74,856 x 2
   Point lossyear
   <fct> <fct>   
 1 1     2017    
 2 1     2017    
 3 1     2017    
 4 1     2017    
 5 1     2017    
 6 1     2017    
 7 1     2017    
 8 1     2017    
 9 1     2017    
10 1     2017    
# ... with 74,846 more rows

But how do I "shorten" it so that I have only one row per unique Point with the corresponding lossyear (2000:2017)? Something like this:

# A tibble: 42 x 2
   Point lossyear
   <fct> <fct>   
 1 1     2017    
 2 2     2017    
 3 3     2017    
 4 4     2016    
 5 5     2016    
 6 6     2016    
 7 7     2015    
 8 8     2014    
 9 9     2014    
10 10    2014    
# ... with 32 more rows


Please use dput to show a small example

dput() on a 74k-row long-format data frame, how? Wouldn't that be too big?
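
A common way around the size problem is to dput() only a small slice of the relevant columns. A minimal sketch, assuming a toy stand-in for the real data (the values and the name small are invented for illustration):

```r
# Toy stand-in for the large data frame (values invented for illustration)
df <- data.frame(
  Point    = rep(1:3, each = 4),
  lossyear = rep(c(2017L, 2016L, 2014L), each = 4)
)

# Keep only the relevant columns and the first few rows,
# then print a copy-pasteable representation
small <- head(df[, c("Point", "lossyear")], 6)
dput(small)
```

The printed structure() call can be pasted into the question so that others can recreate the example.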

2 Answers

13

You could group by Point and get the first value via slice:
library(dplyr)
df %>%
   select(lossyear, Point) %>%
   group_by(Point) %>%
   slice(1) %>%
   ungroup()

4

We can use distinct to get the unique combinations of the selected columns:

library(dplyr)
df %>% 
   distinct(lossyear, Point)
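
For comparison, here is a minimal sketch on a made-up toy tibble (the values and the names by_distinct and by_slice are invented for illustration, not taken from the question's data). It suggests that distinct() and the group_by()/slice(1) approach give the same result when every Point has a single lossyear:

```r
library(dplyr)

# Toy data, invented for illustration: 3 Points with 2 rows each
df <- tibble(
  index    = rep(c("NBR", "NDMI"), times = 3),
  Point    = rep(1:3, each = 2),
  lossyear = rep(c(2017L, 2017L, 2016L), each = 2)
)

# One row per unique (Point, lossyear) combination
by_distinct <- df %>%
  distinct(Point, lossyear)

# Exactly one row per Point: the first row of each group
by_slice <- df %>%
  select(Point, lossyear) %>%
  group_by(Point) %>%
  slice(1) %>%
  ungroup()

by_distinct
by_slice
```

The two differ only if a Point carries more than one lossyear: distinct() would keep every distinct combination, while slice(1) keeps just the first row per Point.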

