R - Pairs of values in a single vector: How to detect missing values?

I have a long vector made up of pairs of values: years paired with scores. The number of characters in each value is always the same (4 characters for years, 3 characters for scores).

data <- c("2018", "5.5", "2016", "8.4", "2017", "6.6", "2018", "2017", "5.5", 
"2009", "7.9")

The problem is that some of the scores are missing, while all of the years are present:

matrix(data, ncol = 2, byrow = TRUE)

     [,1]   [,2]  
[1,] "2018" "5.5" 
[2,] "2016" "8.4" 
[3,] "2017" "6.6" 
[4,] "2018" "2017"
[5,] "5.5"  "2009"
[6,] "7.9"  "2018"

As a result, I can't structure the data by converting it to a matrix or data frame, because the pairs of values are shifted.

Is there a way to detect when a mismatch occurs, i.e. a year is followed by another year, and insert an NA between the two values?

1 Answer

Sure, here's a pretty compact way:

idx <- which(nchar(data) == 4)
cbind(Year = data[idx], Score = ifelse(nchar(data[idx + 1]) == 3, data[idx + 1], NA))
#      Year   Score
# [1,] "2018" "5.5"
# [2,] "2016" "8.4"
# [3,] "2017" "6.6"
# [4,] "2018" NA   
# [5,] "2017" "5.5"
# [6,] "2009" "7.9"

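The key is nchar combined with the lengths you described: every 4-character entry is a year, and the entry that follows it is taken as its score only if it has 3 characters; otherwise the score is NA.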
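If the character-count rule ever turns out to be too loose (for instance, a score such as "10.0" would also have 4 characters), a variant that matches years by pattern may be safer. The following is only a sketch under the assumption that years are always exactly four digits and that anything else is a score; the names is_year, nxt, score, and result are illustrative and not part of the original answer. It also converts the result to a data frame with numeric columns.

```r
data <- c("2018", "5.5", "2016", "8.4", "2017", "6.6", "2018", "2017", "5.5",
          "2009", "7.9")

# Assumption: a year is exactly four digits; everything else is a score.
is_year <- grepl("^[0-9]{4}$", data)
idx <- which(is_year)

# The entry following each year (NA if the year is the last element).
nxt <- data[idx + 1]

# Keep the following entry as the score only if it exists and is not a year.
score <- ifelse(!is.na(nxt) & !is_year[idx + 1], nxt, NA)

# Build a data frame with proper numeric types.
result <- data.frame(Year = as.integer(data[idx]),
                     Score = as.numeric(score))
print(result)
```

With the example vector, the fourth row (the second 2018) should come out with an NA score, matching the matrix shown above.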
