Using R to find the start difference of two strings


I'm trying to use R to find where two strings start to differ, i.e. the position of the first letter at which they differ, and I'd like the function to return that position. My function always returns 2, and it seems the loop only runs once.

Here is my code:

string1 = "CGCGGTGCATCCTGGGAGTTGTAGTTTTTTCTACTCAGAGGGAGAATAGCTCCAGACGGGAGCAGGATGA"
string2 = "CGCGGTGCATCCTGGGATGTAGTTTTTTCTACTCAGAGGGAGAATAGCTCCAGACGGGAGCAGGATGA"

location <- function(string1, string2){
  len1 = nchar(string1)
  len2 = nchar(string2)
  len = max(len1, len2)
  score = 1
  i = 1
  if (i <= len){
     if (substring(string1, i, i) == substring(string2, i, i)){
     score = score + 1
     i = i + 1
   }
  else if (substring(string1, i, i) != substring(string2, i, i)){
  break
   }
 }
  return(score)
}

location(string1, string2)

Thank you very much!

1 Answer

We can split each string into characters, compare them position by position, and get the first mismatch using which.min:

which.min(strsplit(string1, "")[[1]] == strsplit(string2, "")[[1]])
#[1] 18
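To see why which.min finds the first mismatch, the one-liner can be broken into steps. This is a small sketch on short made-up strings of equal length (a and b are illustrative inputs, not the strings from the question):

```r
# Short illustrative inputs of equal length (not the strings from the question)
a <- "abcdef"
b <- "abcxef"

chars_a <- strsplit(a, "")[[1]]   # "a" "b" "c" "d" "e" "f"
chars_b <- strsplit(b, "")[[1]]   # "a" "b" "c" "x" "e" "f"

# Element-wise comparison: TRUE where the characters match
same <- chars_a == chars_b        # TRUE TRUE TRUE FALSE TRUE TRUE

# FALSE sorts below TRUE, so which.min() returns the index of the first FALSE
first_mismatch <- which.min(same)
print(first_mismatch)
```

The same index could also be obtained with which(!same)[1], which returns NA instead of 1 when there is no mismatch at all.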

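The above method gives a warning when the strings have different lengths (more precisely, when the longer length is not a multiple of the shorter one), as is the case here:

Warning message:
In strsplit(string1, "")[[1]] == strsplit(string2, "")[[1]] :
  longer object length is not a multiple of shorter object length

In most cases it is fine to ignore this message. The result is still correct because R only recycles the shorter vector for positions past its end, so a mismatch that occurs within the length of the shorter string is reported at the right position.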
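Two edge cases are worth checking before relying on the one-liner. The sketch below uses made-up inputs and shows what which.min returns when there is no FALSE in the comparison, either because the strings are identical or because recycling happens to line up:

```r
# Identical strings: every comparison is TRUE, so the minimum is TRUE
# and which.min() reports the first element, i.e. position 1.
same_identical <- strsplit("Ronak", "")[[1]] == strsplit("Ronak", "")[[1]]
res_identical <- which.min(same_identical)

# "ab" is recycled to "a" "b" "a" "b", which matches "abab" everywhere.
# 4 is a multiple of 2, so R does not even emit a warning here.
same_recycled <- strsplit("abab", "")[[1]] == strsplit("ab", "")[[1]]
res_recycled <- which.min(same_recycled)

print(res_identical)
print(res_recycled)
```

In both cases the result is 1, although the first pair has no difference and the second pair first differs at position 3 (where the shorter string ends).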

However, to make it complete and reliable, we can write a function:

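location <- function(string1, string2) {
  # Only positions present in both strings can be compared directly
  n <- min(nchar(string1), nchar(string2))
  i <- 1
  while (i <= n) {
    # Return the first position where the characters differ
    if (substr(string1, i, i) != substr(string2, i, i))
      return(i)
    i <- i + 1
  }
  # One string is a prefix of the other: they differ just past the shorter one
  if (nchar(string1) != nchar(string2))
    return(n + 1)
  cat("There is no difference between two strings\n")
}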

location(string1, string2)
#[1] 18

location("Ronak", "Shah")
#[1] 1

location("Ronak", "Ronak")
#There is no difference between two strings
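For comparison, here is a loop-free sketch of the same idea. The name first_diff is hypothetical, and returning NA for identical strings (rather than printing a message) is an assumption made for this example, not part of the answer above:

```r
# first_diff() is a hypothetical helper name used only for this sketch
first_diff <- function(s1, s2) {
  c1 <- strsplit(s1, "")[[1]]
  c2 <- strsplit(s2, "")[[1]]
  n <- min(length(c1), length(c2))

  # Compare only the overlapping positions, so nothing is recycled
  mismatch <- which(c1[seq_len(n)] != c2[seq_len(n)])
  if (length(mismatch) > 0) return(mismatch[1])

  # No mismatch in the overlap: the strings differ only if their lengths do
  if (length(c1) != length(c2)) return(n + 1L)

  # Assumption: NA signals "no difference"
  NA_integer_
}

print(first_diff("Ronak", "Shah"))
print(first_diff("Ron", "Ronak"))
print(first_diff("Ronak", "Ronak"))
```

Returning NA keeps the return type consistent, so the result can be tested with is.na() instead of relying on printed text.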
