Splitting a column into multiple columns based on two conditions

I have a large data frame, and I would like to split a column into many columns based on two conditions: the caret character ^ and the letter following IMM-. Based on the data below, Column1 would be split into columns named IMM-A, IMM-B, IMM-C, and IMM-W. I tried the separate function, but it only works if you specify the column names, and because my data is not uniform, I don't always know what the column names should be.

SampleId  Column1
1         IMM-A*010306+IMM-A*0209^IMM-B*6900+IMM-B*779999^IMM-C*1212+IMM-C*3333
2         IMM-A*010306+IMM-A*0209^IMM-C*6900+IMM-C*779999^IMM-W*1212+IMM-W*3333
3         IMM-B*010306+IMM-B*0209^IMM-C*6900+IMM-C*779999^IMM-W*1212+IMM-W*3333
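A minimal base-R sketch of the full reshape, which discovers the column names from the data instead of requiring them up front. Since no expected output was given, it assumes that each new column should hold the full text of its ^-separated group (for example IMM-A*010306+IMM-A*0209), and that a group missing from a row becomes NA. The object names (df, groups, named, cols, wide, out) are illustrative.

```r
# Sample data from the question (illustrative reconstruction)
df <- data.frame(
  SampleId = 1:3,
  Column1 = c(
    "IMM-A*010306+IMM-A*0209^IMM-B*6900+IMM-B*779999^IMM-C*1212+IMM-C*3333",
    "IMM-A*010306+IMM-A*0209^IMM-C*6900+IMM-C*779999^IMM-W*1212+IMM-W*3333",
    "IMM-B*010306+IMM-B*0209^IMM-C*6900+IMM-C*779999^IMM-W*1212+IMM-W*3333"
  ),
  stringsAsFactors = FALSE
)

# Split each row into its groups on "^"; fixed = TRUE treats "^" literally
# instead of as the regex start-of-string anchor
groups <- strsplit(df$Column1, "^", fixed = TRUE)

# Name every group by the text before its first "*", e.g. "IMM-A"
named <- lapply(groups, function(g) setNames(g, sub("\\*.*$", "", g)))

# The union of names across all rows gives the output columns
cols <- sort(unique(unlist(lapply(named, names))))

# One row per sample; indexing by a name a row lacks yields NA
wide <- t(vapply(named, function(g) unname(g[cols]), character(length(cols))))
colnames(wide) <- cols

# check.names = FALSE keeps "IMM-A" instead of converting it to "IMM.A"
out <- data.frame(SampleId = df$SampleId, wide,
                  check.names = FALSE, stringsAsFactors = FALSE)
out
```

Because cols is computed from the data, rows with different sets of IMM- groups (like rows 1 and 3 here) are handled without listing the column names in advance.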

Can you show the expected output?

2 Answers

Here is one option using strsplit. We can try splitting on the following pattern:

\*\d+[+^]?

Code sample:

x <- "IMM-A*010306+IMM-A*0209^IMM-B*6900+IMM-B*779999^IMM-C*1212+IMM-C*3333"
unlist(strsplit(x, "\\*\\d+[+^]?"))

[1] "IMM-A" "IMM-A" "IMM-B" "IMM-B" "IMM-C" "IMM-C"
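Because strsplit is vectorised, the same pattern can be applied to a whole column at once. The sketch below uses rows 1 and 3 of the sample data as a stand-in for df$Column1, de-duplicates the prefixes within each row to see which IMM- groups that row contains, and takes the union across rows as the set of column names. The object names are illustrative.

```r
# Two values from the question's Column1 (rows 1 and 3)
x <- c(
  "IMM-A*010306+IMM-A*0209^IMM-B*6900+IMM-B*779999^IMM-C*1212+IMM-C*3333",
  "IMM-B*010306+IMM-B*0209^IMM-C*6900+IMM-C*779999^IMM-W*1212+IMM-W*3333"
)

# One character vector of prefixes per element; a match at the very end
# of a string does not produce a trailing empty string
prefixes <- strsplit(x, "\\*\\d+[+^]?")

# Groups present in each row
per_row <- lapply(prefixes, unique)

# Union across rows: the column names the output would need
all_cols <- sort(unique(unlist(per_row)))
all_cols
```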

We may need to split on all three delimiters (*, + and ^) at once:

strsplit(df$Column1, "[*+^]")
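Splitting on [*+^] returns group names and codes alternately (inside the brackets, *, + and a non-leading ^ are all literal). A minimal sketch for one value, assuming every IMM- name is followed by exactly one code as in the sample data, that pairs the tokens up and collapses the codes for each group:

```r
x <- "IMM-A*010306+IMM-A*0209^IMM-B*6900+IMM-B*779999^IMM-C*1212+IMM-C*3333"

# Tokens alternate: name, code, name, code, ...
parts <- strsplit(x, "[*+^]")[[1]]

# Odd positions are group names, even positions are codes
is_name <- seq_along(parts) %% 2 == 1
pairs <- data.frame(group = parts[is_name], code = parts[!is_name],
                    stringsAsFactors = FALSE)

# Join the codes of each group, keeping their original order
by_group <- tapply(pairs$code, pairs$group, paste, collapse = "+")
by_group
```

If a row ever had a name without a code (or vice versa), the odd/even pairing would misalign, so the assumption about regular structure matters here.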
