Remove special character from corpus

I built a data frame that shows every punctuation character in my corpus along with its frequency. Next, I'm supposed to remove the punctuation and then check whether any punctuation remains.

library(tm)
library(stringr)

newpapers1 <- tm_map(newpapers, removePunctuation)

my.check.func <- function(x){str_extract_all(x, "[[:punct:]]")}
my.check1 <- lapply(newpapers1, my.check.func)
p <- as.data.frame(table(unlist(my.check1)))
p

But I still end up with this special character:

  Var1 Freq
1    ¡   25

Is there a way to write a function that removes all punctuation in one pass, or at least a function that removes this particular character?

Thank you!

1 Answer

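You can use gsub to remove the punctuation and then re-run your check on the cleaned corpus, like this.

newpapers1 <- tm_map(newpapers, removePunctuation)

# Second pass with gsub; content_transformer keeps the corpus structure intact
my.clean.func <- content_transformer(function(x){gsub('[[:punct:]]+','',x)})
newpapers1 <- tm_map(newpapers1, my.clean.func)

# Same check as in the question: tabulate any punctuation that is still present
my.check1 <- lapply(newpapers1, function(x){stringr::str_extract_all(x, "[[:punct:]]")})
p <- as.data.frame(table(unlist(my.check1)))
p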
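
If "¡" still shows up afterwards, a likely cause is that base R's [[:punct:]] can depend on the locale and may cover only ASCII punctuation, while stringr matches Unicode punctuation. Here is a minimal sketch of an alternative that matches by Unicode category using perl = TRUE. The names my.strip.func, my.sample and my.clean are made up for illustration, and the sample strings are invented, not taken from your corpus.

```r
# Sketch: strip punctuation by Unicode category instead of POSIX [[:punct:]].
# \p{P} = Unicode punctuation (includes the inverted exclamation mark),
# \p{S} = symbols such as $ + < = > ^ | ~ that [[:punct:]] also covers in ASCII.
# my.strip.func is an illustrative helper, not part of tm or stringr.
my.strip.func <- function(x) {
  gsub("[\\p{P}\\p{S}]+", "", x, perl = TRUE)
}

# "\u00a1" is the inverted exclamation mark from the question's output
my.sample <- c("\u00a1Hola, mundo!", "plain text")
my.clean <- my.strip.func(my.sample)
print(my.clean)

# Base R version of the check: TRUE would mean some punctuation remains
print(any(grepl("[\\p{P}\\p{S}]", my.clean, perl = TRUE)))
```

To apply it to the corpus, something like tm_map(newpapers, content_transformer(my.strip.func)) should work. Recent versions of tm also document a ucp argument, so tm_map(newpapers, removePunctuation, ucp = TRUE) may be enough, assuming your installed version supports it.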

Hope this helps.
