Removing a list of words from a DataFrame


I have a DataFrame whose columns (Series) contain strings. I have a list of words that I want to remove from every row.

tcl_list = ["tab", "cr", "lf", "doublequote", "singlequote", "eof"]
df[['Summary', 'Description']] = re.sub("|".join(tcl_list), ' ', df[['Summary', 'Description']])

For example:

From this:

the tab dog is acting sneaky like a doublequote cat doublequote

To this:

the dog is acting sneaky like a cat

However, I get this error:

TypeError: expected string or bytes-like object

I have tried using apply() with a lambda function, but without success. Any suggestions?

This usually happens when some of the values are not strings. A quick way to check is to cast everything to str.

Try converting it to a raw string via val = r'%s' % your_value.

1 Answer

I think the regular expression needs to be applied to each individual string in the column:

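import re
import pandas as pd

# tcl_list is the word list defined in the question
df = pd.DataFrame({'val': ['the tab dog is acting sneaky like a doublequote cat doublequote']})

df['val'].apply(lambda x: re.sub("|".join(tcl_list), '', x))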

Or

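# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df['val'].str.replace("|".join(tcl_list), '', regex=True)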

Out:

0    the  dog is acting sneaky like a  cat 
Name: val, dtype: object
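
The output above still contains doubled spaces, and a bare pattern such as "tab" would also match inside longer words like "table". A sketch that applies the same idea to both columns from the question and produces the exact target sentence follows. It assumes pandas is installed; the word boundaries (\b), the whitespace cleanup, the helper name remove_words, and the sample Description text are additions for illustration, not part of the original answer.

```python
import re
import pandas as pd

tcl_list = ["tab", "cr", "lf", "doublequote", "singlequote", "eof"]

# \b keeps "tab" from matching inside words such as "table";
# re.escape guards against regex metacharacters in the word list.
pattern = r"\b(?:" + "|".join(map(re.escape, tcl_list)) + r")\b"

def remove_words(col: pd.Series) -> pd.Series:
    """Hypothetical helper: drop the listed words, then tidy whitespace."""
    cleaned = col.str.replace(pattern, " ", regex=True)
    return cleaned.str.replace(r"\s+", " ", regex=True).str.strip()

df = pd.DataFrame({
    "Summary": ["the tab dog is acting sneaky like a doublequote cat doublequote"],
    "Description": ["a table cr with eof markers"],  # made-up sample text
})

# DataFrame.apply passes each selected column to the helper as a Series
cols = ["Summary", "Description"]
df[cols] = df[cols].apply(remove_words)

print(df["Summary"][0])      # the dog is acting sneaky like a cat
print(df["Description"][0])  # a table with markers
```

Going through the .str accessor rather than re.sub inside a lambda leaves missing values as missing instead of raising a TypeError.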
