How to read a file in Python without \ufeff?

3418 views python
7

My code

lines=[]
with open('biznism.txt') as outfile:
    for line in outfile:
        line = line.strip()
        lines.append(line)

This is what I have in my Jupyter notebook

["\ufeffIf we are all here, let's get started. First of all, I'd like you to please join me in welcoming Jack Peterson, our Southwest Area Sales Vice President.",
 "Thank you for having me, I'm looking forward to today's meeting.",
 "I'd also like to introduce Margaret Simmons who recently joined our team.",
 'May I also introduce my assistant, Bob Hamp.',
 "Welcome Bob. I'm afraid our national sales director, Anne Trusting, can't be with us today. She is in Kobe at the moment, developing our Far East sales force.",

I will use the file content for text analytics, and this \ufeff will make a hell of a mess. How do I get rid of it?


U+FEFF is ZERO WIDTH NO-BREAK SPACE (decimal 65279; no visual representation; UTF-8 bytes 0xEF 0xBB 0xBF; block: Arabic Presentation Forms-B). At the start of a file it serves as a byte order mark (BOM). It shows up in your list because the character is actually in the file. You can either delete it manually or use a regex to ignore non-printable characters.
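If the text has already been read, the manual route can be sketched as stripping a single leading U+FEFF from the first string. This is only an illustration: `strip_bom` is a hypothetical helper name, and the sample strings are shortened from the question.

```python
BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, used as a byte order mark


def strip_bom(text: str) -> str:
    """Return text without one leading U+FEFF, if there is one."""
    return text[len(BOM):] if text.startswith(BOM) else text


# Shortened sample of what the question's loop produces.
lines = ["\ufeffIf we are all here, let's get started.", "Thank you for having me."]
lines[0] = strip_bom(lines[0])
print(lines)
```

Note that `str.strip()` alone does not help here, because Python does not treat U+FEFF as whitespace.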

1 Answer

8

You should open the file with the encoding it was actually saved in. For a UTF-8 file that starts with a BOM, the 'utf-8-sig' codec strips the BOM for you:

with open('biznism.txt', encoding='utf-8-sig') as outfile:

or, if the file is actually UTF-16 encoded:

with open('biznism.txt', encoding='utf-16') as outfile:
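To see the difference, here is a small self-contained sketch. It assumes the file is UTF-8 with a BOM, as the output in the question suggests. The temporary file and the `read_lines` helper are made up for the demonstration and stand in for biznism.txt and the loop in the question.

```python
import os
import tempfile

# Write a small UTF-8 file that starts with a BOM (bytes EF BB BF).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbfIf we are all here\nThank you for having me\n")
    path = tmp.name


def read_lines(path, encoding):
    """Read a text file and return its lines with surrounding whitespace removed."""
    with open(path, encoding=encoding) as infile:
        return [line.strip() for line in infile]


with_bom = read_lines(path, "utf-8")         # BOM is kept as '\ufeff'
without_bom = read_lines(path, "utf-8-sig")  # BOM is dropped by the codec
os.remove(path)

print(with_bom[0].startswith("\ufeff"))  # True
print(without_bom[0])                    # If we are all here
```

'utf-8-sig' also reads UTF-8 files that have no BOM, so it is a reasonable choice when you are not sure whether one is present.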
