Writing a DataFrame to UTF-8 encoded Newline Delimited JSON

In Python 2.7, I have a Pandas DataFrame with several unicode columns, integer columns, etc. I need to be able to write it, encoded as UTF-8, to a JSON Newline Delimited file.

I tried this, but it only works in Python 3, not Python 2.7.

with io.open('myjson.json','w',encoding='utf-8') as f:
    f.write(df.to_json(orient="records", lines=True, force_ascii=False))

This is the result of my attempt, but as you can see the non-ASCII characters are escaped instead of being written as UTF-8.

{"account_id":"support","case_id":7697,"message":"\u0633\u0628 \u0627\u0644\u0644\u0647\u0627\u0644\u0644\u0647 \u0627\u0644\u0639","created_at":1536606086392,"agent":"108915"} 
{"account_id":"support","case_id":7697924,"message":"\u0647\u0627\u064a","created_at":1536601516354,"agent":"108915"}

I think it has something to do with this, but I'm not sure.

Other research I've done shows that it works if I put this in my code, but I've also read that this isn't recommended.

import sys
reload(sys)  
sys.setdefaultencoding('utf8')
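
For completeness, here is a minimal, self-contained version of what I'm running (the sample row is made up, but it matches the shape of my real data); on Python 2.7 the f.write() line is where it fails:

# -*- coding: utf-8 -*-
import io
import pandas as pd

# Hypothetical sample row with a unicode column, an integer column, etc.
df = pd.DataFrame([
    {"account_id": "support", "case_id": 7697, "message": u"هاي", "agent": "108915"},
])

with io.open('myjson.json', 'w', encoding='utf-8') as f:
    # Works on Python 3; fails for me on Python 2.7
    f.write(df.to_json(orient="records", lines=True, force_ascii=False))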


1 Answer


It looks like pandas .to_json() defaults to force_ascii=True, which escapes non-ASCII characters as \uXXXX sequences.

From docs:

to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression=None, index=True)

Try setting it to False:

df.to_json(force_ascii=False)
'{"agent":{"0":"108915"},"created_at":{"0":1536606086392},"message":{"0":"?? ???????? ???"}}'
