Fixing Python Unicode Errors

Python's Unicode handling trips people up often, seemingly in part because of backwards-compatibility constraints (see https://en.wikipedia.org/wiki/History_of_Python).

One common error looks like this:

Traceback (most recent call last):
  File "D:\Software\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 777-778: character maps to <undefined>
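To see where this message comes from, here is a minimal sketch that triggers the same error on purpose. It assumes the failing codec is cp1252, as in the traceback above; the sample string is only an illustration.

```python
# Cyrillic text has no representation in the cp1252 code page.
text = "\u0420\u043e\u0441\u0441\u0438\u044f"  # "Россия"

try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    # The message itself is plain ASCII, so it is safe to print anywhere.
    error_message = str(exc)
    print(error_message)

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = text.encode("utf-8")
```

The error is raised whenever text is converted to bytes using a codec that lacks some of its characters, which is why it can surface in HTTP code, file handling, and printing alike.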

If this occurs while you are doing an HTTP request, decode the response bytes explicitly before parsing them:

import json
from urllib.request import urlopen
httpresponse = urlopen(url).read().decode('utf8')  # decode the raw bytes as UTF-8
response = json.loads(httpresponse)
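The snippet above hard-codes UTF-8. A slightly more defensive sketch might read the charset from the response headers and fall back to UTF-8 only when none is declared. The helper names `parse_json_bytes` and `fetch_json` are hypothetical, chosen for this illustration.

```python
import json
from urllib.request import urlopen


def parse_json_bytes(raw, charset="utf-8"):
    """Decode raw response bytes with the given charset, then parse JSON."""
    return json.loads(raw.decode(charset))


def fetch_json(url, default_charset="utf-8"):
    """Fetch a URL and parse its body as JSON (hypothetical helper)."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header when the server sends one.
        charset = response.headers.get_content_charset() or default_charset
        return parse_json_bytes(response.read(), charset)


# Offline demonstration with bytes shaped like a UTF-8 JSON response body.
sample_body = '{"country": "\u0420\u043e\u0441\u0441\u0438\u044f"}'.encode("utf-8")
parsed = parse_json_bytes(sample_body)
```

Keeping the decoding step in its own function makes it possible to check it without a network connection.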

If this occurs while you are reading a file in Beautiful Soup, open the file in binary mode ('rb') so that Beautiful Soup receives the raw bytes and detects the encoding itself:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(file, 'rb'), 'html.parser')
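The one-liner above leaves the file handle open. As a rough sketch using only the standard library, the bytes can be read inside a `with` block first. The resulting bytes object can then be passed to BeautifulSoup in place of the open file. The helper name `read_html_bytes` and the temporary demo file are assumptions made for this example.

```python
import os
import tempfile


def read_html_bytes(path):
    """Read a file in binary mode so no locale-dependent decoding happens."""
    with open(path, "rb") as handle:
        return handle.read()


# Demo: write a small UTF-8 HTML file, then read it back as raw bytes.
html_text = "<p>\u0420\u043e\u0441\u0441\u0438\u044f</p>"
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write(html_text.encode("utf-8"))
    demo_path = tmp.name

raw = read_html_bytes(demo_path)
os.remove(demo_path)

# Decoding is now an explicit step instead of depending on the system code page.
decoded = raw.decode("utf-8")
```

If the encoding is known in advance, opening the file in text mode with `open(path, encoding='utf-8')` is an alternative that also avoids the platform default.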

If this occurs in a print call, like so, you may need to start logging to a file instead [1].

print('\u0420\u043e\u0441\u0441\u0438\u044f')
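One way to log to a file is the standard `logging` module with an explicit UTF-8 encoding on the file handler. This is a minimal sketch; the logger name and the temporary log path are placeholders chosen for the example.

```python
import logging
import os
import tempfile

# Placeholder location for the demo; a real script would use a fixed path.
log_path = os.path.join(tempfile.mkdtemp(), "unicode_demo.log")

logger = logging.getLogger("unicode_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the message away from the console handlers

# The explicit encoding means the file does not depend on the console code page.
handler = logging.FileHandler(log_path, encoding="utf-8")
logger.addHandler(handler)

logger.info("\u0420\u043e\u0441\u0441\u0438\u044f")

# Flush and release the file before reading it back.
handler.close()
logger.removeHandler(handler)

with open(log_path, encoding="utf-8") as log_file:
    logged = log_file.read()
```

The text reaches the file intact because the handler encodes it as UTF-8, whereas printing relies on whatever encoding the terminal uses.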


Citations:
  1. http://stackoverflow.com/questions/10569438/how-to-print-unicode-character-in-python