Why does Python 2.7 on Windows need a space before a Unicode character when printing?

- June 15, 2012

I use the Windows cmd console with chcp 65001, and this code:

print u'\u0110 \u0110' + '\n'

Result:

(a character cmd can't display) (the character I want)
Traceback (most recent call last):
  File "b.py", line 26, in <module>
    print u'\u0110 \u0110'
IOError: [Errno 2] No such file or directory

But when I use this code:

print u' \u0110 \u0110' + '\n'

Result:

(a space)(the character I want) (the character I want)
Traceback (most recent call last):
  File "b.py", line 26, in <module>
    print u' \u0110 \u0110' + '\n'
IOError: [Errno 2] No such file or directory


My questions are:

Why does Python 2.7 need a space when printing a Unicode character?
How do I fix IOError: [Errno 2]?

Short answer

On Windows you can't print arbitrary strings using print.

There are workarounds, as shown here: How to make Python 3 print() utf8. But, despite the title of that question, you can't use this to print UTF-8 using code page 65001, because it will repeat the last few bytes after finishing (as described further down).

Example:

#! python2
import sys

enc = sys.stdout.encoding

def outputunicode(t):
    # Characters the console code page lacks are replaced with '?'.
    bytes = t.encode(enc, 'replace')
    sys.stdout.write(bytes)

outputunicode(u'the letter \u0110\n')
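To see what the 'replace' error handler in the example does without depending on a particular console, here is a small sketch that encodes the same string to two fixed code pages. The choice of cp850 and cp852 is only for illustration; on a real console the encoding comes from sys.stdout.encoding.

```python
text = u'the letter \u0110\n'

# cp850 has no U+0110, so 'replace' substitutes a question mark.
as_cp850 = text.encode('cp850', 'replace')

# cp852 does contain U+0110 (as byte 0xD1), so nothing is lost.
as_cp852 = text.encode('cp852', 'replace')

print(repr(as_cp850))
print(repr(as_cp852))
```

The output is degraded rather than failing: an unencodable character becomes '?' instead of raising UnicodeEncodeError.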

Long answer

You can change the code page of the console using chcp to a code page which contains the characters you want to print. In your case, for instance, run chcp 852.

These are the results on my box if I print the following strings. I'm using code page 850, which is the default for English systems:

u"\u00abhello\u00bb"  # "«hello»"
u"\u0110"  # "Đ"
u"\u4f60\u597d"  # "你好"
u"a\u2192b\u2192c"  # "a→b→c"

The first command will work, since all its characters are in code page 850. The next 3 will fail with an error like this:

UnicodeEncodeError: 'charmap' codec can't encode character u'\u0110' in position 0: character maps to <undefined>

Change the code page to 852, and the second command will work.
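The results above can be checked without changing the console code page at all, since Python ships codecs for both code pages. The following is a minimal sketch; the helper name can_encode is made up for this illustration.

```python
samples = [
    u"\u00abhello\u00bb",   # «hello»
    u"\u0110",              # Đ
    u"\u4f60\u597d",        # 你好
    u"a\u2192b\u2192c",     # a→b→c
]

def can_encode(text, codepage):
    """Return True if every character of text exists in codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

results = {}
for cp in ("cp850", "cp852"):
    results[cp] = [can_encode(s, cp) for s in samples]
    print("%s %s" % (cp, results[cp]))
```

Only the first string encodes under cp850, and the first two under cp852. The Chinese text and the arrows are in neither code page.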

There is a UTF-8 code page (65001), but it doesn't work with Python 2.7.

In Python 3.4 the results are the same. If you change the code page to 65001, you'll get less broken behaviour:

\python34\python.exe -c "print(u'a\u2192b\u2192c')"
a→b→c
�c
C:\>

The 2 extra characters (�c) are a consequence of non-standard behaviour in the C standard library on Windows. They're a repeat of the last 2 bytes in the UTF-8 encoding of the string.
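Why the repeated tail shows up as exactly "�c" can be worked out from the bytes. This sketch only inspects the UTF-8 encoding of the string; it does not reproduce the console bug itself.

```python
s = u'a\u2192b\u2192c'
data = s.encode('utf-8')

# Each arrow takes 3 bytes (E2 86 92), so the string is 9 bytes long.
print(len(data))

# The last 2 bytes are 0x92 (the final byte of the second arrow) and 'c'.
tail = data[-2:]
print(repr(tail))

# 0x92 alone is not valid UTF-8, so it is shown as U+FFFD,
# the replacement character, followed by a normal 'c'.
shown = tail.decode('utf-8', 'replace')
print(repr(shown))
```

The repeat starts in the middle of a multi-byte character, which is why the first of the two extra characters cannot be displayed.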
