Source file encoding
In a shell, we create three simple Python source files that are supposed to print the word Café (note the accent):
echo -e 'print("Caf\xc3\xa9")' > utf-8.py
echo -e 'print("Caf\xe9")' > latin-1.py
echo -e '# coding:latin-1\nprint("Caf\xe9")' > latin-1-coding.py
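The interpretation of echo -e escapes depends on the shell, so the same three files can also be produced from Python itself. The following is a minimal sketch, not part of the original note: the SOURCES dictionary and the write_sources helper are hypothetical names, and the temporary target directory is an assumption chosen to avoid overwriting existing files.

```python
import tempfile
from pathlib import Path

# Exact byte content of each file, matching the echo commands above
# (echo appends a trailing newline).
SOURCES = {
    "utf-8.py": b'print("Caf\xc3\xa9")\n',
    "latin-1.py": b'print("Caf\xe9")\n',
    "latin-1-coding.py": b'# coding:latin-1\nprint("Caf\xe9")\n',
}


def write_sources(directory: Path) -> list[Path]:
    """Write the three test files into directory and return their paths."""
    paths = []
    for name, content in SOURCES.items():
        path = directory / name
        path.write_bytes(content)  # bytes, so no implicit text encoding is involved
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Assumption: a fresh temporary directory is an acceptable target.
    target = Path(tempfile.mkdtemp())
    for path in write_sources(target):
        print(path, path.read_bytes())
```

Writing bytes rather than text keeps the encoding of each file fully explicit.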
In utf-8.py, the accent is encoded in UTF-8, i.e. as hex c3 a9.
latin-1.py encodes the character in latin-1: e9.
latin-1-coding.py also encodes the character in latin-1, but additionally uses the coding directive to specify the source file's encoding (see PEP 263).
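The difference between the two byte sequences can be checked directly in Python. This is a small illustrative sketch; the variable names are made up for the example.

```python
utf8_bytes = b"Caf\xc3\xa9"   # accent stored as two bytes: c3 a9
latin1_bytes = b"Caf\xe9"     # accent stored as one byte: e9

# Each byte sequence decodes to the same text when its own codec is used.
assert utf8_bytes.decode("utf-8") == "Caf\u00e9"
assert latin1_bytes.decode("latin-1") == "Caf\u00e9"

# Decoding the latin-1 bytes as UTF-8 fails: e9 starts a multi-byte
# sequence that is never completed.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as error:
    print("not valid UTF-8:", error.reason)

# The opposite mix-up does not fail, it silently produces the wrong text.
print(utf8_bytes.decode("latin-1"))  # CafÃ©
```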
Python 3 expects source files to be encoded in UTF-8 (PEPs 686 and 3120), so running utf-8.py
prints Café as expected:
$ python3 utf-8.py
Café
Running latin-1.py throws an error:
$ python3 latin-1.py
SyntaxError: Non-UTF-8 code starting with '\xe9' in file /home/rene/notes/test/expected/sub/sub-sub/latin-1.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
With the coding directive, a latin-1 encoded source file can be executed:
$ python3 latin-1-coding.py
Café
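The three outcomes can be approximated without running the files by asking the standard library which encoding it would assume for each source. The sketch below uses tokenize.detect_encoding, which applies the PEP 263 rules (BOM or coding directive in the first two lines, UTF-8 otherwise). The source_encoding helper is a hypothetical name, and the interpreter's own tokenizer may word its errors differently.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding tokenize would assume for these source bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


samples = {
    "utf-8.py": b'print("Caf\xc3\xa9")\n',
    "latin-1.py": b'print("Caf\xe9")\n',
    "latin-1-coding.py": b'# coding:latin-1\nprint("Caf\xe9")\n',
}

for name, source in samples.items():
    try:
        print(name, "->", source_encoding(source))
    except SyntaxError as error:
        # No coding directive and the bytes are not valid UTF-8.
        print(name, "-> SyntaxError:", error)
```

With these inputs, utf-8.py is reported as utf-8, latin-1-coding.py as iso-8859-1 (the normalized name for latin-1), and latin-1.py is rejected with a SyntaxError.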
See also
PEP 597
The -X warn_default_encoding option
The PYTHONWARNDEFAULTENCODING environment variable
sys.flags.warn_default_encoding