Python encoding and system encoding issues.

Closing out this thread.
I had never looked closely at encodings or at the LC_* variables set via export, so last night I Googled them thoroughly.

1. Check the interpreter's default encoding with sys.getdefaultencoding(). On Python 2 it is usually ascii.

2. In a terminal, get the input and output encodings with sys.stdin.encoding and sys.stdout.encoding. Normally both should be utf-8. To set them, use export PYTHONIOENCODING=UTF-8

3. u'中文' == '中文'.decode(encode)
Here the value of encode is sys.stdin.encoding
So when it is utf-8: '中文'.decode('utf-8') == u'\u4e2d\u6587'
When it is ASCII: '中文'.decode('ISO-8859-1') == u'\xe4\xb8\xad\xe6\x96\x87'

4. os.path.exists(path): when path contains Chinese characters, try to convert that part to utf-8 before concatenating it with the English part of the path.

5. When printing, try to call encode('utf-8') on the output.
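Points 1 to 3 above can be sketched roughly as follows. This is only an illustration, written for Python 3 so that it runs on a current interpreter: the Python 2 byte string '中文' is modelled as an explicit bytes literal, and the variable names (raw, as_utf8, as_latin1) are made up for this example. The values printed for the default and stream encodings depend on the interpreter and the locale, so no output is shown for them.

```python
import sys

# Point 1: interpreter-wide default encoding
# ('ascii' on Python 2, 'utf-8' on Python 3).
default_encoding = sys.getdefaultencoding()
print(default_encoding)

# Point 2: encodings of the standard streams. They depend on the locale
# (LC_ALL, LANG, ...) and on PYTHONIOENCODING; getattr guards against
# streams that have no encoding attribute.
stdin_encoding = getattr(sys.stdin, "encoding", None)
stdout_encoding = getattr(sys.stdout, "encoding", None)
print(stdin_encoding, stdout_encoding)

# Point 3: the same six bytes give different text depending on the codec.
raw = b"\xe4\xb8\xad\xe6\x96\x87"      # UTF-8 bytes of the two characters
as_utf8 = raw.decode("utf-8")          # two code points: U+4E2D U+6587
as_latin1 = raw.decode("iso-8859-1")   # six code points, one per byte

print(len(as_utf8), len(as_latin1))    # 2 6
print(ascii(as_utf8))                  # '\u4e2d\u6587'
print(ascii(as_latin1))                # '\xe4\xb8\xad\xe6\x96\x87'
```

Comparing the two lengths (2 versus 6) is a quick way to tell which decoding actually happened.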

https://wiki.archlinux.org/index.php/Locale_(简体中文)

https://segmentfault.com/a/1190000004357933

http://www.w2bc.com/article/216391

http://stackoverflow.com/questions/2596714/why-does-python-print-unicode-characters-when-the-default-encoding-is-ascii

http://blog.csdn.net/liuyukuan/article/details/50855748

8 replies • 2017-04-16 04:19:51 +08:00

zhihaofans

Apr 15, 2017 via iPhone

→python3

coolair

Apr 15, 2017 via Android

.decode('gbk')

Apr 15, 2017

I think the problem is with the value you are entering. Otherwise, what does len(i) give you?

wwqgtxx

Apr 15, 2017 via iPhone

Just switch to python3 already and stop fighting with encoding problems.

SuT2i

Apr 15, 2017

Python3 doesn't have these problems.

dant

Apr 15, 2017

With LC_ALL=C, Python doesn't know what encoding the literal you typed is in, so it defaults to ISO-8859-1.
When you encode, the conversion then follows the ISO-8859-1 -> UTF-8 rule.

dant

Apr 15, 2017

Correction: when u'呵呵' is parsed, the UTF-8 representation of "呵呵" (E5 91 B5 E5 91 B5) is treated as ISO-8859-1 and converted to the Unicode code point sequence (U+00E5 U+0091 U+00B5 U+00E5 U+0091 U+00B5).
When you encode, that Unicode code point sequence is simply encoded as UTF-8.
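
The byte-level effect described here can be reproduced with a small sketch. It assumes Python 3 and models the bytes typed at the terminal as an explicit bytes literal. The names raw, misread, double_encoded and repaired are invented for this illustration, and the last step is just one possible way to undo the damage, not something proposed in the thread.

```python
# UTF-8 bytes of the two-character string from the reply: E5 91 B5 E5 91 B5
raw = b"\xe5\x91\xb5\xe5\x91\xb5"

# Step 1: the bytes are misread as ISO-8859-1, one code point per byte.
misread = raw.decode("iso-8859-1")
print(["U+%04X" % ord(ch) for ch in misread])
# ['U+00E5', 'U+0091', 'U+00B5', 'U+00E5', 'U+0091', 'U+00B5']

# Step 2: encoding those code points as UTF-8 doubles the byte count.
double_encoded = misread.encode("utf-8")
print(double_encoded.hex())  # c3a5c291c2b5c3a5c291c2b5

# Reversing both steps recovers the intended two code points.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")
print(len(repaired))  # 2
```

The \xc2, \xe5 and \x91 bytes that show up in the double-encoded output are the telltale sign of this kind of mis-decoding.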

lzjun

Apr 16, 2017

For encoding issues, see: https://foofish.net/why-python-encoding-is-tricky.html