An HTML file saved as UTF-8 contains escape sequences like \u6211\u5728\u5317\u4eac. How can I decode these into the actual characters (UTF-8 text) in Python?

This topic was created 4587 days ago; the information in it may have since evolved or changed.

4 replies

fengluo

2012-06-13 13:45:43 +08:00

print u'\u6211\u5728\u5317\u4eac'.encode('utf-8')
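The reply above is Python 2 (print statement, u'' literal). A rough Python 3 equivalent is sketched below. Note that it only helps when the escapes are written as a literal in source code: the compiler turns them into characters. Text read from a file keeps the backslashes, which is the situation the follow-up describes.

```python
# Python 3: string literals are Unicode, so the compiler already resolves the escapes.
s = '\u6211\u5728\u5317\u4eac'
print(s)                  # 我在北京

# Encoding yields UTF-8 bytes, e.g. for writing to a file opened in binary mode.
data = s.encode('utf-8')
print(data)               # b'\xe6\x88\x91\xe5\x9c\xa8\xe5\x8c\x97\xe4\xba\xac'
```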

INT21H

2012-06-13 14:02:00 +08:00

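@fengluo The problem is that it's an HTML file, and it's almost all stuff like href=\"javascript:void(0);\">\u8f6c\u53d1<\/a>, so I'd first have to use re to match the \uXXXX sequences before I can encode them. How should I do that?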
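The regex route asked about here can be sketched with Python 3's re module: match each literal \uXXXX and substitute the character it names. This is a minimal sketch under stated assumptions. `unescape_u` and `_U_ESCAPE` are hypothetical names. Every escape is assumed to be a single BMP code point: a surrogate pair such as \ud83d\ude00 comes out as two lone surrogates, and an escaped backslash followed by u (\\u...) would be misread.

```python
import re

# A literal backslash, the letter 'u', then exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_u(text: str) -> str:
    """Replace every literal \\uXXXX sequence in text with the character it encodes.

    Other backslash sequences such as \\" and \\/ are left untouched.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# r'' keeps the backslashes, mimicking what a file read in text mode yields.
html = r'href=\"javascript:void(0);\">\u8f6c\u53d1<\/a>'
print(unescape_u(html))  # href=\"javascript:void(0);\">转发<\/a>
```

Because only the \uXXXX sequences are touched, any raw UTF-8 text already in the file passes through unchanged.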

cute

2012-06-13 14:07:18 +08:00

'\u6211\u5728\u5317\u4eac'.decode('raw_unicode_escape')
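This one-liner is also Python 2, where a plain '...' literal is a byte string with the backslashes intact. In Python 3 the raw_unicode_escape codec still exists, but decoding is a bytes method, so the input has to be bytes. The codec treats \uXXXX (and \UXXXXXXXX) as escapes and every other byte as Latin-1, which is why \" and \/ survive unchanged. A sketch, assuming Python 3 and input whose non-ASCII text is entirely escaped:

```python
# Python 3: str has no .decode(), so the codec is applied to bytes.
raw = rb'\u6211\u5728\u5317\u4eac'  # 24 ASCII bytes: backslashes, 'u's and hex digits
text = raw.decode('raw_unicode_escape')
print(text)  # 我在北京

# Starting from a str that still holds the escapes literally:
s = r'\u8f6c\u53d1'
print(s.encode('ascii').decode('raw_unicode_escape'))  # 转发
```

For a whole file, read it in binary mode (open(path, 'rb').read()) and decode the result the same way. If the file also contains raw UTF-8 bytes, this codec would misread them as Latin-1 (and s.encode('ascii') would raise), so a targeted substitution of only the \uXXXX sequences is the safer choice there.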

INT21H

2012-06-13 14:11:25 +08:00

@cute Thanks a lot!