Correctly setting and reading download filenames

This topic was created 1542 days ago; the information in it may have since developed or changed.

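I'm downloading a file with requests. The filename is in the Content-Disposition response header. In the browser, devtools shows it garbled, yet the download dialog displays the correct filename. In my program the value I get is garbled as well, something like: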

attachment; filename=å½±å® ã·ã£ãã¼ãã¦ã¹ 01;filename*=utf-8''å½±å® ã·ã£ãã¼ãã¦ã¹ 01

After some experimenting, it looks like the UTF-8 bytes were wrongly decoded as ISO-8859-1, like this:

    raw = '影宅 シャドーハウス 01'
    b = bytes(raw, 'utf8')
    dec = b.decode(encoding='iso-8859-1', errors='replace')
    print(dec)

output:

å½±å® ã·ã£ãã¼ãã¦ã¹ 01

Reversing the decoding:

    print(bytes(dec, 'iso-8859-1').decode('utf8'))

Per the spec, the receiver is not guaranteed to be able to parse UTF-8 in headers, so fellow developers, please stop putting raw UTF-8 into headers!
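On the receiving side, the reverse decoding above can be folded into a small helper. This is only a best-effort sketch, not a full header parser: the function names are made up for illustration, and it assumes the HTTP client hands you header values decoded as ISO-8859-1 (as Python's standard HTTP stack does). It first tries to undo the mis-decoding, then prefers `filename*` over `filename`.

```python
import re
from typing import Optional
from urllib.parse import unquote


def repair_latin1(text: str) -> str:
    """Undo 'UTF-8 bytes decoded as ISO-8859-1'; return text unchanged if that does not apply."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def filename_from_content_disposition(header: str) -> Optional[str]:
    """Best-effort filename extraction from a Content-Disposition value (hypothetical helper)."""
    header = repair_latin1(header)

    # Extended form: filename*=charset'language'percent-encoded-value
    m = re.search(r"filename\*\s*=\s*([\w-]*)'[^']*'([^;]+)", header, flags=re.IGNORECASE)
    if m:
        charset = m.group(1) or "utf-8"
        return unquote(m.group(2).strip(), encoding=charset)

    # Plain form: filename="quoted" or filename=token
    m = re.search(r'filename\s*=\s*("([^"]*)"|[^;]+)', header, flags=re.IGNORECASE)
    if m:
        quoted = m.group(2)
        return (quoted if quoted is not None else m.group(1)).strip()
    return None


# Typical use with requests (not imported here):
#   name = filename_from_content_disposition(resp.headers.get("Content-Disposition", ""))
```

A header that genuinely contains ISO-8859-1 text (for example an accented Latin letter) is left alone, because re-decoding it as UTF-8 fails and the helper falls back to the original string.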

4 replies • 2022-11-07 15:15:46 +08:00

swulling

2021-06-18 13:03:25 +08:00 via iPhone

Right, putting raw UTF-8 there doesn't conform to the spec. The standard approach is URL-encoding (percent-encoding).

Kobayashi

2021-06-18 14:11:39 +08:00

    "attachment; filename*=utf-8''{}".format(quote(filename))

- filename*, not filename
- the two single quotes after utf-8 are not a typo
- quote() is urllib.parse.quote

Use filename= for pure ASCII names, otherwise filename*=

For the complete code, see Starlette's FileResponse object

https://github.com/encode/starlette/blob/15761fb48e4c56be09167cb8f9b761114593b651/starlette/responses.py#L257-L265
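The rule above (filename= for pure ASCII, filename*= otherwise) can be sketched as a small server-side helper. This is an illustration in the same spirit, not a copy of the Starlette code; the function name `content_disposition` is an assumption. The resulting string is built in the server application and set as the value of the Content-Disposition response header.

```python
from urllib.parse import quote


def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Build a Content-Disposition value (hypothetical helper, not a library API)."""
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII name: percent-encode the UTF-8 bytes and use the extended
        # parameter. The two single quotes delimit an empty language tag.
        return "{}; filename*=utf-8''{}".format(disposition, quote(filename, safe=""))
    # Pure ASCII name: a quoted string, with backslashes and quotes escaped.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return '{}; filename="{}"'.format(disposition, escaped)


print(content_disposition("report.pdf"))
print(content_disposition("影宅 シャドーハウス 01.mp4"))
```

Passing safe="" makes quote() also encode "/", which is unlikely to be wanted literally inside a filename.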

Kobayashi

2021-06-18 14:17:39 +08:00

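Of course, not everyone plays by the rules. Per the spec, headers are encoded as latin-1/iso-8859-1, but the server on the other end may encode its headers as utf-8, and a browser that decodes them as iso-8859-1 by default ends up with garbled text. Hence encode('iso-8859-1').decode('utf-8').

I once saw this in a Set-Cookie header from Tencent Lexiang. Older versions of Chrome would try decoding as utf-8; newer versions follow the spec and only use iso-8859-1.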
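The encode('iso-8859-1').decode('utf-8') trick is lossless because ISO-8859-1, as implemented by Python's codec, assigns a character to every byte value from 0 to 255, so the wrong decoding step never discards anything. A short check, using the filename from the original post as sample data:

```python
# ISO-8859-1 maps each of the 256 byte values to exactly one character,
# so decoding arbitrary bytes with it never fails and is fully reversible.
all_bytes = bytes(range(256))
assert all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes

original = "影宅 シャドーハウス 01"
garbled = original.encode("utf-8").decode("iso-8859-1")    # what the client sees
recovered = garbled.encode("iso-8859-1").decode("utf-8")   # undo the wrong decoding
assert recovered == original
```

The garbled string contains one character per UTF-8 byte, some of them invisible control characters, which is why copying it from a terminal or web page may silently drop data while the in-memory round trip does not.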

coffeeing

2022-11-07 15:15:46 +08:00

@Kobayashi "attachment; filename*=utf-8''{}".format(quote(filename))
Sorry to bother you, may I ask: does this code go in the HTTP header, or in a server-side configuration file? Thanks a lot.