Why can't curl fetch the content?

2012-07-16 16:11:40 +08:00

bigdude

Could you all try this and see whether it returns any content: curl "http://brand.tmall.com/azIndexInside.htm?firstLetter=A&prt=1342414752421&prc=5"

From some initial digging, it doesn't seem to be related to the Referer, User-Agent, or similar headers.

3997 views

Node

Q&A

7 replies

yujnln

2012-07-16 16:21:48 +08:00

Yes, it works.
>>> print len(content)
87031

bigdude

2012-07-16 16:29:33 +08:00

@yujnln Are you using Python? With urllib2 I keep getting:
urllib2.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Temporarily

yujnln

2012-07-16 16:32:51 +08:00

@bigdude Yes.
You could take a look at this: http://stackoverflow.com/questions/9113652/how-do-i-set-cookies-using-python-urlopen

bigdude

2012-07-16 16:34:49 +08:00

This is driving me crazy...
>>> a=urllib.urlopen('http://brand.tmall.com/azIndexInside.htm?firstLetter=A&prt=1342414752421&prc=5')
>>> len(a.read())
0

bigdude

2012-07-16 16:38:59 +08:00

@yujnln Got it working. You have to send the cookie; without it the site won't let you fetch the page.
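A minimal local reproduction, assuming (as the finding above suggests) that the page answers with a 302 that sets a cookie and points back to itself until that cookie is sent back. The handler `CookieGate`, the cookie `visited=1`, and the page body are hypothetical; the real Tmall behaviour is not confirmed here. It is written for Python 3, where `urllib2` became `urllib.request`. Without a cookie jar, urllib's `HTTPRedirectHandler` sees the same URL come back again and again and gives up with the "infinite loop" `HTTPError`; with `HTTPCookieProcessor(CookieJar())` the cookie is stored from the 302 and sent on the next request, so the page comes through.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError


class CookieGate(BaseHTTPRequestHandler):
    """Hypothetical stand-in: 302 back to the same URL until a cookie is sent."""

    def do_GET(self):
        if "visited=1" in (self.headers.get("Cookie") or ""):
            body = b"<html>brand list</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Hand out a cookie and redirect to the very same URL.
            self.send_response(302)
            self.send_header("Set-Cookie", "visited=1; Path=/")
            self.send_header("Location", self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch(url, keep_cookies):
    """Return the page body, or the HTTPError text if urllib gives up."""
    handlers = [urllib.request.ProxyHandler({})]  # ignore system proxies for this local demo
    if keep_cookies:
        # Stores Set-Cookie from each response and sends it on the next request.
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.read().decode()
    except HTTPError as err:
        return str(err)


server = HTTPServer(("127.0.0.1", 0), CookieGate)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/azIndexInside.htm?firstLetter=A" % server.server_port

without_cookies = fetch(url, keep_cookies=False)  # redirect loop -> HTTPError text
with_cookies = fetch(url, keep_cookies=True)      # cookie sent on 2nd request -> 200

server.shutdown()
server.server_close()
print(without_cookies)
print(with_cookies)
```

The first print should show the same "HTTP Error 302 ... infinite loop" message quoted earlier in the thread, and the second the toy page body.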

est

2012-07-16 16:44:38 +08:00

@bigdude It works without a cookie too:

curl -vL "http://brand.tmall.com/azIndexInside.htm?firstLetter=A&prt=1342414752421&prc=5"

bigdude

2012-07-16 17:32:05 +08:00

@est Got it. Forcing curl to follow the redirect with -L does the trick. I don't get why Taobao uses so many redirects.
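With -L, curl follows each Location header until it gets a non-redirect response. A minimal sketch of that loop in Python 3 using only `http.client`, assuming plain http and no cookie handling (plain curl -L does not send cookies either unless -b or -c turns on its cookie engine). The two-hop server and its /start, /middle, /final paths are hypothetical, there only to give the loop something local to follow.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin, urlsplit


class TwoHopRedirect(BaseHTTPRequestHandler):
    """Hypothetical toy server: /start -> /middle -> /final (200)."""

    ROUTES = {"/start": "/middle", "/middle": "/final"}

    def do_GET(self):
        target = self.ROUTES.get(self.path)
        if target:
            self.send_response(302)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def follow(url, max_hops=10):
    """Follow Location headers by hand, roughly like curl -L; return (hops, body)."""
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=5)
        path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        location = resp.getheader("Location")
        conn.close()
        hops.append((resp.status, url))  # one line per hop, like curl -v output
        if resp.status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return hops, body
    raise RuntimeError("too many redirects")


server = HTTPServer(("127.0.0.1", 0), TwoHopRedirect)
threading.Thread(target=server.serve_forever, daemon=True).start()
hops, body = follow("http://127.0.0.1:%d/start" % server.server_port)
server.shutdown()
server.server_close()

for status, hop_url in hops:
    print(status, hop_url)
print(body)
```

Each entry in `hops` is one request in the chain; without the loop (curl without -L, or any client that stops at the first response) you only ever see the first 302 and its empty body.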



https://www.v2ex.com/t/42507
