Scrapy reports the following error when crawling a site through a proxy

2019-01-04 16:51:00 +08:00
 Ewig
2019-01-04 16:26:57 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]
2019-01-04 16:27:04 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index_1.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]
2019-01-04 16:27:09 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index_2.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]
2019-01-04 16:27:16 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index_3.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]
2019-01-04 16:27:21 [csrc][scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index_4.html> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]


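I searched English-language sites and could not find the cause. Other sites crawl fine through the same proxy; only this one fails, and I don't know why.

The site in question: http://www.csrc.gov.cn/pub/newsite/xxpl/yxpl/index.html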
4680 views
Node: Python
6 replies
houzhimeng
2019-01-04 16:56:20 +08:00
Ewig
2019-01-04 17:13:58 +08:00
@houzhimeng

import base64


class proxy_middleware(object):

    def __init__(self):
        proxy_host = "w.t.16yn"
        proxy_port = "***"
        self.username = "***"
        self.password = "**"
        self.proxies = {"http": "http://{}:{}/".format(proxy_host, proxy_port)}
        self.proxy_server = 'https://w5.t.16yun.cn:6469'
        self.proxy_authorization = 'Basic ' + base64.urlsafe_b64encode(
            bytes((self.username + ':' + self.password), 'ascii')).decode('utf8')

    def process_request(self, request, spider):
        request.meta['proxy'] = self.proxy_server
        request.headers['Proxy-Authorization'] = self.proxy_authorization

I changed it to this and it still doesn't work.
15399905591
2019-01-04 17:28:04 +08:00
self.proxy_server = 'https://w5.t.16yun.cn:6469'
should be changed to
self.proxy_server = 'http://w5.t.16yun.cn:6469'
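Applied to the middleware posted above, the change might look like the following. This is a minimal sketch, not the original poster's final code: the class name `ProxyMiddleware`, the constructor parameters, and the default host, port and credentials are placeholders. It also swaps `urlsafe_b64encode` for `b64encode`, since HTTP Basic authentication is defined over the standard Base64 alphabet; the thread does not say whether that difference mattered here.

```python
import base64


class ProxyMiddleware:
    """Downloader middleware that sends every request through one HTTP proxy."""

    def __init__(self, host="proxy.example.com", port=6469,
                 username="user", password="pass"):
        # Placeholder defaults: substitute the real proxy host, port and credentials.
        # The proxy itself is addressed with the http:// scheme, as suggested above.
        self.proxy_server = "http://{}:{}".format(host, port)
        credentials = "{}:{}".format(username, password).encode("ascii")
        # Basic auth uses the standard Base64 alphabet, not the URL-safe variant.
        self.proxy_authorization = (
            "Basic " + base64.b64encode(credentials).decode("ascii")
        )

    def process_request(self, request, spider):
        # Scrapy reads the proxy address from request.meta["proxy"].
        request.meta["proxy"] = self.proxy_server
        request.headers["Proxy-Authorization"] = self.proxy_authorization
```

The class is enabled like any other downloader middleware, through the `DOWNLOADER_MIDDLEWARES` setting in the project's settings.py.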
huaerxiela
2019-01-04 17:39:44 +08:00
houzhimeng
2019-01-04 18:29:05 +08:00
https://github.com/scrapy/scrapy/issues/1855 Does this look like the same situation as yours?
Ewig
2019-01-05 10:54:42 +08:00
@15399905591 Why does that fix it?
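
The thread does not answer this. One plausible reading, offered as an assumption rather than a confirmed diagnosis: OpenSSL generally raises 'wrong version number' when it expects a TLS record but the peer replies with something else, typically plain HTTP. If the `https://` scheme made the client attempt a TLS handshake with a proxy port that only speaks plain HTTP, the reply would start with text such as `HTTP/1.1` instead of a TLS record header. The helper below (a hypothetical name, not part of Scrapy or OpenSSL) illustrates that check on the first bytes of a reply.

```python
def looks_like_tls_record(data: bytes) -> bool:
    """Return True if data begins like a TLS record header."""
    if len(data) < 3:
        return False
    content_type, major, minor = data[0], data[1], data[2]
    # Record content types 20-23 are change_cipher_spec, alert,
    # handshake and application_data.
    # The record version's major byte is 3 for SSL 3.0 through TLS 1.3.
    return content_type in (20, 21, 22, 23) and major == 3 and minor <= 4


# A plain-HTTP reply puts ASCII letters where the TLS version bytes should be.
plain_http_reply = b"HTTP/1.1 407 Proxy Authentication Required\r\n"
# A handshake record: type 22, version 3.3, then a two-byte length.
tls_handshake_start = bytes([22, 3, 3, 0, 5])

print(looks_like_tls_record(plain_http_reply))
print(looks_like_tls_record(tls_handshake_start))
```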

https://www.v2ex.com/t/523931