Error fetching https resources on SAE; two days of debugging with no luck, could someone please take a look?

2015-11-03 12:52:27 +08:00
 leilux
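I have been debugging this for two days without success, so I am posting it here. Could someone help me figure out what the problem might be?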

**Description**: On SAE I use tornado.simple_httpclient.SimpleAsyncHTTPClient to fetch https pages. The same code works fine when tested locally.

**URL that reproduces the error**: [http://droprest.sinaapp.com/article?url=https%3A%2F%2Fpress.taobao.com%2Fdetail.html%3Fspm%3Da21bo.7724922.8439-0.1.K2HoLf%26postId%3D1723845&next=true](http://droprest.sinaapp.com/article?url=https%3A%2F%2Fpress.taobao.com%2Fdetail.html%3Fspm%3Da21bo.7724922.8439-0.1.K2HoLf%26postId%3D1723845&next=true)
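For anyone who wants to try a different target page: the `url` query parameter in the link above is simply the target address percent-encoded. Here is a minimal sketch of how such a link can be built, assuming the endpoint and the `next=true` flag shown above; `build_repro_url` is a hypothetical helper name for illustration, not part of the app.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7

# Target page taken from the reproduction link above.
TARGET = ("https://press.taobao.com/detail.html"
          "?spm=a21bo.7724922.8439-0.1.K2HoLf&postId=1723845")


def build_repro_url(target, base="http://droprest.sinaapp.com/article"):
    """Percent-encode `target` and pass it as the `url` query parameter."""
    # safe="" also encodes ":" and "/" so the target fits in a single parameter.
    return "{0}?url={1}&next=true".format(base, quote(target, safe=""))


if __name__ == "__main__":
    print(build_repro_url(TARGET))
```

Swapping `TARGET` for another https page would show whether the failure is specific to this host or affects every https fetch from SAE.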

**Environment: Python 2.7.9, Tornado 2.1.1**

**Core code**

```
import logging
import os

from tornado import httpclient
from tornado import httpserver
from tornado.ioloop import IOLoop
from tornado import web


class Application(web.Application):
    def __init__(self, handlers=None, **kwargs):
        # Copy into a fresh list so a shared default list is never mutated
        handlers = list(handlers or [])
        handlers.extend([
            (r"/article", Handler),
        ])

        # Defaults first; keyword arguments from the caller override them
        settings = dict({
            'template_path': os.path.join(os.path.dirname(__file__), 'templates'),
            "debug": False,
        }, **kwargs)

        super(Application, self).__init__(handlers, **settings)


class Handler(web.RequestHandler):
    @web.asynchronous
    def get(self):
        self.url = self.get_argument('url', u'')

        headers = {
            'Accept-Encoding': 'gzip',
            'Accept-Language': 'zh-CN,zh;q=0.8',
            "Accept-Charset": "UTF-8,*;q=0.5",
            "User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        }
        # asynchronously fetch web page
        httpclient.AsyncHTTPClient(max_clients=20).fetch(
            httpclient.HTTPRequest(
                method='GET',
                url=self.url,
                headers=headers,
                follow_redirects=True),
            self.on_fetch,
        )

    def on_fetch(self, response):
        # Re-raise any error recorded on the response
        response.rethrow()

        # The header may be missing; fall back to an empty string
        content_type = response.headers.get('Content-Type') or ''
        if 'text/html' not in content_type and 'application/xhtml' not in content_type:
            raise TypeError('not html or xhtml file')

        html = response.body

        # get content
        content = {u'content': html, 'url': self.url}

        self.finish()


if __name__ == '__main__':
    from tornado.options import parse_command_line
    parse_command_line()
    application = Application(**{'debug': True})

    logging.info('Server running on http://localhost:8080')
    http_server = httpserver.HTTPServer(application)
    http_server.listen(8080)
    IOLoop.instance().start()

```

**Details:**

```
- [2015/10/29 18:52:12] - ERROR:root:Exception in I/O handler for fd 10
Traceback (most recent call last):
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/ioloop.py", line 309, in start
    self._handlers[fd](fd, events)
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 270, in _handle_events
    self._handle_write()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 614, in _handle_write
    self._do_ssl_handshake()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 584, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/usr/local/sae/python/lib/python2.7/ssl.py", line 788, in do_handshake
    self._sslobj.do_handshake()
SSLError: socket write not completed (_ssl.c:562) yq34


- [2015/10/29 18:52:12] - ERROR:root:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 270, in _handle_events
    self._handle_write()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 614, in _handle_write
    self._do_ssl_handshake()
  File "/usr/local/sae/python/lib/python2.7/site-packages/tornado/iostream.py", line 584, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/usr/local/sae/python/lib/python2.7/ssl.py", line 788, in do_handshake
    self._sslobj.do_handshake()
SSLError: socket write not completed (_ssl.c:562) yq34
```
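Both tracebacks end inside `do_handshake()` on a non-blocking socket. The sketch below does not explain the failure. It is only a standalone check, using nothing but the standard library, that performs the same kind of non-blocking TLS handshake outside Tornado, which might help tell whether the problem lies in Tornado 2.1.1 or in the platform's `ssl` module. The helper names `wait_direction` and `handshake_nonblocking` are hypothetical, and the retry-on-want-read/want-write loop is the generic pattern for non-blocking SSL sockets, not Tornado's actual implementation.

```python
import select
import socket
import ssl


def wait_direction(err):
    """Return 'read' or 'write' if the handshake should be retried, else None."""
    code = err.args[0] if err.args else None
    if code == ssl.SSL_ERROR_WANT_READ:
        return "read"
    if code == ssl.SSL_ERROR_WANT_WRITE:
        return "write"
    return None


def handshake_nonblocking(host, port=443, timeout=10.0):
    """Connect, then drive the TLS handshake by hand on a non-blocking socket."""
    raw = socket.create_connection((host, port), timeout)
    context = ssl.create_default_context()  # available from Python 2.7.9
    sock = context.wrap_socket(raw, server_hostname=host,
                               do_handshake_on_connect=False)
    sock.setblocking(False)
    try:
        while True:
            try:
                sock.do_handshake()
                return sock.cipher()
            except ssl.SSLError as err:
                direction = wait_direction(err)
                if direction is None:
                    raise  # a real SSL failure, not a "try again" signal
                readers = [sock] if direction == "read" else []
                writers = [sock] if direction == "write" else []
                ready = select.select(readers, writers, [], timeout)
                if not (ready[0] or ready[1]):
                    raise socket.timeout("handshake timed out")
    finally:
        sock.close()


if __name__ == "__main__":
    # Host taken from the target page in the reproduction link.
    print(handshake_nonblocking("press.taobao.com"))
```

If this script fails on SAE in the same way, the Tornado code is probably not at fault. If it succeeds, the handshake handling in Tornado 2.1.1 would be the next thing to look at.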
2238 views
Node: Q&A
2 replies
yinxingren
2015-11-03 13:00:53 +08:00
Your IP has probably been blocked by Taobao.
leilux
2015-11-03 13:27:25 +08:00

Full version of this thread on V2EX: https://www.v2ex.com/t/233211
