An error when running IPProxyPool, a proxy-pool project on GitHub

2017-03-29 22:27:17 +08:00
 agua199408

I'm using an open-source proxy-pool project from GitHub. While it runs, the following error shows up under certain conditions:

Traceback (most recent call last):
  File "E:\proxy\IPProxyPool\spider\HtmlDownloader.py", line 18, in download
    r = requests.get(url=url, headers=config.get_header(), timeout=config.TIMEOUT)
  File "D:\anaconda\lib\site-packages\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "D:\anaconda\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "D:\anaconda\lib\site-packages\requests\sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\anaconda\lib\site-packages\requests\sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "D:\anaconda\lib\site-packages\requests\adapters.py", line 479, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='proxy-list.org', port=443): Max retries exceeded with url: /english/index.php?p=9 (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x0000000006B95390>, 'Connection to proxy-list.org timed out. (connect timeout=5)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1139, in _execute_context
    context)
  File "D:\anaconda\lib\site-packages\sqlalchemy\engine\default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "D:\anaconda\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\proxy\IPProxyPool\spider\ProxyCrawl.py", line 26, in startProxyCrawl
    crawl.run()
  File "E:\proxy\IPProxyPool\spider\ProxyCrawl.py", line 56, in run
    self.crawl_pool.map(self.crawl, parserList)
  File "D:\anaconda\lib\site-packages\gevent\pool.py", line 308, in map
    return list(self.imap(func, iterable))
  File "D:\anaconda\lib\site-packages\gevent\pool.py", line 102, in next
    raise value.exc
  File "D:\anaconda\lib\site-packages\gevent\greenlet.py", line 534, in run
    result = self._run(*self.args, **self.kwargs)
  File "E:\proxy\IPProxyPool\spider\ProxyCrawl.py", line 67, in crawl
    response = Html_Downloader.download(url)
  File "E:\proxy\IPProxyPool\spider\HtmlDownloader.py", line 27, in download
    proxylist = sqlhelper.select(10)
  File "E:\proxy\IPProxyPool\db\SqlHelper.py", line 127, in select
    return query.order_by(Proxy.score.desc(), Proxy.speed).limit(count).all()
  File "D:\anaconda\lib\site-packages\sqlalchemy\orm\query.py", line 2613, in all
    return list(self)
  File "D:\anaconda\lib\site-packages\sqlalchemy\orm\query.py", line 2761, in __iter__
    return self._execute_and_instances(context)
  File "D:\anaconda\lib\site-packages\sqlalchemy\orm\query.py", line 2776, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "D:\anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "D:\anaconda\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "D:\anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "D:\anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context
    context)
  File "D:\anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "D:\anaconda\lib\site-packages\sqlalchemy\util\compat.py", line 202, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "D:\anaconda\lib\site-packages\sqlalchemy\util\compat.py", line 185, in reraise
    raise value.with_traceback(tb)
  File "D:\anaconda\lib\site-packages\sqlalchemy\engine\base.py", line 1139, in _execute_context
    context)
  File "D:\anaconda\lib\site-packages\sqlalchemy\engine\default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked [SQL: 'SELECT proxys.ip AS proxys_ip, proxys.port AS proxys_port, proxys.score AS proxys_score \nFROM proxys ORDER BY proxys.score DESC, proxys.speed\n LIMIT ? OFFSET ?'] [parameters: (10, 0)]

I found the following answer on Stack Overflow:

SQLite locks the database when a write is made to it, such as when an UPDATE, INSERT or DELETE is sent. When using the ORM, these get sent on flush. The database will remain locked until there is a COMMIT or ROLLBACK.
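
To make the quoted behaviour concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It is not IPProxyPool's code: the temporary database file, the three-column `proxys` table, the `demo_locked_select` name, and the use of `BEGIN EXCLUSIVE` with `timeout=0` are all assumptions chosen so that the lock shows up immediately and reproducibly. One connection holds an open write transaction while a second one tries the same kind of SELECT that fails in the traceback.

```python
import os
import sqlite3
import tempfile


def demo_locked_select():
    """Return (error_message, rows_after_commit) for a SELECT issued while
    another connection holds an uncommitted exclusive write transaction."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.db")  # hypothetical database file

        # isolation_level=None: autocommit mode, so BEGIN/COMMIT are explicit.
        writer = sqlite3.connect(path, isolation_level=None)
        # Force the classic rollback-journal mode, where an exclusive lock
        # also blocks readers (WAL mode behaves differently).
        writer.execute("PRAGMA journal_mode=DELETE")
        writer.execute("CREATE TABLE proxys (ip TEXT, port INTEGER, score INTEGER)")

        # timeout=0: fail at once instead of waiting for the lock to clear.
        reader = sqlite3.connect(path, timeout=0, isolation_level=None)

        writer.execute("BEGIN EXCLUSIVE")
        writer.execute("INSERT INTO proxys VALUES ('127.0.0.1', 8080, 10)")

        error = None
        try:
            reader.execute("SELECT ip, port, score FROM proxys").fetchall()
        except sqlite3.OperationalError as exc:
            error = str(exc)

        # Once the writer commits, the lock is released and reads work again.
        writer.execute("COMMIT")
        rows = reader.execute("SELECT ip, port, score FROM proxys").fetchall()

        writer.close()
        reader.close()
        return error, rows


if __name__ == "__main__":
    print(demo_locked_select())
```

In the real project the lock is presumably taken implicitly by an ORM flush rather than by an explicit `BEGIN EXCLUSIVE`, but the effect on a second connection should be the same kind of `OperationalError`.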

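Putting this together with the traceback above, my guess is that the error arises like this: while the crawler is running, the proxies it has scraped have been sent to the database on flush but not yet committed, so the database is still locked. At that point a connection times out, the program reaches line 27 of HtmlDownloader.py and issues another database query, and that query raises the error.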
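
If that guess is right, one possible mitigation is to retry the query for a short while instead of letting the first "database is locked" error kill the crawl. The sketch below is only an illustration under that assumption: `retry_on_locked` is a hypothetical helper, not part of IPProxyPool, and it catches the standard library's `sqlite3.OperationalError`. Code that goes through SQLAlchemy, as in the traceback, would need to catch `sqlalchemy.exc.OperationalError` instead.

```python
import sqlite3
import time


def retry_on_locked(operation, attempts=5, delay=0.2):
    """Call operation() and return its result.

    If SQLite reports that the database is locked, sleep and try again,
    waiting a little longer each time. Any other error, or a lock that
    persists through the final attempt, is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            is_last_attempt = attempt == attempts - 1
            if "locked" not in str(exc) or is_last_attempt:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear backoff
```

A call site might look like `proxylist = retry_on_locked(lambda: sqlhelper.select(10))`. This only papers over short-lived locks. If a transaction is left open for a long time, the commit or rollback that releases it still has to be fixed at its source.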

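I'm not very experienced, though, and after a long time looking into it I still don't know how to fix it. I'd appreciate it if someone more knowledgeable could help analyze this. Thanks!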

940 views
Node: Database
2 replies
xrlin
2017-03-29 23:27:43 +08:00
Report it to the author; the author is pretty quick with updates.
agua199408
2017-03-30 00:13:03 +08:00
@xrlin Yes, I've already filed an issue.

https://www.v2ex.com/t/351321