Python Scrapy crawler question

2019-04-09 23:35:05 +08:00
 idotfish
I'm using the Scrapy framework to scrape job listings from Zhaopin, and I can't make sense of the errors it reports:
2019-04-09 23:29:10 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:10 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/url {"url": "https://zhaopin.com", "sessionId": "b97f6963939467e28aa83493fcf91f9d"}
[7964:9720:0409/232912.471:ERROR:ssl_client_socket_impl.cc(964)] handshake failed; returned -1, SSL error code 1, net_error -100
[7964:9720:0409/232912.505:ERROR:ssl_client_socket_impl.cc(964)] handshake failed; returned -1, SSL error code 1, net_error -100
[7964:10376:0409/232913.146:ERROR:platform_sensor_reader_win.cc(242)] NOT IMPLEMENTED
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "POST /session/b97f6963939467e28aa83493fcf91f9d/url HTTP/1.1" 200 72
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/window_handle {"sessionId": "b97f6963939467e28aa83493fcf91f9d"}
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "GET /session/b97f6963939467e28aa83493fcf91f9d/window_handle HTTP/1.1" 200 111
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/element {"using": "class name", "value": "zp-search__input", "sessionId": "b97f6963939467e28aa83493fcf91f9d"}
2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "POST /session/b97f6963939467e28aa83493fcf91f9d/element HTTP/1.1" 200 102
2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request



Here is the code:
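import time

import scrapy
from scrapy.linkextractors import LinkExtractor
from selenium import webdriver
from selenium.webdriver.common.by import By

from ..items import JobItem  # assumed location of JobItem; adjust to the project layout


class JobsSpider(scrapy.Spider):
    name = 'jobs'
    allowed_domains = ['zhaopin.com']
    start_urls = ['https://www.zhaopin.com/']

    def start_requests(self):
        # Drive a real browser to submit the search, then hand the result URL to Scrapy.
        browser = webdriver.Chrome()
        browser.get("https://zhaopin.com")
        original_window = browser.current_window_handle
        search_box = browser.find_element(By.CLASS_NAME, 'zp-search__input')
        search_box.send_keys('Python')
        time.sleep(1)
        button = browser.find_element(By.CLASS_NAME, 'zp-search__btn')
        button.click()
        # If the search opened a new window, switch to it before reading the URL.
        for handle in browser.window_handles:
            if handle != original_window:
                browser.switch_to.window(handle)
        url = browser.current_url
        browser.quit()
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Follow every link found inside a job-listing card.
        le = LinkExtractor(restrict_css='div.contentpile__content__wrapper__item.clearfix')
        for link in le.extract_links(response):
            yield scrapy.Request(link.url, callback=self.parse_job)

    def parse_job(self, response):
        jobs = JobItem()
        sel = response.css('div.main')
        # 'hi' and the class '1' in the original selectors are read here as typos
        # for 'h1' and 'l'; a CSS class selector cannot start with a digit.
        jobs['jobname'] = sel.css('h1.l.info-h3::text').extract_first()
        # The original read 'div.company 1', which is not valid; 'div.company.l' is a guess.
        jobs['Cname'] = sel.css('div.company.l::text').extract_first()
        jobs['salary'] = sel.css('div.l.info-money strong::text').extract_first()
        jobs['joblocation'] = sel.css('span.icon-address::text').extract_first()
        # XPath selects text nodes with /text(), not .text().
        info = sel.css('div.info-three.l')
        jobs['experience'] = info.xpath('(.//span)[1]/text()').extract_first()
        jobs['education'] = info.xpath('(.//span)[2]/text()').extract_first()
        jobs['count'] = info.xpath('(.//span)[3]/text()').extract_first()
        jobs['jobintro'] = sel.css('div.pos-ul').extract()  # extract is a method and must be called
        yield jobs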

Could this have something to do with cookies? I'd appreciate any help.
1524 views
Node: Python
3 replies
huisezhiyin
2019-04-10 15:13:00 +08:00
The way you pasted the code, the formatting makes it really hard to read.
idotfish
2019-04-10 15:46:10 +08:00
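@huisezhiyin Sorry, I've only just started with Python and don't really understand these things yet. Would it be OK to post a screenshot of the code instead?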
huisezhiyin
2019-04-10 16:17:04 +08:00
@idotfish Just search for the ERROR text and you'll find the answer.
For example, search for error:ssl_client_socket_impl.cc(964)] handshake failed
Here is an answer on Stack Overflow:
https://stackoverflow.com/questions/37883759/errorssl-client-socket-openssl-cc1158-handshake-failed-with-chromedriver-chr
If that doesn't work, try the other answers there.
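
For reference, the workaround commonly suggested for this handshake message is to start Chrome with flags that ignore certificate errors. Note that in the log above every WebDriver command still returns HTTP 200, so the handshake lines may be Chrome console noise and not the reason the spider fails. A minimal sketch, assuming Selenium 3.8 or later (where `webdriver.Chrome` accepts `options=`); `CHROME_FLAGS` and `make_browser` are names made up here, and nothing in this thread confirms the flags silence the messages for zhaopin.com:

```python
# Flags commonly suggested for the "handshake failed" console message.
CHROME_FLAGS = ["--ignore-certificate-errors", "--ignore-ssl-errors"]


def make_browser(flags=CHROME_FLAGS):
    """Start Chrome with the given command-line flags (hypothetical helper)."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in flags:
        options.add_argument(flag)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # Requires Chrome and a matching chromedriver on the machine.
    browser = make_browser()
    try:
        browser.get("https://zhaopin.com")
        print(browser.current_url)
    finally:
        browser.quit()
```

In the spider, `make_browser()` would take the place of the bare `webdriver.Chrome()` call in start_requests.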


https://www.v2ex.com/t/553538
