Scrapy output exported to a JSON file is empty. What is the problem?

2015-05-11 23:00:14 +08:00

sugusor

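I've just started learning Scrapy. I followed the official documentation and wrote a crawler based on it. It does scrape content, but when I run scrapy crawl dmoz -o items.json to save the scraped data as JSON, the generated file contains nothing but an empty []. What could be causing this? Any help from the experts would be appreciated!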

import scrapy
from scrapy.selector import Selector

from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    def parse(self, response):
        # The body of parse must be indented under the def line.
        sel = Selector(response)
        sites = sel.xpath('//ul[@class="directory-url"]/li')
        items = []
        for site in sites:
            item = DmozItem()
            item['title'] = site.xpath('a/text()').extract()
            item['link'] = site.xpath('a/@href').extract()
            # Raw string keeps the regex escapes intact.
            item['desc'] = site.xpath('text()').re(r'-\s[^\n]*\r')
            items.append(item)
        return items

26 replies

sunchen

2015-05-12 22:03:51 +08:00

@sugusor Change the parse method to yield a single DmozItem at a time. If you want one parse method to output multiple items, just yield multiple times. Then post your pipeline code.
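A minimal sketch of the yield-per-item shape suggested above. It uses only the standard library rather than Scrapy, so PAGE, the css_class parameter, and the example.com links are hypothetical stand-ins, not real dmoz.org markup or Scrapy API. In a real spider the same idea would be a `yield item` inside the loop of `parse(self, response)`. The second call also shows that when the XPath matches nothing, no items are produced and the exported list is just `[]`, which is one possible cause of the empty file, not a confirmed diagnosis.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a downloaded page; not real dmoz.org markup.
PAGE = """
<html><body>
  <ul class="directory-url">
    <li><a href="http://example.com/a">Book A</a> - First description</li>
    <li><a href="http://example.com/b">Book B</a> - Second description</li>
  </ul>
</body></html>
""".strip()


def parse(page, css_class="directory-url"):
    """Yield one dict per <li> instead of building and returning a list."""
    root = ET.fromstring(page)
    for li in root.findall(f".//ul[@class='{css_class}']/li"):
        a = li.find("a")
        yield {
            "title": a.text,
            "link": a.get("href"),
            "desc": (a.tail or "").strip(),
        }


# The selector matches, so two items come out.
matched = list(parse(PAGE))
# The selector matches nothing, so no items come out at all.
unmatched = list(parse(PAGE, css_class="no-such-class"))

print(json.dumps(matched, indent=2))
print(json.dumps(unmatched))
```

If the real selector returns nothing in `scrapy shell`, the export will look like the second case.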

sugusor

2015-05-12 23:12:14 +08:00

@sunchen Here is my pipeline code:
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

class TutorialPipeline(object):
    def process_item(self, item, spider):
        return item

sugusor

2015-05-12 23:13:13 +08:00

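@beibeijia Hmm, I've already tried that example and it still doesn't seem to work =_=. I think I'll just reinstall. Could you tell me how to uninstall it completely, though?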

sunchen

2015-05-12 23:51:23 +08:00

@sugusor Try changing return to yield in the pipeline.

sunchen

2015-05-12 23:54:32 +08:00

@sugusor Wait, that's wrong. The pipeline should use return. Please ignore my previous reply.
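A small plain-Python sketch of why return is the right choice in process_item. ReturnPipeline and YieldPipeline are hypothetical names for illustration, and nothing here imports Scrapy. It only shows the language-level difference: a method containing yield hands back a generator object, not the item itself.

```python
import types


class ReturnPipeline:
    def process_item(self, item, spider):
        # The caller receives the item itself.
        return item


class YieldPipeline:
    def process_item(self, item, spider):
        # The caller receives a generator wrapping the item.
        yield item


item = {"title": "Book A"}
returned = ReturnPipeline().process_item(item, spider=None)
yielded = YieldPipeline().process_item(item, spider=None)

print(returned is item)
print(isinstance(yielded, types.GeneratorType))
```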

beibeijia

2015-05-13 18:21:04 +08:00

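@sugusor Run pip uninstall scrapy and then install it again. I ran into a test problem when I first installed it too, and reinstalling fixed it. If that doesn't work, you'll have to uninstall all the dependencies and start over. Sigh, with weird problems like this it just comes down to trial and error, so try it yourself.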
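After uninstalling or reinstalling, one way to check what is actually installed is to ask Python directly. This is only a sketch: installed_version is a hypothetical helper name, and it assumes Python 3.8 or newer, where importlib.metadata is part of the standard library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if Scrapy is installed in this environment, else None.
print(installed_version("scrapy"))
```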

https://www.v2ex.com/t/190294
