How to get all the text with XPath when scraping data with Scrapy

2013-08-30 22:37:30 +08:00
 brucebot
I'm using
item['description'] = app.select('./li//text()').extract_unquoted()
and I get
{"description": ["\n I visited the local ", "robotic", " club and recorded the preparation for the up-coming competition. I was invited to make a speech there\u00a0", "...", "\n "]

How should I write it so that I get
{"description": ["I visited the local robotic club and recorded the preparation for the up-coming competition. I was invited to make a speech there..."]
in this form, i.e. keeping only the text itself?
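Why the result is a list: `//text()` selects every descendant text node separately, so any inline tag inside the `<li>` splits the sentence into pieces. The sketch below reproduces this with the standard library's `ElementTree.itertext()` instead of Scrapy; the markup (`<a>` around "robotic", `<span>` around "...") is a hypothetical guess at the page structure, chosen only so the pieces match the output above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, assumed to resemble the scraped <li>: inline tags split the text
html = (
    "<ul><li>\n I visited the local <a>robotic</a> club and recorded the preparation"
    " for the up-coming competition. I was invited to make a speech there\u00a0"
    "<span>...</span>\n </li></ul>"
)

root = ET.fromstring(html)
li = root.find("li")

# Like './li//text()': one string per text node, in document order
pieces = list(li.itertext())

# Joining gives a single string, but the surrounding "\n " is still there
joined = "".join(pieces)

print(pieces)
print(repr(joined))
```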
beordle
2013-08-30 22:38:48 +08:00
"".join(app.select('./li//text()').extract_unquoted())
brucebot
2013-08-30 22:45:24 +08:00
@beordle That gives me
{"description": "\n I visited the local robotic club and recorded the preparation for the up-coming competition. I was invited to make a speech there...\n"
The [] is gone?
beordle
2013-08-30 22:56:25 +08:00
Just put it back in a list~
item['description'] = ["".join(app.select('./li//text()').extract_unquoted())]
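The list-wrapped join above still keeps the leading/trailing `"\n "` seen in the previous reply. A minimal sketch of also cleaning the whitespace, using the fragments from the question as a hard-coded list; `clean_text` is a hypothetical helper name, and in a spider you would pass it the result of `app.select('./li//text()').extract_unquoted()` instead.

```python
# The fragments from the question, as returned by './li//text()'
fragments = [
    "\n I visited the local ",
    "robotic",
    " club and recorded the preparation for the up-coming competition."
    " I was invited to make a speech there\u00a0",
    "...",
    "\n ",
]


def clean_text(parts):
    """Join text nodes and collapse every whitespace run (incl. U+00A0) to one space."""
    # str.split() with no argument splits on any Unicode whitespace and drops empty items
    return " ".join("".join(parts).split())


description = [clean_text(fragments)]  # wrap in a list to keep the [] from the question
print(description)
```

Note that the non-breaking space `\u00a0` becomes an ordinary space, giving "there ..." rather than "there..."; if it should disappear entirely, `.replace("\u00a0", "")` on the joined string before splitting would do that.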
If you need it split on \n line breaks:
item['description'] = "".join(app.select('./li//text()').extract_unquoted()).split('\n')
There may still be a few small details to sort out; just hack around them.
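One such detail: a plain `split('\n')` on this text leaves an empty string before the sentence and a `" "` after it. A small sketch of stripping each line and dropping the empty ones, again on the fragments from the question rather than a live Scrapy selector:

```python
# Same fragments as in the question, joined into one string
text = "".join([
    "\n I visited the local ",
    "robotic",
    " club and recorded the preparation for the up-coming competition."
    " I was invited to make a speech there\u00a0",
    "...",
    "\n ",
])

# Plain split('\n') keeps the empty and whitespace-only pieces around the text
raw_lines = text.split("\n")

# Strip each line and drop the ones that end up empty
lines = [line.strip() for line in raw_lines if line.strip()]

print(raw_lines)
print(lines)
```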
brucebot
2013-08-30 23:06:38 +08:00
@beordle Thanks a lot!

https://www.v2ex.com/t/80815

© 2021 V2EX