For example, take this Stack Overflow user page: https://stackoverflow.com/users/542251/liam
I want to extract the content of the element whose class holds the user's bio:
response.xpath("//div[@class='grid--cell mt16 s-prose profile-user--bio ']").get()
This is the result I get:
'<div class="grid--cell mt16 s-prose profile-user--bio ">\r\n<p>Father, Husband, Rock Climber and Developer with 20 years experience in the industry (in that order).</p>\n\n<hr>\n\n<p><strong>Any fool can write code that a computer can understand. Good programmers write code that humans can understand.</strong></p>\n\n<p><a href="https://en.wikiquote.org/wiki/Martin_Fowler" rel="nofollow noreferrer">Martin Fowler (2008)</a></p>\n\n<hr>\n\n<p>Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. <strong>We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil</strong>. Yet we should not pass up our opportunities in that critical 3%.</p>\n\n<p><a href="https://en.wikiquote.org/wiki/Donald_Knuth" rel="nofollow noreferrer">Donald Ervin Knuth (Professor Emeritus at Stanford University, and winner of the 1974 Turing Award)</a></p>\n\n<hr>\n </div>'
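Why does the result still include the <div class="grid--cell mt16 s-prose profile-user--bio "> wrapper at the start, and the closing </div> at the end?
Shouldn't that wrapper be gone? And if I append text() to the expression, all I get back is '\r\n'...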
Can't Scrapy's XPath selectors return the full text directly, the way text_content() does on an lxml.html element?