A bug I hit while scraping CNKI: I've rewritten the XPath matching many times and it still throws an error. Help wanted.

2018-06-07 09:37:42 +08:00
 wei6666
Traceback (most recent call last):
  File "China_hownet_journal_end.py", line 296, in <module>
    china_hownet.run()
  File "China_hownet_journal_end.py", line 281, in run
    url_list = self.parse_content_html(html3str)
  File "China_hownet_journal_end.py", line 212, in parse_content_html
    html = etree.HTML(html3str)
  File "lxml.etree.pyx", line 2945, in lxml.etree.HTML (src/lxml/lxml.etree.c:62546)
  File "parser.pxi", line 1617, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:93194)
  File "parser.pxi", line 1488, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:91938)
  File "parser.pxi", line 969, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:88328)
  File "parser.pxi", line 577, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:84385)
  File "parser.pxi", line 676, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:85488)
  File "parser.pxi", line 625, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:84945)
lxml.etree.XMLSyntaxError: line 1046: htmlParseEntityRef: expecting ';'
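
Note that the exception is raised inside etree.HTML(html3str), before any XPath expression runs, so changing the XPath would not be expected to affect it. "htmlParseEntityRef: expecting ';'" is the message libxml2 gives when it meets an "&" followed by a name with no closing ";" (for example a raw "&" in a URL query string), and "line 1046" most likely refers to a line of the fetched HTML rather than a line of the script. Below is a minimal sketch of one possible workaround, assuming bare ampersands are indeed the cause here. The helper names escape_bare_ampersands and parse_html_leniently are hypothetical and do not come from the original script.

```python
import re

# An '&' that does NOT begin a well-formed entity such as &amp; &#38; or &#x26;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(html_text):
    """Rewrite stray '&' characters as '&amp;' and leave real entities alone."""
    return BARE_AMP.sub("&amp;", html_text)


def parse_html_leniently(html_text):
    """Hypothetical helper: clean the text, then parse it with a recovering parser."""
    # Imported here so the escaping helper above stays usable without lxml.
    from lxml import etree

    # recover=True asks the parser to keep going on malformed markup.
    parser = etree.HTMLParser(recover=True)
    return etree.HTML(escape_bare_ampersands(html_text), parser=parser)
```

In the script from the traceback, this would stand in for the etree.HTML(html3str) call inside parse_content_html. Whether it clears the error depends on what line 1046 of the downloaded page actually contains, so it is worth printing that line of html3str first.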
885 views
Node: Q&A
0 replies

https://www.v2ex.com/t/461088