html = etree.parse(htmlStr, etree.HTMLParser())  # htmlStr comes from the full contents of the whole HTML file; this step works
result = html.xpath('//*[@div="info"]')
tmpStr = ''
for st in result:
    divSetion = etree.tostring(st, encoding="unicode", pretty_print=True, method="html")
    if (xxxxxxx) in divSetion:
        tmpStr = divSetion  # successfully got the snippet
    else:
        exit(0)
# At this point tmpStr definitely has content; if the condition was met, I want to run an XPath query against just this snippet
#html = etree.parse(tmpStr,etree.HTMLParser() )
html = etree.parse(tmpStr)  # this step fails
result = html.xpath('//*[@class="homeinfo"]')
for st in result:  # test whether anything comes out
    print(st)
PyCharm error output (excerpt):
Traceback (most recent call last):
  File "D:/Mycode/tedital.py", line 55, in <module>
    html = etree.parse(MatchDetailed)
  File "src\lxml\etree.pyx", line 3435, in lxml.etree.parse
  File "src\lxml\parser.pxi", line 1840, in lxml.etree._parseDocument
  File "src\lxml\parser.pxi", line 1866, in lxml.etree._parseDocumentFromURL
  File "src\lxml\parser.pxi", line 1770, in lxml.etree._parseDocFromFile
  File "src\lxml\parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src\lxml\parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 638, in lxml.etree._raiseParseError
OSError: Error reading file '
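
The traceback passes through `_parseDocumentFromURL` and `_parseDocFromFile`, which suggests `etree.parse()` is treating the string as a file name or URL rather than as markup. Below is a minimal sketch of two possible ways around that, assuming the snippet is already in memory: parse the string with `etree.fromstring()` and an `HTMLParser`, or skip the serialize-and-reparse round trip and run a relative XPath on the matched element. The sample HTML, the `//div[@class="info"]` selector, and the variable names are made up for illustration and are not taken from the real page.

```python
from lxml import etree

# Hypothetical sample page; the real one comes from the full HTML file.
sample = """
<html><body>
  <div class="info"><span class="homeinfo">Home team</span></div>
  <div class="info"><span class="other">Unrelated</span></div>
</body></html>
"""

root = etree.fromstring(sample, etree.HTMLParser())
blocks = root.xpath('//div[@class="info"]')  # assumed selector for this sample

# Option 1: serialize the element, then re-parse the *string* with fromstring().
# etree.parse() expects a file name, URL, or file-like object, not markup.
snippet = etree.tostring(blocks[0], encoding="unicode", method="html")
fragment = etree.fromstring(snippet, etree.HTMLParser())
from_string = fragment.xpath('//*[@class="homeinfo"]')

# Option 2: no round trip; query relative to the element with a leading ".".
relative = blocks[0].xpath('.//*[@class="homeinfo"]')

print([el.text for el in from_string])
print([el.text for el in relative])
```

Option 2 is probably the simpler choice when the element is already in hand, since it avoids building and re-parsing a string. Note the leading `.` in the XPath: without it, `//` searches the whole document again rather than just the matched element.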