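As the title says, the data comes from my university's news endpoint. In the browser's network monitor I can see the response of portalAjax.getNewsXml.dwr. When I call the same endpoint with the post method of Python's requests library, the returned text is: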
//#DWR-INSERT
//#DWR-REPLY
dwr.engine._remoteHandleCallback('0','0',"\n<list><pagecount>3641</pagecount><item>\n
\n<link></link>\n<description></description>\n<category></category>\n<pubdate>Thu, 17 Nov 2016 00:23:21 GMT</pubdate>\n<guid></guid>\n<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"></dc:creator>\n<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">Thu, 17 Nov 2016 00:23:21 GMT</dc:date>\n<xwbh>147934233182128368</xwbh>\n<color>null</color>\n<spanpic>pic</spanpic>\n<lmmc></lmmc>\n<enclosure url="http://www.ynnu.edu.cn/UserFiles/Image/147934226721288544.png" type="image/pjpeg"/>\n</item></list>");
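A minimal sketch of pulling the XML out of this reply without searching for its head and tail. It assumes that the third argument of _remoteHandleCallback is an ordinary double-quoted JavaScript string literal, and that the inner double quotes arrive escaped as \" (the forum display may have dropped those backslashes). The regex captures the whole literal, then unescape_js decodes the JavaScript escapes (\n, \", \', \/, \uXXXX). RAW_REPLY, CALLBACK_RE, unescape_js and extract_xml are hypothetical names, and the sample is a shortened copy of the reply shown above.

```python
import re

# Hypothetical shortened sample of the DWR reply. Assumption: inner double
# quotes are sent escaped as \" inside the JavaScript string literal.
RAW_REPLY = r'''//#DWR-INSERT
//#DWR-REPLY
dwr.engine._remoteHandleCallback('0','0',"\n<list><pagecount>3641</pagecount><item>\n<enclosure url=\"http://www.ynnu.edu.cn/UserFiles/Image/147934226721288544.png\" type=\"image/pjpeg\"/>\n</item></list>");
'''

# Third argument of _remoteHandleCallback: a double-quoted JS string literal
# made of plain characters or backslash escapes.
CALLBACK_RE = re.compile(
    r'_remoteHandleCallback\([^,]*,[^,]*,"((?:[^"\\]|\\.)*)"\)', re.S)

# One-character JS escapes that stand for control characters.
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}


def unescape_js(literal):
    """Decode the body of a JS string literal into a real Python string."""
    def repl(m):
        esc = m.group(1)
        if len(esc) > 1:                # \uXXXX or \xXX: hex code point
            return chr(int(esc[1:], 16))
        # \n, \t ... become control characters; \" \' \\ \/ become the char itself
        return SIMPLE_ESCAPES.get(esc, esc)
    return re.sub(r'\\(u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|.)', repl, literal,
                  flags=re.S)


def extract_xml(reply_text):
    """Return the decoded XML payload of a DWR callback reply."""
    m = CALLBACK_RE.search(reply_text)
    if m is None:
        raise ValueError("no _remoteHandleCallback(...) string literal found")
    return unescape_js(m.group(1)).strip()


xml_text = extract_xml(RAW_REPLY)
print(xml_text)
```

With the real response, RAW_REPLY would be the .text of the requests response instead of the hard-coded sample. Decoding the literal as a whole is more robust than string slicing, because every escape inside it is handled in one place.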
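How can I extract the XML from this in Python? I tried locating the start and end of the XML with string searches and then parsing it with xml.etree, but loading the XML fails with: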
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 784
How can I fix this?
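One plausible cause, not confirmed by the post: the substring cut out of the reply is still JavaScript-escaped, so attributes look like url=\"...\" and expat rejects the backslash after the equals sign as an invalid token. The sketch below uses a hypothetical shortened sample. It shows that the escaped form fails to parse, then parses an already-decoded form and reads each item's fields, including the namespaced dc:creator and dc:date. The names still_escaped, decoded, DC_NS and items are illustrative only.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by the dc:creator / dc:date elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample that is still JS-escaped (\" around attribute values),
# which is what a plain head/tail substring search would hand to ElementTree.
still_escaped = r'<list><item><enclosure url=\"http://www.ynnu.edu.cn/UserFiles/Image/147934226721288544.png\" type=\"image/pjpeg\"/></item></list>'

escaped_failed = False
try:
    ET.fromstring(still_escaped)
except ET.ParseError as err:
    escaped_failed = True
    print("escaped input:", err)  # expat stops at the backslash after url=

# The same kind of payload after the JS escapes were decoded
# (assumption: \" became " and \n became a real newline beforehand).
decoded = (
    '<list><pagecount>3641</pagecount><item>\n'
    '<pubdate>Thu, 17 Nov 2016 00:23:21 GMT</pubdate>\n'
    '<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"></dc:creator>\n'
    '<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">'
    'Thu, 17 Nov 2016 00:23:21 GMT</dc:date>\n'
    '<xwbh>147934233182128368</xwbh>\n'
    '<enclosure url="http://www.ynnu.edu.cn/UserFiles/Image/147934226721288544.png"'
    ' type="image/pjpeg"/>\n'
    '</item></list>'
)

root = ET.fromstring(decoded)
page_count = int(root.findtext("pagecount"))

items = []
for item in root.iter("item"):
    enclosure = item.find("enclosure")
    items.append({
        "pubdate": item.findtext("pubdate"),
        # ElementTree addresses namespaced tags as {namespace-uri}localname.
        "creator": item.findtext(f"{{{DC_NS}}}creator"),
        "date": item.findtext(f"{{{DC_NS}}}date"),
        "xwbh": item.findtext("xwbh"),
        "image": enclosure.get("url") if enclosure is not None else None,
    })

print(page_count, items)
```

Note that findtext returns an empty string, not None, for an element that exists but has no text, as with the empty dc:creator here.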