<table class="tabledataformat" cellspacing="0" >
<tr>
<td style="vertical-align:top;">Copper, Cu </td>
<td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td>
<td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td>
<td class="dataComment" style="vertical-align:top;"></td>
</tr>
</table>
response.xpath('//table[@class="tabledataformat"]/tr').extract() only returns:
<tr>
<td style="vertical-align:top;">Copper, Cu </td>
<td class="dataCell" style="vertical-align:top;"></td>
<td class="dataCell" style="vertical-align:top;"></td>
<td class="dataComment" style="vertical-align:top;"></td>
</tr>
The "<= 0.03 %" text, along with the rest of each dataCell's contents, has disappeared. Why?
1
imn1 2017-03-04 16:37:21 +08:00
Because writing a bare <= like that does not conform to the XML standard.
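This point can be checked with Python's standard-library XML parser, which, like any conforming XML parser, rejects an unescaped < in text. This is only an illustration: Scrapy parses HTML through lxml (libxml2), not xml.etree, so it does not reproduce Scrapy's exact behaviour. The helper names and the regex-based escape_stray_lt workaround are hypothetical; the regex only escapes a < that cannot start a tag, end tag, comment or declaration, and it does not handle < inside attribute values or script blocks.

```python
import re
import xml.etree.ElementTree as ET

# The table from the question, with the unescaped "<=" left as-is.
SNIPPET = """<table class="tabledataformat" cellspacing="0">
<tr>
<td style="vertical-align:top;">Copper, Cu </td>
<td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td>
<td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td>
<td class="dataComment" style="vertical-align:top;"></td>
</tr>
</table>"""

# Hypothetical heuristic: a "<" not followed by a letter, "/", "!" or "?"
# cannot open a tag, end tag, comment or declaration, so treat it as text.
STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def escape_stray_lt(markup: str) -> str:
    """Replace stray "<" characters with the entity "&lt;"."""
    return STRAY_LT.sub("&lt;", markup)


def cell_texts(markup: str) -> list[str]:
    """Return the stripped text of every cell in the tabledataformat table."""
    # Wrap in a dummy root so the query can start above <table>.
    root = ET.fromstring("<doc>" + markup + "</doc>")
    # ElementPath equivalent of //table[@class="tabledataformat"]/tr/td
    cells = root.findall(".//table[@class='tabledataformat']/tr/td")
    return ["".join(td.itertext()).strip() for td in cells]


print(is_well_formed(SNIPPET))  # False: "<=" is not well-formed XML
fixed = escape_stray_lt(SNIPPET)
print(is_well_formed(fixed))    # True
print(cell_texts(fixed))        # ['Copper, Cu', '<= 0.03 %', '<= 0.03 %', '']
```

In a Scrapy callback the same escaping could be applied before selecting, for example scrapy.Selector(text=escape_stray_lt(response.text)), but whether that changes the result depends on how your lxml/libxml2 version treats the stray <, so compare against the page source first.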
2
leavic 2017-03-04 16:39:45 +08:00
This data may be loaded by an asynchronous JavaScript request, i.e. AJAX content, which Scrapy cannot see.
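Whether this explanation applies can be checked by searching the raw downloaded body instead of the parsed tree: text injected later by JavaScript is absent from the raw response, while text dropped by the parser is still present in it. A minimal sketch, assuming you already have the raw body as a string (in Scrapy, response.text); the diagnose helper and its messages are hypothetical.

```python
import html


def diagnose(raw_source: str, expected: str, extracted: list[str]) -> str:
    """Classify why `expected` is missing from the extracted strings.

    raw_source is the page exactly as downloaded, before any HTML parsing.
    The text may appear literally or entity-escaped (e.g. "&lt;= 0.03 %"),
    so both forms are checked.
    """
    in_source = expected in raw_source or expected in html.unescape(raw_source)
    in_output = any(expected in s for s in extracted)
    if not in_source:
        return "absent from raw source: possibly loaded later by JavaScript/AJAX"
    if not in_output:
        return "present in raw source but lost during parsing/extraction"
    return "present in raw source and in extracted output"


# The row from the question versus what the XPath extraction returned.
row = '<td class="dataCell"><= 0.03 %<span class="dataCondition"></span></td>'
print(diagnose(row, "<= 0.03 %", ['<td class="dataCell"></td>']))
```

If the value turns up in the raw body, AJAX is ruled out and the problem lies in how the markup is parsed.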
3
dsg001 2017-03-04 19:24:35 +08:00
'''
<tr>
<td style="vertical-align:top;">Copper, Cu </td>
<td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td>
<td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td>
<td class="dataComment" style="vertical-align:top;"></td>
</tr>
'''
I tested this with lxml and it outputs the cell contents, so Scrapy should be fine too. Check the page's HTML source.
4
crazypig14 2017-03-07 10:56:49 +08:00
I find it more convenient to crawl with Scrapy and then process the pages with BeautifulSoup.
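BeautifulSoup's "html.parser" backend is built on Python's standard-library html.parser module, which treats a < that cannot start a tag as ordinary text. The sketch below uses that module directly (so it needs no third-party packages) to collect the text of each td in the row from the question. CellTextCollector is a hypothetical name, and this shows the tokenizer's behaviour rather than what BeautifulSoup itself returns.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collect the stripped text of every <td>, including text inside nested tags."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: list[str] = []
        self._in_td = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self._in_td = False
            self.cells.append("".join(self._buf).strip())

    def handle_data(self, data):
        # A stray "<" may arrive as its own data chunk, so buffer and join.
        if self._in_td:
            self._buf.append(data)


ROW = """<tr>
<td style="vertical-align:top;">Copper, Cu </td>
<td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td>
<td class="dataCell" style="vertical-align:top;"><= 0.03 %<span class="dataCondition"></span></td>
<td class="dataComment" style="vertical-align:top;"></td>
</tr>"""

collector = CellTextCollector()
collector.feed(ROW)
collector.close()
print(collector.cells)
```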