An example from the official BeautifulSoup documentation that I don't understand

thekoc

2016-08-18 22:36:56 +08:00

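What does your whole program look like? The documentation is saying that you can pass a function to find_all as an argument, and that function defines the condition a tag has to satisfy. Is the function you passed in exactly the same as the one in the example?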
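For reference, here is a minimal sketch of passing a function to find_all. It assumes the question is about the "three sisters" document and the `has_class_but_no_id` function from the official documentation, and it picks the built-in `html.parser` as the parser, which is an assumption rather than something stated in the thread.

```python
from bs4 import BeautifulSoup

# Assumed input: the "three sisters" document from the Beautiful Soup docs.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def has_class_but_no_id(tag):
    # True only for tags that define "class" and do not define "id".
    return tag.has_attr("class") and not tag.has_attr("id")


# find_all calls the function on every tag and keeps the ones returning True.
matches = soup.find_all(has_class_but_no_id)
print(len(matches))                    # 3
print([tag.name for tag in matches])   # ['p', 'p', 'p']
```

The function only ever sees one tag at a time, so the condition is checked against that tag's own attributes.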

emric

2016-08-18 22:48:42 +08:00

This parse result is correct. There is an ellipsis after `Once upon a time there were...`.

redhatping

2016-08-18 22:58:41 +08:00

@emric Why is it the p tag? Why isn't the a tag considered? The a tag has an id attribute.

kxxoling

2016-08-18 23:03:09 +08:00

Ugh, it took me a while to figure this out... The issue is how bs prints a tag. Your result, like the documentation's, is a list of length 3. The example just replaces the tags in the middle with an ellipsis, whereas in your output the second element of the list prints as ``Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.``. If you don't believe it, print the length of the result.
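A quick way to check the length, sketched under the assumption that the input is the body of the documentation's "three sisters" document and that `html.parser` is the parser in use:

```python
from bs4 import BeautifulSoup

# Assumed input: the three <p> elements from the documentation's example.
html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")
result = soup.find_all(
    lambda tag: tag.has_attr("class") and not tag.has_attr("id")
)

print(len(result))           # 3: the list holds three tags, not six
second = result[1]
print(second.name)           # p
# Printing a tag prints its full markup, children included,
# which is why the <a> tags show up in the output.
print("<a" in str(second))   # True
```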

redhatping

2016-08-18 23:11:15 +08:00

@kxxoling OK, thanks, but I still don't understand: why is the p tag printed, and why is there no check on the a tags?

kxxoling

2016-08-18 23:16:40 +08:00

@redhatping I don't follow your question. Could you ask it a different way?

cheneydog

2016-08-18 23:17:17 +08:00

and not tag.has_attr('id')

redhatping

2016-08-18 23:25:09 +08:00

@kxxoling Why weren't these filtered out: <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>?

kxxoling

2016-08-18 23:29:05 +08:00

@redhatping The second item in the list is a p tag, and those 3 a tags are the contents of that p tag. They do not appear in the filtered result as separate items.

emric

2016-08-18 23:31:13 +08:00

@redhatping Because this p contains the a tags. Print `soup.find_all(True)` and you'll see.
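Roughly what that looks like, as a sketch that assumes the documentation's three `<p>` elements as input and `html.parser` as the parser. `find_all(True)` returns every tag in the document, so both the p tags and the a tags nested inside them are listed:

```python
from bs4 import BeautifulSoup

# Assumed input: the three <p> elements from the documentation's example.
html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# True matches every tag, in document order.
names = [tag.name for tag in soup.find_all(True)]
print(names)   # ['p', 'b', 'p', 'a', 'a', 'a', 'p']

# Each <a> is a child of the second <p>, so it is printed as part of that <p>.
parents = [a.parent.name for a in soup.find_all("a")]
print(parents)  # ['p', 'p', 'p']
```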

skydiver

2016-08-19 02:26:16 +08:00

The documentation just abbreviated it with an ellipsis, that's all...

redhatping

2016-08-19 13:14:40 +08:00

@skydiver That's not right.

skydiver

2016-08-19 13:20:01 +08:00

@redhatping Several people above have already explained it to you. If you still don't understand, there's nothing more we can do.

redhatping

2016-08-19 13:33:32 +08:00

@skydiver That's not the point. Please look at the official documentation.

amustart

2016-08-23 14:43:45 +08:00

return tag.has_attr('class') and not tag.has_attr('id')

It returns tags that have a class attribute but no id attribute. The a tags have an id attribute, so they get skipped.
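To see the condition applied tag by tag, here is a small sketch. The one-line document is made up for illustration (it is not the documentation's input), and `html.parser` is an assumed parser choice:

```python
from bs4 import BeautifulSoup

# Hypothetical one-line document: a <p> with a class, wrapping an <a> with class and id.
html_doc = (
    '<p class="story">Hello '
    '<a class="sister" id="link1" href="http://example.com/elsie">Elsie</a></p>'
)
soup = BeautifulSoup(html_doc, "html.parser")


def has_class_but_no_id(tag):
    # Looks only at this tag's own attributes, never at its children.
    return tag.has_attr("class") and not tag.has_attr("id")


# Evaluate the predicate on every tag in the document.
verdicts = {tag.name: has_class_but_no_id(tag) for tag in soup.find_all(True)}
print(verdicts)  # {'p': True, 'a': False}
```

The a tag is rejected on its own, but it is still part of the p tag that was accepted.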

amustart

2016-08-23 15:15:46 +08:00

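@amustart I typed that without thinking and then realized it wasn't right, so I ran it myself. It found three tags. The a tags are children of the p tag, and `has_class_but_no_id(tag)` does not recurse into the p tag's children (this is the answer to your question about why the a tags are not checked).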

Below, I added a few line breaks between the matched elements to make the output clearer.

"""
The Dormouse's story

Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.

...

"""
The official documentation did indeed omit the contents of the second p.
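One possible way to produce that kind of separated output, as a sketch. It assumes the documentation's three `<p>` elements as input and `html.parser` as the parser, and the three-newline separator is an arbitrary choice:

```python
from bs4 import BeautifulSoup

# Assumed input: the three <p> elements from the documentation's example.
html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")
result = soup.find_all(
    lambda tag: tag.has_attr("class") and not tag.has_attr("id")
)

# Join the full markup of each matched tag with blank lines in between,
# so the boundaries between the three list elements are easy to see.
separated = "\n\n\n".join(str(tag) for tag in result)
print(separated)
```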