A question about using findAll in Beautiful Soup - V2EX


This topic was created 3023 days ago; the information in it may have evolved or changed since then.

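The page I want to parse contains both

<p class="left title"><a href="xxxx">xxxx</a></p>

and

<p class="title"><a href="xxxx">xxxx</a></p>

If I use

soup.findAll('p', {'class': 'title'}):

it returns the tags for both of the classes above. I only want the one whose class is exactly "title". How should I write this?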

8 replies • 2016-09-11 11:24:17 +08:00

1

assassinleo

2016-09-10 12:45:24 +08:00

Try this and see if it works: soup.find_all('p', class="title") or soup.find_all('p', class=re.compile("title"))

ref: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all

2

xucuncicero

OP

2016-09-10 15:37:49 +08:00

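@assassinleo That doesn't work. It is just a different way of writing the same thing, and the second one has a syntax error (it should be class_). Other ways of writing it give the same result too.

The problem is that any find_all or findAll argument that matches "title" will necessarily also match "left title". Is there a way to write it that excludes a particular string?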
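For reference, the behavior described here can be reproduced with a minimal sketch. The sample HTML and the variable names are made up for illustration. Beautiful Soup treats `class` as a multi-valued attribute, so a search for "title" matches any tag that has "title" among its classes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the two <p> tags from the question.
html = """
<p class="left title"><a href="xxxx">first</a></p>
<p class="title"><a href="xxxx">second</a></p>
"""

soup = BeautifulSoup(html, "html.parser")

# Each tag stores its classes as a list, and the search matches
# if any single entry in that list equals "title".
matches = soup.find_all("p", {"class": "title"})
class_lists = [p["class"] for p in matches]
print(class_lists)  # both tags are returned
```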

3

caspartse

2016-09-10 16:19:17 +08:00

soup.select('p[class=title]')
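A minimal sketch of why this selector helps, assuming the same two tags as in the question (the sample HTML and variable names are made up). The attribute selector compares against the whole `class` value, so "left title" is not a match, whereas the usual `p.title` class selector would still match both.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the two <p> tags from the question.
html = """
<p class="left title"><a href="xxxx">first</a></p>
<p class="title"><a href="xxxx">second</a></p>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute selector: the full class attribute must equal "title".
exact = soup.select('p[class="title"]')
exact_texts = [p.get_text() for p in exact]

# Class selector, for contrast: matches any <p> that has the class "title".
loose = soup.select("p.title")

print(exact_texts)
print(len(loose))
```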

4

xiahei

2016-09-10 16:23:10 +08:00

You don't have to use `find_all()`; `select()` works here too.

5

7sDream

2016-09-10 16:26:24 +08:00

http://7sdream-rikka-demo.daoapp.io/files/2016-09-10-919783703

If the image doesn't show up, copy the link and open it...

I tried it and this works.

Hope it helps.

6

judyApple

2016-09-11 04:29:44 +08:00

Adding an if statement seems to do it. itDic.attrs returns a dict; keeping only the tags whose class value has length 1 filters out the "left" one.
for itDic in soup.findAll("p", {"class": "title"}):
    if len(itDic.attrs['class']) == 1:
        print(itDic.attrs)
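The same check can also be pushed into the search itself. This is only a sketch, with made-up sample HTML and a hypothetical helper named `has_only_title`: `find_all` accepts a function that receives each tag and returns True for the ones to keep.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the two <p> tags from the question.
html = """
<p class="left title"><a href="xxxx">first</a></p>
<p class="title"><a href="xxxx">second</a></p>
"""


def has_only_title(tag):
    """Keep <p> tags whose class list is exactly ['title']."""
    return tag.name == "p" and tag.get("class") == ["title"]


soup = BeautifulSoup(html, "html.parser")
only_title = soup.find_all(has_only_title)
only_title_texts = [p.get_text() for p in only_title]
print(only_title_texts)
```

Comparing against the full list is slightly stricter than checking the length, since it also rules out a tag whose single class is something other than "title".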

7

aihimmel

2016-09-11 10:32:11 +08:00 via Android

XPath is the way to go
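For completeness, a minimal XPath sketch. Beautiful Soup itself does not evaluate XPath, so this uses the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset and only parses well-formed markup; real-world HTML would usually need a more lenient parser such as lxml. The sample markup, including the wrapping `<div>`, is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, wrapped in a root element so it parses as XML.
markup = """<div>
<p class="left title"><a href="xxxx">first</a></p>
<p class="title"><a href="xxxx">second</a></p>
</div>"""

root = ET.fromstring(markup)

# The predicate compares the full attribute value, so "left title" is skipped.
exact_nodes = root.findall(".//p[@class='title']")
xpath_texts = [p.find("a").text for p in exact_nodes]
print(xpath_texts)
```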

8

xucuncicero

OP

2016-09-11 11:24:17 +08:00

@caspartse
@xiahei
@7sDream Thanks a lot, simple and efficient

@judyApple Thanks to you as well

@aihimmel Looks like I need to learn a few more things
