What is the difference between these two ways of writing a regular expression in Python?

First approach

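import re
line = "Cats are smarter than dogs"
# Pass the pattern string straight to the module-level function
m = re.match(r'(.*) are (.*?) .*', line)
print(m.group())   # the whole match
print(m.group(1))  # first capture group
print(m.group(2))  # second capture group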

Second approach

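import re
# Compile the pattern once, then call match() on the compiled object
pattern = re.compile(r'(.*) are (.*?) .*')
m = pattern.match("Cats are smarter than dogs")
print(m.group())   # the whole match
print(m.group(1))  # first capture group
print(m.group(2))  # second capture group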

What is the difference between the two? And why are there two ways to do it at all?

wwqgtxx

2018-08-25 11:45:48 +08:00

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a Match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

This is how the standard library defines it, so your two approaches are essentially the same thing.
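To make that concrete, here is a minimal sketch that runs both forms on the example from the question and compares the results. The variable names (regex, m1, m2) are only for illustration and are not from the original post.

```python
import re

line = "Cats are smarter than dogs"
regex = r'(.*) are (.*?) .*'

# Module-level function: compiles (or looks up) the pattern internally
m1 = re.match(regex, line)

# Explicit compile step followed by the pattern object's own match()
m2 = re.compile(regex).match(line)

# Both match objects expose the same groups
print(m1.groups())
print(m2.groups())
print(m1.groups() == m2.groups())
```

Both calls go through the same matching machinery, so for a single use the choice is mostly a matter of style.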

wocanmei

2018-08-25 13:28:42 +08:00

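With the first approach, every call to match pays the cost of compiling the regex. "Compiling" here means that a regex is effectively a small language: it has to be parsed into some structure, such as a syntax tree, before it can be matched against a string. The second approach compiles the regex ahead of time instead of on every call, so it is a bit more efficient than the first.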

wwqgtxx

2018-08-25 14:40:11 +08:00

@glacer
@wocanmei
Actually, the _compile() function in Python 3's re.py keeps an internal _cache:
https://github.com/python/cpython/blob/3.7/Lib/re.py#L268
so the pattern is not recompiled on every call.
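A quick way to see the cache at work, as a sketch: compile the same pattern string twice and check whether the same object comes back. This relies on CPython's caching behaviour, which is an implementation detail rather than a documented guarantee, so treat the result as an observation about CPython. re.purge() is the documented call that clears the cache.

```python
import re

regex = r'(.*) are (.*?) .*'

re.purge()  # start from an empty regex cache

first = re.compile(regex)   # compiled and stored in the cache
second = re.compile(regex)  # expected to be served from the cache

# On CPython, the cached pattern object itself is returned
same_object = first is second
print(same_object)
```

If the check prints True, the second call did not build a new pattern object; it only paid for the cache lookup.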

chenxytw

2018-08-25 16:13:58 +08:00

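@wwqgtxx Actually Python 2 already had this, but back then eviction was a simple count: once the cache reached its size limit, every compiled regex was thrown away 0 0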

wizardoz

2018-08-26 22:37:56 +08:00

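For this particular example there is no real difference, because both versions compile once and match once. But if you use the same regex over and over, you can call compile once and then reuse pattern.match, which saves the time of compiling repeatedly.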
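If you want to measure this on your own machine, a rough timeit sketch along these lines should do. The function names and the iteration count n are made up for the example, and the numbers will vary by Python version and hardware. Since re.match looks the pattern up in the cache mentioned earlier in the thread, any gap you see is likely to come from the per-call lookup rather than from recompiling.

```python
import re
import timeit

line = "Cats are smarter than dogs"
regex = r'(.*) are (.*?) .*'
pattern = re.compile(regex)  # compiled once, outside the timed code


def use_module_function():
    # Goes through re.match, which resolves the pattern on each call
    return re.match(regex, line)


def use_compiled_pattern():
    # Reuses the pattern object compiled above
    return pattern.match(line)


n = 100000  # arbitrary repetition count for the comparison
t_module = timeit.timeit(use_module_function, number=n)
t_compiled = timeit.timeit(use_compiled_pattern, number=n)

print("re.match:      %.3f s" % t_module)
print("pattern.match: %.3f s" % t_compiled)
```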

frostming

2018-08-27 11:49:49 +08:00

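With the first approach, every time you call match you end up running pattern=re.compile(r'(.*) are (.*?) .*') again.
With the second approach, once you have compiled the pattern you do not have to compile it on every call, so it is a bit more efficient.

If you only use the regex once it makes no difference; the difference shows up when you use it many times.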

https://www.v2ex.com/t/483102
