Author: 我和你2602883283 | Source: Internet | 2023-05-27 09:39
From the Python documentation for re.compile():
Note The compiled versions of the most recent patterns passed to
re.match(), re.search() or re.compile() are cached, so programs that
use only a few regular expressions at a time needn’t worry about
compiling regular expressions.
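However, in my tests this claim does not seem to hold. When I time the following snippet, which uses the same pattern over and over, the compiled version is still much faster than the uncompiled version (which should be served from the cache).
Is there something I am missing here that would explain the time difference?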
import timeit
setup = """
import re
pattern = "p.a.t.t.e.r.n"
target = "p1a2t3t4e5r6n"
r = re.compile(pattern)
"""
print "compiled:", \
    min(timeit.Timer("r.search(target)", setup).repeat(3, 5000000))
print "uncompiled:", \
    min(timeit.Timer("re.search(pattern, target)", setup).repeat(3, 5000000))
Results:
compiled: 2.26673030059
uncompiled: 6.15612802627
Answer:
Here is the (CPython) implementation of re.search:
def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)
And here is re.compile:
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)
Both of these rely on re._compile:
def _compile(*key):
    # internal: compile pattern
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)  # _cache is a dict.
    if p is not None:
        return p
    pattern, flags = key
    if isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError('Cannot process flags argument with a compiled pattern')
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError, "first argument must be string or compiled pattern"
    try:
        p = sre_compile.compile(pattern, flags)
    except error, v:
        raise error, v  # invalid expression
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p
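So you can see that, as long as the regular expression is already in the dictionary, the only extra work involved is the dictionary lookup (plus creating a few temporary tuples, a few extra function calls, and so on). That overhead is small per call, but over 5,000,000 iterations it adds up to the difference you measured.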
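The cache hit can be checked directly. The following is a minimal sketch, written for Python 3 (unlike the Python 2 snippets above) and assuming CPython's behaviour of returning the cached pattern object when re.compile is called again with the same pattern and flags. The helper name time_calls and the iteration count are illustrative choices, not part of the original benchmark.

```python
import re
import timeit

pattern = "p.a.t.t.e.r.n"
target = "p1a2t3t4e5r6n"

# A second compile of the same pattern should be served from re's internal
# cache, so both calls return the very same pattern object (CPython behaviour).
r = re.compile(pattern)
same_object = re.compile(pattern) is r

# Names made visible to the timed statements.
namespace = {"re": re, "r": r, "pattern": pattern, "target": target}

def time_calls(stmt, number=100000):
    """Return the best of 3 timings for stmt, each run `number` times."""
    timer = timeit.Timer(stmt, globals=namespace)
    return min(timer.repeat(3, number))

compiled_time = time_calls("r.search(target)")
uncompiled_time = time_calls("re.search(pattern, target)")

print("same cached object:", same_object)
print("compiled:  ", compiled_time)
print("uncompiled:", uncompiled_time)
```

If the cache is working, same_object is True, and any remaining gap between the two timings comes from the lookup path inside re.search rather than from recompiling the pattern.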
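Update
In the good old days (the code copied above), the cache was cleared completely once it grew too large. These days the cache cycles, dropping the oldest item first. That implementation relies on the ordering of Python dictionaries (which was an implementation detail before Python 3.7). In CPython versions before 3.6, it would instead drop an arbitrary entry from the cache (which is arguably still better than invalidating the whole cache).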
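The "drop the oldest item first" idea can be sketched with a plain dict. This is an illustration only, not the actual code in the re module: the names MAX_ENTRIES, cache, and cache_put are hypothetical, and the sketch assumes Python 3.7+, where dicts preserve insertion order.

```python
MAX_ENTRIES = 3  # hypothetical limit, kept tiny so eviction is easy to observe
cache = {}       # plain dict; insertion-ordered in Python 3.7+

def cache_put(key, value):
    """Store value, dropping the oldest entry first if the cache is full."""
    if len(cache) >= MAX_ENTRIES:
        # Iterating an insertion-ordered dict yields the oldest key first.
        oldest = next(iter(cache))
        del cache[oldest]
    cache[key] = value

for name in ["a", "b", "c", "d"]:
    cache_put(name, name.upper())

print(list(cache))  # ['b', 'c', 'd']
```

Compared with calling clear() on the whole dict, only one entry has to be recompiled after the limit is hit, so patterns that are still in active use tend to stay cached.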