
Is regex in Perl faster than in Java or other languages?


From time to time I hear people say that regexes in Perl are faster than in other languages. Some online documents also say that Perl has advantages when it comes to regex processing. Can you explain whether this is true, and why?


3 answers

#1


17  

Why would you compare the speed of two engines when one of them (Java's) is notably buggy? (Search for writings by Tom "tchrist" Christiansen on the subject.) For example, \s fails to match many Unicode space characters.
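The \s remark can be checked with a small Java sketch. It assumes a Java 7 or later runtime, where the UNICODE_CHARACTER_CLASS flag is available; the class and method names are made up for illustration. It compares how \s treats U+2003 (EM SPACE) with and without that flag.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only java.util.regex.Pattern is a real API here.
class WhitespaceDemo {
    // Default behaviour: \s covers only space, tab, newline, vertical tab,
    // form feed and carriage return.
    static boolean matchesDefault(String s) {
        return Pattern.compile("\\s").matcher(s).matches();
    }

    // With UNICODE_CHARACTER_CLASS, \s follows the Unicode White_Space property.
    static boolean matchesUnicode(String s) {
        return Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s)
                      .matches();
    }

    public static void main(String[] args) {
        String emSpace = "\u2003"; // EM SPACE
        System.out.println("default \\s matches EM SPACE: " + matchesDefault(emSpace));
        System.out.println("unicode \\s matches EM SPACE: " + matchesUnicode(emSpace));
    }
}
```

So the narrow default is opt-out rather than fixed, but code that never sets the flag will silently miss such characters.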


Also, some online documents also say Perl has advantages when it comes to regex processing.


Here are some:


  • Has many features you cannot find in other engines, either because the other engines haven't copied them yet, or because their design does not permit them to support those features.
  • Highly optimised. Many of these optimisations help to report failed matches sooner, something not covered by many benchmarks.
  • A leader in Unicode support. Its support is so advanced that our developers are finding problems with the Unicode standard itself and have worked to have them resolved!
  • Remarkably bug free.

#2


13  

You may have a look at this benchmark. In the table, column patmch:1t gives the time for matching a URL with /([a-zA-Z][a-zA-Z0-9]*):\/\/([^ /]+)(\/?[^ ]*)/, while column patmch:2t gives the time for matching a URL or an email address with /([a-zA-Z][a-zA-Z0-9]*):\/\/([^ /]+)(\/?[^ ]*)|([^ @]+)@([^ @]+)/ (note the | operator). For the first pattern, Perl is about 10X faster than Java; for the second, they are about the same.
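For reference, here is roughly how the two benchmark patterns look in Java. This is only a sketch: the class name, helper method and sample lines are invented for illustration, and it is not the benchmark's actual harness.

```java
import java.util.regex.Pattern;

// Hypothetical class; the two regexes are the ones quoted from the benchmark.
class BenchmarkPatterns {
    // patmch:1t pattern: URL only.
    static final Pattern URL =
        Pattern.compile("([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/?[^ ]*)");

    // patmch:2t pattern: URL or email (note the | alternation).
    static final Pattern URL_OR_EMAIL =
        Pattern.compile("([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/?[^ ]*)|([^ @]+)@([^ @]+)");

    // Counts the lines that contain at least one match of the pattern.
    static int countMatchingLines(Pattern p, String[] lines) {
        int count = 0;
        for (String line : lines) {
            if (p.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample input; the benchmark itself uses a much larger text file.
        String[] lines = {
            "see http://example.com/index.html for details",
            "mail bob@example.org",
            "no match on this line"
        };
        System.out.println("URL lines: " + countMatchingLines(URL, lines));
        System.out.println("URL or email lines: " + countMatchingLines(URL_OR_EMAIL, lines));
    }
}
```

To get timings comparable to the table, you would have to run both patterns over the same large input many times, after a JVM warm-up.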


In general, Perl uses a backtracking regex engine. Such an engine is flexible, easy to implement, and very fast on a subset of regexes. For other regexes, for example those using the | operator, it may become very slow; in the extreme case, matching time is exponential in the length of the input. Another type of regex engine simulates an NFA directly (the Thompson approach). It is harder to implement but has stable performance for all inputs: worst-case time is proportional to the pattern length times the text length. Russ Cox has several articles on these topics, which I like a lot.
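To see the extreme case for yourself, the classic pathological pattern from Russ Cox's articles (n copies of a? followed by n copies of a, matched against n a's) can be tried in Java. This is a sketch: the class and method names are invented, and the timings you get depend on the JVM and engine version, so none are claimed here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the a?^n a^n pattern.
class BacktrackingDemo {
    // Builds "a?" repeated n times followed by "a" repeated n times.
    static String pattern(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append("a?");
        for (int i = 0; i < n; i++) sb.append('a');
        return sb.toString();
    }

    // Builds the input: "a" repeated n times.
    static String input(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('a');
        return sb.toString();
    }

    // The pattern always matches, but a backtracking engine may try many
    // combinations of "a? takes one a" and "a? takes nothing" before it finds out.
    static boolean matches(int n) {
        return Pattern.matches(pattern(n), input(n));
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 20; n += 5) {
            long start = System.nanoTime();
            boolean ok = matches(n);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("n=" + n + " matched=" + ok + " time=" + micros + "us");
        }
    }
}
```

If the time multiplies by a roughly constant factor each time n goes up by one, the engine is backtracking. An automaton-based engine such as RE2 should scale far more gently on the same input.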


I don't know which type of regex engine Java uses, but judging from the benchmark, its performance does not seem impressive. You may also be interested in this benchmark, which evaluates several C/C++ regex libraries.


EDIT: In both benchmarks, patterns are tested against an old version of Linux Howto. The vast majority of lines do not have a match.


About DFA vs. NFA: if I am right, a pure DFA cannot capture groups, at least not easily; only an NFA can. I heard that RE2 locally transforms the NFA into a DFA for the parts of a regex without capture groups. I do not know if this is true.


On PCRE: PCRE has the same problem as Perl: it is inefficient given complex alternations. You may have a look at the regex-dna benchmark from the Computer Language Benchmarks Game. Versions using PCRE are all much slower than the fastest version, which uses Tcl (maybe PCRE is not using a trie?). V8 is clearly the winner in that benchmark because it does not use backtracking. IMO, for C++ programmers, the best regex library is RE2.


#3


12  

The point is not whether Perl is faster than Java (benchmarks will tell you), but that regexes are really (deeply) part of the language itself. For example, in Perl there is no need to load any module to use regexes. See this relevant answer.


For example, a Perl one-liner in a pseudo-terminal that prints root's shell:


perl -nE '/^root.*:([\/\w]+)$/ and say $1' /etc/passwd

How many lines do you need to do the same thing in Java?
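For comparison, one possible Java equivalent might look like the sketch below. The class and method names are made up, and it assumes a Unix-like system where /etc/passwd is readable. The regex is the same as in the one-liner.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; mirrors: perl -nE '/^root.*:([\/\w]+)$/ and say $1' /etc/passwd
class RootShell {
    static final Pattern ROOT_SHELL = Pattern.compile("^root.*:([/\\w]+)$");

    // Returns the captured shell for a matching passwd line, or null otherwise.
    static String shellOf(String line) {
        Matcher m = ROOT_SHELL.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/etc/passwd"))) {
            String shell = shellOf(line);
            if (shell != null) {
                System.out.println(shell);
            }
        }
    }
}
```

Even in this compact form, you need two regex imports, an explicit Pattern and Matcher, and a doubled backslash inside the string literal.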


Perl is the de facto reference language for regexes. That's why so many languages use the PCRE engine (PCRE stands for Perl Compatible Regular Expressions).


