I am working on a simple token-replacement feature for our product. I have resolved almost all the issues, but I missed one thing: a token must support attributes, and an attribute can itself be a token. This is part of a bigger project. I hope you can help.


The beginning tag is "**#[**" and the ending tag is "**]**". For example: #[FirstName], #[LastName], #[Age, WhenZero="Undisclosed"].

Right now I am using the expression "\#\[[^\]]+\]". It works for simple tokens, but it fails on this input:

blah blah text here...
**#[IsFreeShipping, WhenTrue="
blah blah text here also...

It fails because it stops as soon as it encounters the first ]. It returns:


*#[IsFreeShipping, WhenTrue="

The desired result is:


*#[IsFreeShipping, WhenTrue="

7 answers


Your regex matches exactly what your stated condition describes: start with an opening square bracket and match everything up to the first closing square bracket.
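
A minimal Python sketch makes this concrete. The input is hypothetical: its WhenTrue attribute contains a nested $[FreeShipping] token, as in the later answers. The class [^\]] allows "[" but not "]", so the match ends at the inner bracket:

```python
import re

# The question's expression: "#[", one or more non-"]" characters, then "]".
ORIGINAL = re.compile(r"\#\[[^\]]+\]")

# Hypothetical input whose attribute value contains a nested $[...] token.
sample = 'Text #[IsFreeShipping, WhenTrue="Ships free: $[FreeShipping]"] more text'

# The match stops at the first "]", which belongs to the inner token.
matches = ORIGINAL.findall(sample)
print(matches)
```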


If you want to match nested square brackets, you need to specify exactly what is valid when nested. For instance, you could say that square brackets can be nested when enclosed within quotes.
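
One way to express the "brackets are allowed inside quotes" rule is sketched below in Python. It assumes attribute values are always double-quoted and never contain an escaped quote. A whole "..." string is consumed as one unit, so any brackets inside it are skipped:

```python
import re

# A token body is a sequence of either:
#   "[^"]*"   a complete double-quoted string (which may contain [ and ]), or
#   [^"\]]    any single character that is neither a quote nor "]".
QUOTE_AWARE = re.compile(r'#\[(?:"[^"]*"|[^"\]])*\]')

sample = 'Text #[IsFreeShipping, WhenTrue="Ships free: $[FreeShipping]"] more text'
token = QUOTE_AWARE.search(sample).group(0)
print(token)
```

The two branches can never start with the same character, so the engine has no ambiguous choices to backtrack through. An unbalanced quote, however, makes the token fail to match.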



This is a little borderline for a regexp, since it depends on context, but still...



should do it.


The idea is to state that a closing square bracket can be part of the parsed content if it is followed by a double quote, i.e. if it sits at the end of an attribute value.
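
Here is a possible Python rendering of that idea. It is a reconstruction, not necessarily the exact expression meant here. The body consumes any non-"]" character, or a "]" that a lookahead confirms is immediately followed by '"':

```python
import re

# Body characters: anything except "]", or a "]" immediately followed by '"'
# (a bracket that closes a nested token at the end of a quoted attribute value).
LOOKAHEAD = re.compile(r'#\[(?:[^\]]|\](?="))*\]')

sample = 'Text #[IsFreeShipping, WhenTrue="Ships free: $[FreeShipping]"] more text'
token = LOOKAHEAD.search(sample).group(0)
print(token)
```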


If that same square bracket were anywhere within the attribute, that would be a lot harder...


The advantage of a lookahead expression is that it can contain a pattern of non-fixed length.
So if the attribute's closing square bracket is followed not by a double quote but by some other known expression, you just update the lookahead part:



will match only the second closing square bracket, since the first is followed by ".
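
As an illustration, suppose whitespace may appear between the inner bracket and the closing quote (a hypothetical variation). The lookahead can then be widened to (?=\s*"), which has no fixed length. In Python's re, as in many engines, only lookbehind has to be fixed-width:

```python
import re

# A "]" stays inside the token when optional whitespace and then '"' follow it.
# The \s* makes the lookahead variable-length.
LOOKAHEAD_WS = re.compile(r'#\[(?:[^\]]|\](?=\s*"))*\]')

sample = 'Text #[IsFreeShipping, WhenTrue="Ships free: $[FreeShipping] "] more text'
token = LOOKAHEAD_WS.search(sample).group(0)
print(token)
```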


Of course, any kind of greedy expression (.*]) would not work, since it would match not the second closing square bracket but the last one. (In other words, if there is more than one intermediate ], they will all be swallowed by the match.)
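
The difference shows up as soon as a line holds two tokens. In this Python sketch the greedy pattern returns one match spanning both tokens, while the lookahead version returns them separately (the input line is hypothetical):

```python
import re

GREEDY = re.compile(r"#\[.*\]")                    # runs to the LAST "]" on the line
LOOKAHEAD = re.compile(r'#\[(?:[^\]]|\](?="))*\]')  # inner "]" only if followed by '"'

line = '#[FirstName] #[IsFreeShipping, WhenTrue="Ships free: $[FreeShipping]"]'

greedy_matches = GREEDY.findall(line)        # a single match covering both tokens
lookahead_matches = LOOKAHEAD.findall(line)  # two separate tokens
print(greedy_matches)
print(lookahead_matches)
```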


When I've done this kind of thing before, I've evaluated the innermost matchable expression first, before stepping out to larger strings.


In this case your code should probably replace $[FreeShipping] with its value before evaluating the larger token that contains the if clause.

Perhaps you can find a way to replace value tokens like $[FreeShipping] before the tokens that are not prefixed with $.
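
A two-pass version of this is sketched below in Python. Everything in it is hypothetical: the VALUES dictionary and the render function are illustrative names, and only a WhenTrue attribute is handled. Pass 1 resolves the $[...] value tokens. After that the #[...] tokens no longer contain brackets, so a flat pattern can handle them:

```python
import re

# Hypothetical token values; a real product would look these up in its own data.
VALUES = {"FreeShipping": "FREE", "IsFreeShipping": True, "FirstName": "Ann"}

# $[Name]: a plain value token without attributes.
VALUE_TOKEN = re.compile(r"\$\[(\w+)\]")
# #[Name] or #[Name, WhenTrue="..."]: only WhenTrue is supported in this sketch.
OUTER_TOKEN = re.compile(r'#\[(\w+)(?:,\s*WhenTrue="([^"]*)")?\]')


def render(text: str, values: dict) -> str:
    # Pass 1: resolve the inner $[...] tokens so no nested brackets remain.
    text = VALUE_TOKEN.sub(lambda m: str(values.get(m.group(1), "")), text)

    # Pass 2: the outer #[...] tokens are now flat.
    def replace_outer(m):
        name, when_true = m.group(1), m.group(2)
        value = values.get(name)
        if when_true is not None:
            return when_true if value else ""
        return "" if value is None else str(value)

    return OUTER_TOKEN.sub(replace_outer, text)


print(render('Hi #[FirstName]. #[IsFreeShipping, WhenTrue="Ships free: $[FreeShipping]"]', VALUES))
```

One caveat: if a substituted value itself contains '"' or "]", pass 2 can misparse the text. A real implementation would need to escape such values.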

This is roughly, but not exactly, the distinction between:


http://en.wikipedia.org/wiki/Multi-pass_compiler versus http://en.wikipedia.org/wiki/One-pass_compiler


Writing this as one regex won't necessarily be any faster than looping over a few simple regexes. All regexes do is abstract string parsing.
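
The "loop over simple regexes" approach can be generalized to any nesting depth. The Python sketch below repeatedly replaces only the innermost tokens, meaning those whose body contains no brackets, until a pass changes nothing. The expand function and the resolver are hypothetical. The loop terminates only if the resolver never returns new token syntax:

```python
import re
from typing import Callable

# An innermost token: "#[" or "$[" followed by a body with no brackets at all.
INNERMOST = re.compile(r"[#$]\[([^\[\]]*)\]")


def expand(text: str, resolve: Callable[[str], str]) -> str:
    """Replace innermost tokens, repeating until a pass makes no replacement."""
    while True:
        text, count = INNERMOST.subn(lambda m: resolve(m.group(1)), text)
        if count == 0:
            return text


# Hypothetical resolver that just wraps each body, to show the evaluation order.
result = expand("a #[x $[y]] b", lambda body: f"<{body}>")
print(result)  # a <x <y>> b
```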



If you're only expecting a single match in any given input you could simply allow for a greedy match:
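
For illustration, here is a greedy pattern in Python. This assumes the input contains exactly one token, as stated. The .* runs to the end and then backs off to the last "]". re.DOTALL lets "." cross newlines in case the token spans several lines (the sample text is hypothetical):

```python
import re

# Greedy: ".*" consumes everything, then backtracks to the LAST "]" in the text.
GREEDY = re.compile(r"#\[.*\]", re.DOTALL)

text = 'Intro text\n#[IsFreeShipping, WhenTrue="Ships free: $[FreeShipping]"]\nOutro text'
token = GREEDY.search(text).group(0)
print(token)
```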



If you're expecting multiple matches, you have a problem, because the text is no longer regular. You'll need to escape the inner brackets in some way.


(Regex is a deep subject - it's quite possible that someone has a better solution)


I'd be interested to learn if I'm wrong, but if I recall correctly, you cannot do this using regular expressions. This looks like a Dyck language to me, and you would need a pushdown automaton to accept the expressions. But I must admit I'm not quite sure whether this holds for the extended regexps, such as those provided by Perl.
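
Perl and PCRE do support recursive patterns such as (?R), but Python's built-in re module does not. A portable alternative in the spirit of a pushdown automaton is sketched below in Python. It scans the text and keeps a bracket-depth counter, which serves as a stack with one symbol. The function name is hypothetical. Quotes get no special treatment, so an unbalanced bracket inside an attribute value would confuse it:

```python
from typing import List


def find_tokens(text: str) -> List[str]:
    """Return each top-level #[...] token, honouring nested [...] at any depth."""
    tokens = []
    pos = 0
    while True:
        start = text.find("#[", pos)
        if start == -1:
            return tokens
        depth = 0  # the "stack": number of "[" still open
        for i in range(start + 1, len(text)):
            if text[i] == "[":
                depth += 1
            elif text[i] == "]":
                depth -= 1
                if depth == 0:
                    tokens.append(text[start:i + 1])
                    pos = i + 1
                    break
        else:
            raise ValueError(f"unclosed token starting at index {start}")


sample = 'a #[If, T="x $[A, B="$[C]"]"] b #[D]'
print(find_tokens(sample))
```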



It is possible to write a regex for the example you gave, but in general it fails. A single regex can't handle arbitrarily nested expressions.


Your example shows that your DSL already has 'if' conditions. Before long it could evolve into a Turing-complete language.


Why not use an existing template language, such as the Django template language?


Your example:

blah blah text here... #[IsFreeShipping, 
blah blah text here also...

Using Django template language:


blah blah text here... {% if IsFreeShipping %}

{{ FreeShipping }}
{% endif %} blah blah text here also...


This works for your sample:



It assumes that the inner square brackets can't themselves contain square brackets. If the inner tokens can also contain tokens, you're probably out of luck. Some regex flavors can handle recursive structures, but the resulting regexes are hideous even by regex standards. :D

This regex also treats '$' as special only if it is followed by an opening square bracket. If you want to disallow its use otherwise, remove the second alternative: |\$(?!\[)
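
The regex this answer refers to is missing. The Python sketch below is only consistent with the description: inner tokens cannot contain brackets, and '$' is special only before '['. Its second alternative is \$(?!\[), and the STRICT variant drops it, so a bare '$' no longer matches:

```python
import re

# Token body alternatives (a sketch matching the description, not the original regex):
#   [^\]$]          any character except "]" and "$"
#   \$(?!\[)        a "$" that does NOT start a $[...] token
#   \$\[[^\[\]]*\]  a $[...] token whose own body contains no brackets
DOLLAR_AWARE = re.compile(r"#\[(?:[^\]$]|\$(?!\[)|\$\[[^\[\]]*\])*\]")
# Same pattern with the second alternative removed: a bare "$" is disallowed.
STRICT = re.compile(r"#\[(?:[^\]$]|\$\[[^\[\]]*\])*\]")

sample = 'Text #[IsFreeShipping, WhenTrue="Ships free: $[FreeShipping]"] more text'
price = '#[Price, Format="$5"]'  # hypothetical token containing a bare "$"

print(DOLLAR_AWARE.search(sample).group(0))
print(DOLLAR_AWARE.search(price).group(0))
print(STRICT.search(price))  # None
```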

