Regex: How to get all contents inside a tag #[SOMETEXTHERE]

I am working on a simple token replacement feature for our product. I have resolved almost all of the issues, but I missed one thing: a token must support attributes, and an attribute can itself be a token. This is part of a bigger project. I hope you can help.

The beginning tag is "**#[**" and the ending tag is "**]**". For example: #[FirstName], #[LastName], #[Age, WhenZero="Undisclosed"].

Right now I am using the expression "\#\[[^\]]+\]". It works for simple tokens, but it fails on this input:

blah blah text here...
**#[IsFreeShipping, WhenTrue="
$[FreeShipping]"]**
blah blah text here also...

It fails because the match stops at the first ] it encounters. It returns:

*#[IsFreeShipping, WhenTrue="
$[FreeShipping]*

My desired result is:

*#[IsFreeShipping, WhenTrue="
$[FreeShipping]"]*
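
For readers who want to reproduce the problem, here is a minimal sketch in Python (an assumption; the question does not say which regex engine is used). It runs the pattern from the question against the failing input and shows where the match stops.

```python
import re

# Pattern from the question: "#[", one or more non-"]" characters, then "]".
ORIGINAL = re.compile(r"#\[[^\]]+\]")

sample = (
    "blah blah text here...\n"
    '#[IsFreeShipping, WhenTrue="\n'
    '$[FreeShipping]"]\n'
    "blah blah text here also..."
)

match = ORIGINAL.search(sample)
# The match ends at the "]" that closes the inner $[FreeShipping] token,
# so the trailing '"]' of the outer token is left out.
print(repr(match.group(0)))

# Tokens without nested brackets are still found whole.
simple = ORIGINAL.findall('#[FirstName] #[Age, WhenZero="Undisclosed"]')
print(simple)
```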

7 answers

#1


Your regex matches exactly what your stated condition indicates: start with an opening square bracket and match everything up to the first closing square bracket.

If you want to match nested square brackets, you need to specify exactly what is valid when nested. For instance, you could say that square brackets can be nested when enclosed within quotes.
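
One way to express the "brackets may appear inside quotes" rule is sketched below in Python. It assumes every attribute value is wrapped in double quotes and never contains an escaped quote; the name `QUOTE_AWARE` is made up for this illustration.

```python
import re

# "#[", then any mix of double-quoted strings and characters that are
# neither '"' nor ']', then the closing "]". A bracket inside quotes is
# consumed as part of the quoted string, so it cannot end the token.
QUOTE_AWARE = re.compile(r'#\[(?:"[^"]*"|[^\]"])*\]')

sample = (
    "Dear #[FirstName],\n"
    '#[IsFreeShipping, WhenTrue="\n'
    '$[FreeShipping]"]\n'
    "bye"
)

tokens = QUOTE_AWARE.findall(sample)
for token in tokens:
    print(repr(token))
```

Because the two alternatives can never start on the same character, the engine has only one way to consume any given input, which keeps backtracking small.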

#2


This is a little borderline for a regexp, since it depends on context, but still...

#\[(\](?=")|[^\]])+\]

should do it.

The idea is to state that a closing square bracket can be part of the parsed content if it is followed by a double quote, that is, if it sits at the end of an attribute value.

If that same square bracket were anywhere within the attribute, that would be a lot harder...

The advantage of a lookahead expression is that it can contain a regexp with a non-fixed match length.
So if the attribute's closing square bracket is followed not by a double quote but by some other known expression, you just update the lookahead part:

#\[(\](?=")|[^\]])+\]

will match only the second closing square bracket, since the first is followed by ".

Of course, a greedy expression such as .*] would not work, since it would match up to the last closing square bracket rather than the second one. (That is, if there is more than one intermediate ], they will all be included in the match.)
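
A quick Python check of the lookahead pattern (Python is an assumption here; the answer does not name an engine). The second input is a made-up case showing the limitation mentioned above: a ] in the middle of an attribute value, not followed by a double quote, still ends the match early.

```python
import re

# A "]" is accepted as content only when the next character is '"'.
LOOKAHEAD = re.compile(r'#\[(\](?=")|[^\]])+\]')

sample = 'text #[IsFreeShipping, WhenTrue="\n$[FreeShipping]"] more text'
whole = LOOKAHEAD.search(sample).group(0)
print(repr(whole))

# Hypothetical input with "]" in the middle of a value: the bracket is
# followed by "y", not by '"', so it is taken as the end of the token.
tricky = '#[A, B="x]y"]'
partial = LOOKAHEAD.search(tricky).group(0)
print(repr(partial))
```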

#3


When I've done things like this before, I've evaluated the innermost matchable expression first and then stepped out to larger strings.

In this case your regex should probably replace $[FreeShipping] with its value before evaluating the larger token that contains the if clause.

Perhaps you can find a way to replace the value tokens, like $[FreeShipping], before handling the tokens that are not prefixed with $.

This is roughly, but not exactly, the difference between http://en.wikipedia.org/wiki/Multi-pass_compiler and http://en.wikipedia.org/wiki/One-pass_compiler

Writing this as one regex won't necessarily be any faster than looping over a few simple regexes. All a regex does is abstract string parsing.
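
The two-pass idea can be sketched in Python as follows. The `VALUES` table and the `expand_inner` helper are hypothetical, and the sketch assumes substituted values never contain a ] themselves. It stops once the outer tokens have been located, because how WhenTrue should be evaluated is not specified in the question.

```python
import re

# Hypothetical lookup table; the real product would supply these values.
VALUES = {"FreeShipping": "Free shipping on this order!"}

INNER = re.compile(r"\$\[([^\[\]]+)\]")  # value tokens such as $[FreeShipping]
OUTER = re.compile(r"#\[[^\]]+\]")       # the question's original pattern

def expand_inner(text, values):
    """Pass 1: replace each $[Name] with its value; unknown names stay as-is."""
    return INNER.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)

sample = 'blah #[IsFreeShipping, WhenTrue="\n$[FreeShipping]"] blah'

first_pass = expand_inner(sample, VALUES)
# Pass 2: with the inner brackets gone, the simple pattern sees whole tokens.
outer_tokens = OUTER.findall(first_pass)
print(outer_tokens)
```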

#4


If you're only expecting a single match in any given input you could simply allow for a greedy match:

/#\[.*\]/

If you're expecting multiple matches, you have a problem, because the text is no longer regular. You'll need to escape the inner brackets in some way.
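
A small Python illustration of both cases. One assumption is added on top of the pattern above: the `re.DOTALL` flag, since the sample token in the question spans two lines and `.` does not match a newline by default.

```python
import re

# Greedy: from the first "#[" to the LAST "]" in the input.
GREEDY = re.compile(r"#\[.*\]", re.DOTALL)

single = 'text #[IsFreeShipping, WhenTrue="\n$[FreeShipping]"] text'
one = GREEDY.search(single).group(0)
print(repr(one))  # the whole token, as wanted

# With two tokens the match swallows everything between them as well.
double = "Dear #[FirstName] and #[LastName], hello"
both = GREEDY.search(double).group(0)
print(repr(both))
```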

(Regex is a deep subject - it's quite possible that someone has a better solution)

#5


I'd be interested to learn if I'm wrong, but if I recall correctly, you cannot do this using regular expressions. This looks like a Dyck language to me, and you would need a pushdown automaton to accept the expressions. But I must admit I'm not quite sure whether this holds true for extended forms of regexps like those provided by Perl.
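
With a single bracket type, the stack of a pushdown automaton reduces to a depth counter. The Python sketch below (the function name `find_tokens` is made up) scans for "#[" and counts brackets until the depth returns to zero, so it handles any nesting depth. It deliberately ignores quotes, which means a stray unbalanced bracket inside an attribute value would confuse it.

```python
def find_tokens(text):
    """Return every balanced #[...] token, at any nesting depth."""
    tokens = []
    pos = 0
    while True:
        start = text.find("#[", pos)
        if start == -1:
            break
        depth = 0
        end = None
        # Begin at the "[" of "#[" so that it is counted as depth 1.
        for i in range(start + 1, len(text)):
            if text[i] == "[":
                depth += 1
            elif text[i] == "]":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            break  # unbalanced input: stop scanning
        tokens.append(text[start:end])
        pos = end
    return tokens

nested = 'x #[A, B="$[C, D="$[E]"]"] y #[F] z'
found = find_tokens(nested)
print(found)
```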

#6


It is possible to write a regex for the example you gave, but in general it fails. A single regex can't work for arbitrarily nested expressions.

Your example shows that your DSL already has 'if' conditions. Before long it could evolve into a Turing-complete language.

Why don't you use an existing template language, such as the Django template language?

Your example:

blah blah text here... #[IsFreeShipping, 
WhenTrue="
$[FreeShipping]"]
blah blah text here also...

Using Django template language:

blah blah text here... {% if IsFreeShipping %}

{{ FreeShipping }}
{% endif %} blah blah text here also...

#7


This works for your sample:

#\[(?:[^\]$]+|\$(?!\[)|\$\[[^\[\]]*\])*\]

It assumes that the inner square brackets can't themselves contain square brackets. If the inner tokens can also contain tokens, you're probably out of luck. Some regex flavors can handle recursive structures, but the resulting regexes are hideous even by regex standards. :D

This regex also treats the '$' as special only if it's followed by an opening square bracket. If you want to disallow its use otherwise, remove the second alternative: |\$(?!\[)
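
Here is a Python check of this pattern (the engine is an assumption) against the sample, a value containing a plain '$', and a made-up doubly nested token that shows the stated limitation.

```python
import re

# Inside "#[ ... ]" allow: runs without "]" or "$", a "$" that is not
# followed by "[", or one complete $[...] token with no brackets inside.
ONE_LEVEL = re.compile(r"#\[(?:[^\]$]+|\$(?!\[)|\$\[[^\[\]]*\])*\]")

sample = 'text #[IsFreeShipping, WhenTrue="\n$[FreeShipping]"] text'
whole = ONE_LEVEL.search(sample).group(0)
print(repr(whole))

# A "$" that does not open a token is treated as ordinary content.
price = ONE_LEVEL.search('#[Note, Text="costs $5"]').group(0)
print(repr(price))

# Two levels of nesting (hypothetical input): no match at all.
deep = ONE_LEVEL.search('#[A, B="$[C, D="$[E]"]"]')
print(deep)
```

Note that `[^\]$]+` sits inside a repeated group, so on long inputs with an unclosed token the engine may spend a long time backtracking before it gives up.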
