热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

查找字符串是否是迭代子字符串?-Findingifastringisaniterativesubstring?

IhaveastringS.HowcanIfindifthestringfollowsSnT.我有一个字符串S.我怎么能找到字符串是否遵循SnT。Examp

I have a string S. How can I find if the string follows S = nT.

我有一个字符串S.我怎么能找到字符串是否遵循S = nT。

Examples:
Function should return true if
1) S = "abab"
2) S = "abcdabcd"
3) S = "abcabcabc"
4) S = "zzxzzxzzx"

示例:如果1)S =“abab”2)S =“abcdabcd”3)S =“abcabcabc”4)S =“zzxzzxzzx”,则函数应返回true

But if S="abcb" returns false.

但是如果S =“abcb”返回false。

I though maybe we can repeatedly call KMP on substrings of S and then decide.

我虽然也许我们可以在S的子串上反复调用KMP然后决定。

eg: for "abab": call on KMP on "a". it returns 2(two instances). now 2*len("a")!=len(s)
call on KMP on "ab". it returns 2. now 2*len("ab")==len(s) so return true

例如:对于“abab”:在“a”上拨打KMP。它返回2(两个实例)。现在2 * len(“a”)!= len(s)在“ab”上拨打KMP。它返回2.现在2 * len(“ab”)== len(s)所以返回true

Can you suggest any better algorithms?

你能建议更好的算法吗?

8 个解决方案

#1


5  

I can think of a heuristic, only call KMP on a sub string if Len(original string)/Len of(sub string) is a positive integer.

如果Len(原始字符串)/ Len(子字符串)是一个正整数,我可以想到一个启发式,只在子字符串上调用KMP。

Also the maximum length of the sub string has to be less than N/2.

此外,子串的最大长度必须小于N / 2。

EDIT

Using these Heuristics Iwrote the follwing python code because my C is rusty at the moment

使用这些启发式方法我写了下面的python代码,因为我的C现在生锈了

oldstr='ABCDABCD'    

for i in xrange(0,len(oldstr)/2):
       newslice=oldstr[0:i+1]
         if newslice*(len(oldstr)/len(newslice)) == oldstr:
             print 'pattern found', newslice
             break

#2


4  

You actually only need to care about testing substring lengths that are equal to the full string length divided by a prime number. The reason is: If S contains n copies of T, and n is not prime, then n = ab, and so S actually also contains a copies of bT (where "bT" means "T repeated b times"). This is an extension of anijhaw's answer.

实际上,您只需要关心测试子串长度等于完整字符串长度除以素数。原因是:如果S包含n个拷贝的T,并且n不是素数,则n = ab,因此S实际上还包含bT的副本(其中“bT”表示“T重复b次”)。这是anijhaw答案的延伸。

int primes[] = { 2, 3, 5, 7, 11, 13, 17 };  /* There are one or two more... ;) */
int nPrimes = sizeof primes / sizeof primes[0];

/* Passing in the string length instead of assuming ASCIIZ strings means we
 * don't have to modify the string in-place or allocate memory for new copies
 * to handle recursion. */
int is_iterative(char *s, int len) {
    int i, j;
    for (i = 0; i 

Notice that when recursing to find even shorter repeated substrings, we don't need to check the entire string again, just the first larger repeat -- since we've already established that the remaining large repeats are, well, repeats of the first one. :)

请注意,当递归查找更短的重复子串时,我们不需要再次检查整个字符串,只需要检查第一个较大的重复 - 因为我们已经确定剩余的大重复是,重复第一个重复。 :)

#3


1  

I don't see that the KMP algorithm helps in this case. It is not a matter of determining where to begin the next match. It seems that one way to reduce the search time is to start with the longest possibility (half the length) and work downward. The only lengths that neeed to be tested are lengths that evenly divide into the total length. Here is an example in Ruby. I should add that I realize the question was tagged as C, but this was just a simple way to show the algorithm I was thinking about (and allowed me to test that it worked).

在这种情况下,我没有看到KMP算法有帮助。这不是确定从哪里开始下一场比赛的问题。似乎减少搜索时间的一种方法是从最长的可能性(长度的一半)开始并向下工作。需要测试的唯一长度是均匀分成总长度的长度。这是Ruby中的一个例子。我应该补充一点,我意识到问题被标记为C,但这只是一种简单的方式来显示我正在考虑的算法(并允许我测试它是否有效)。

class String
def IsPattern( )
    len = self.length
    testlen = len / 2
    # the fastest is to start with two entries and work down
    while ( testlen > 0 )
        # if this is not an even divisor then it can't fit the pattern
        if ( len % testlen == 0 )
            # evenly divides, so it may match
            if ( self == self[0..testlen-1] * ( len / testlen ))
                return true
            end

        end
        testlen = testlen - 1
    end
    # must not have matched
    false
end
end

if __FILE__ == $0

   ARGV.each do |str|
       puts "%s, %s" % [str, str.IsPattern ? "true" : "false" ]
   end

end



[C:\test]ruby patterntest.rb a aa abab abcdabcd abcabcabc zzxzzxzzx abcd
a, false
aa, true
abab, true
abcdabcd, true
abcabcabc, true
zzxzzxzzx, true
abcd, false

#4


0  

I suppose you could try the following algorithm:

我想你可以尝试以下算法:

Lets L to be a possible substring length which generates the original word. For L from 1 to strlen(s)/2 check if the first character acquires in all L*i positions for i from 1 to strlen(s)/L. If it does then it could be a possible solution and you should check it with memcmp, if not try the next L. Of course you can skip some L values which are not dividing strlen(s).

让L成为可能产生原始单词的子串长度。对于L从1到strlen(s)/ 2,检查第一个字符是否在i的所有L * i位置从1到strlen(s)/ L获得。如果它确实那么它可能是一个可能的解决方案你应该用memcmp检查它,如果没有尝试下一个L.当然你可以跳过一些没有划分strlen的L值。

#5


0  

Try this:

    char s[] = "abcabcabcabc";
int nStringLength = strlen (s);
int nMaxCheckLength = nStringLength / 2;
int nThisOffset;
int nNumberOfSubStrings;
char cMustMatch;
char cCompare;
BOOL bThisSubStringLengthRepeats;
// Check all sub string lengths up to half the total length
for (int nSubStringLength = 1;  nSubStringLength <= nMaxCheckLength;  nSubStringLength++)
{
    // How many substrings will there be?
    nNumberOfSubStrings = nStringLength / nSubStringLength;

    // Only check substrings that fit exactly
    if (nSubStringLength * nNumberOfSubStrings == nStringLength)
    {
        // Assume it's going to be ok
        bThisSubStringLengthRepeats = TRUE;

        // check each character in substring
        for (nThisOffset = 0;  nThisOffset 

#6


0  

This is Java code but you should get the idea:

这是Java代码,但你应该明白这个想法:

        String str = "ababcababc";
    int repPos = 0;
    int repLen = 0;
    for( int i = 0; i 

This will return the length of the shortest repeating chunk or the length of the string if there's no repetition.

如果没有重复,这将返回最短重复块的长度或字符串的长度。

#7


0  

You can build the suffix array of the string, sort it.
Now look for series of ever doubling suffixes and when you've reached one that's the size of the entire string (S) the first in the series will give you T.

您可以构建字符串的后缀数组,对其进行排序。现在寻找一系列加倍的后缀,当你达到整个字符串(S)的大小时,系列中的第一个会给你T.

For example:

abcd <-- T
abcdabcd <-- S
bcd
bcdabcd
cd
cdabcd
d
dabcd

x
xzzx
xzzxzzx
zx
zxzzx
zxzzxzzx
zzx <-- T
zzxzzx
zzxzzxzzx <-- S

a
apa
apapa
apapapa
pa <-- T
papa
papapa <-- Another T, not detected by this algo
papapapa <-- S

#8


0  

The brute force approach would be to pick all possible substrings and see if they can form the entire string.

蛮力方法是挑选所有可能的子串,看看它们是否可以形成整个弦。

We can do one better using the observation that for a substring to be a valid candidate len(str) % len(substr) == 0. This can be deduced from the problem statement.

我们可以使用观察结果更好地做一个子字符串作为有效候选len(str)%len(substr)== 0.这可以从问题陈述中推断出来。

Here is the full code:

这是完整的代码:

bool isRational(const string &str){
    int len = str.length();
    const auto &factors = getFactors(len); // this would include 1 but exclude len
    // sort(factors.begin(), factors.end()); To get out of the loop faster. Why? See https://stackoverflow.com/a/4698155/1043773
    for(auto iter = factors.rbegin(); iter != factors.rend(); ++iter){
        auto factor = *iter;
        bool result = true;
        for(int i = 0; i 

Notice that there is a faster variation in terms of time complexity which uses KMP.

请注意,使用KMP的时间复杂度存在更快的变化。

The above algorithm is O(N * factorCount(N)) But the good thing about this algorithm is it can bail out much faster than the KMP algorithm. Also the number of factors do not grow much.

上面的算法是O(N * factorCount(N))但是这个算法的好处是它可以比KMP算法更快地拯救。此外,因素的数量也不会增长太多。

Here is the graph of [i, factorCount(i)] for i <= 10^6

这是i <= 10 ^ 6的[i,factorCount(i)]的图表

enter image description here

Here is how the algorithm performs as against the KMP algorithm. Red graph is O(N * factorCount(N)) and Blue is O(N) KMP

以下是算法如何执行KMP算法。红色图是O(N * factorCount(N)),蓝色是O(N)KMP

The KMP code is picked from here

从这里挑选KMP代码

enter image description here


推荐阅读
author-avatar
刘德华
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有