Since you care most about run length, you could generate random run lengths instead of random bits, so as to give them the exact distribution you want.
The mean run length in random binary data is of course 2 (the sum of n/2^n over all n >= 1, since a run has length n with probability 1/2^n), and the mode is 1. Here are some random bits (I swear this is a single run, I didn't pick a value to make my point):
0111111011111110110001000101111001100000000111001010101101001000
See, there's a run of length 8 in there. This is not especially surprising, since a run of 8 or more should occur roughly once every 256 bits and I've generated 64 bits.
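To check the run-length claims against the sample, here is a small Python sketch. The variable names are my own, and the infinite sums are truncated at n = 59, which is more than enough for double precision.

```python
from itertools import groupby

bits = "0111111011111110110001000101111001100000000111001010101101001000"

# Lengths of the maximal runs of identical bits, in order of appearance.
runs = [len(list(group)) for _, group in groupby(bits)]
print(len(bits), len(runs), max(runs), sum(runs) / len(runs))

# For fair, independent bits a run has length n with probability 1/2**n,
# so the mean run length is sum(n / 2**n), which converges to 2.
mean_run = sum(n / 2**n for n in range(1, 60))

# A run reaches length 8 or more with probability 1/128. Dividing the mean
# run length by that probability gives the expected number of bits per
# such run, which comes out at about 256.
bits_per_long_run = mean_run / sum(1 / 2**n for n in range(8, 60))
print(mean_run, bits_per_long_run)
```

The sample averages a little over 2 bits per run, which is in line with the theoretical mean.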
If this doesn't "look random" to you because of excessive run lengths, then generate run lengths with whatever distribution you want. In pseudocode:
    loop
        get a random number
        output that many 1 bits
        get a random number
        output that many 0 bits
    endloop
You'd probably want to discard some initial data from the stream, or randomise the first bit, to avoid the problem that as it stands, the first bit is always 1. The probability of the Nth bit being 1 depends on how you "get a random number", but for anything that achieves "shortish but not too short" run lengths it will soon be as close to 50% as makes no difference.
For instance "get a random number" might do this:
    get a uniformly-distributed random number n from 1 to 81
    if n is between 1 and 54, return 1
    if n is between 55 and 72, return 2
    if n is between 73 and 78, return 3
    if n is between 79 and 80, return 4
    return 5
The idea is that the probability of a run of length N is one third the probability of a run of length N-1, instead of one half. This will give much shorter average run lengths, and a longest run of 5, and would therefore "look more random" to you. Of course it would not "look random" to anyone used to dealing with sequences of coin tosses, because they'd think the runs were too short. You'd also be able to tell very easily with statistical tests that the value of digit N is correlated with the value of digit N-1.
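A minimal Python sketch of the whole scheme, for illustration only. It uses Python's `random` module as a stand-in for whatever bit source you actually have. The names `run_length` and `short_run_bits` and the seed are my own, and it randomises the first bit rather than always starting with 1.

```python
import random
from itertools import groupby, islice

def run_length(rng):
    """Return a run length from 1 to 5 with weights 54, 18, 6, 2, 1 out of 81."""
    n = rng.randint(1, 81)  # uniform, both ends included
    if n <= 54:
        return 1
    if n <= 72:
        return 2
    if n <= 78:
        return 3
    if n <= 80:
        return 4
    return 5

def short_run_bits(rng):
    """Yield bits forever, alternating between runs of one value and the other."""
    bit = rng.randint(0, 1)  # randomise the first bit
    while True:
        for _ in range(run_length(rng)):
            yield bit
        bit ^= 1  # the next run uses the opposite bit

rng = random.Random(12345)  # arbitrary seed, only to make the demo repeatable
sample = list(islice(short_run_bits(rng), 10_000))

# Measure the runs actually produced (the final run may be cut short).
runs = [len(list(group)) for _, group in groupby(sample)]
print("".join(map(str, sample[:64])))
print(max(runs), sum(runs) / len(runs))
```

The longest run can never exceed 5, and the measured mean run length should land near 121/81, roughly 1.49.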
This code uses at least log2(81) = 6.34 "random bits" to generate on average 121/81 = 1.49 bits of output, so it is slower than just generating uniformly-distributed bits. But it shouldn't be much more than about 7/1.49 = 5 times slower, and an LFSR is pretty fast to start with.
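The arithmetic can be checked directly from the 1-to-81 table. This is just a sketch: the `weights` dictionary is my own encoding of that table, and 7 is the whole number of bits needed to cover 81 values.

```python
from math import log2

# Run length -> number of values of n (out of 81) that map to it.
weights = {1: 54, 2: 18, 3: 6, 4: 2, 5: 1}

bits_in = log2(81)  # minimum random bits consumed per run
mean_out = sum(length * w for length, w in weights.items()) / 81  # 121/81

print(round(bits_in, 2), round(mean_out, 2), round(7 / mean_out, 1))
# prints: 6.34 1.49 4.7
```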