热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

在范围[0..n-1]中产生m个不同的随机数-Generatingmdistinctrandomnumbersintherange[0..n-1]

Ihavetwomethodsofgeneratingmdistinctrandomnumbersintherange[0..n-1]我有两种方法在范围[0.n-1]中生

I have two methods of generating m distinct random numbers in the range [0..n-1]

我有两种方法在范围[0. n-1]中生成m个不同的随机数。

Method 1:

方法1:

//C++-ish pseudocode
int result[m];
for(i = 0; i 

Method 2:

方法2:

//C++-ish pseudocode
int arr[n];
for(int i = 0; i 

The first method is more efficient when n is much larger than m, whereas the second is more efficient otherwise. But "much larger" isn't that strict a notion, is it? :)

当n大于m时,第一种方法更有效,而第二种方法则更有效。但是“大得多”并不是一个严格的概念,不是吗?:)

Question: What formula of n and m should I use to determine whether method1 or method2 will be more efficient? (in terms of mathematical expectation of the running time)

问:我应该用n和m的什么公式来确定method1或method2是否会更有效?(关于运行时间的数学期望)

9 个解决方案

#1


15  

Pure mathematics:
Let's calculate the quantity of rand() function calls in both cases and compare the results:

纯数学:让我们计算两种情况下rand()函数调用的数量,并比较结果:

Case 1: let's see the mathematical expectation of calls on step i = k, when you already have k numbers chosen. The probability to get a number with one rand() call is equal to p = (n-k)/n. We need to know the mathematical expectation of such calls quantity which leads to obtaining a number we don't have yet.

情形1:让我们看一下,当你已经选择了k个数字的时候,关于step i = k的调用的数学期望。用一个rand()调用得到一个数字的概率等于p = (n-k)/n。我们需要知道这种调用量的数学期望,从而得到一个我们还没有得到的数字。

The probability to get it using 1 call is p. Using 2 calls - q * p, where q = 1 - p. In general case, the probability to get it exactly after n calls is (q^(n-1))*p. Thus, the mathematical expectation is
Sum[ n * q^(n-1) * p ], n = 1 --> INF. This sum is equal to 1/p (proved by wolfram alpha).

用1次调用得到它的概率是p,使用2个调用- q *p,其中q = 1 - p,一般情况下,在n次调用后得到它的概率是(q (n-1))*p。因此,数学期望是和[n * q (n-1) * p], n = 1——> INF,这个和等于1/p(由wolfram alpha证明)。

So, on the step i = k you will perform 1/p = n/(n-k) calls of the rand() function.

因此,在步骤i = k时,您将执行rand()函数的1/p = n/(n-k)调用。

Now let's sum it overall:

现在我们总结一下:

Sum[ n/(n - k) ], k = 0 --> m - 1 = n * T - the number of rand calls in method 1.
Here T = Sum[ 1/(n - k) ], k = 0 --> m - 1

Sum[n/(n - k)], k = 0 -> m - 1 = n * T -方法1中的rand调用数。这里T =和[1/(n - k)], k = 0 -> m - 1。

Case 2:

案例2:

Here rand() is called inside random_shuffle n - 1 times (in most implementations).

在这里,rand()被调用在random_shuffle n - 1次(在大多数实现中)。

Now, to choose the method, we have to compare these two values: n * T ? n - 1.
So, to choose the appropriate method, calculate T as described above. If T <(n - 1)/n it's better to use the first method. Use the second method otherwise.

现在,要选择这个方法,我们需要比较这两个值:n * T ?n - 1。因此,选择合适的方法,计算T如上所述。如果T <(n - 1)/n,最好使用第一种方法。否则,使用第二种方法。

#2


9  

Check the Wikipedia description of the original Fisher-Yates algorithm. It advocates using essentially your method 1 for up to n/2, and your method 2 for the remainder.

检查维基百科对原始Fisher-Yates算法的描述。它提倡使用基本的方法1到n/2,剩下的方法是2。

#3


6  

Here's an algorithm that will work in O(n) memory and O(n) time (where n is the number of returned results, not the size of the set you're selecting from) for any result set. It's in Python for convenience because it uses a hashtable:

这是一个在O(n)内存和O(n)时间内工作的算法(其中n是返回结果的数目,而不是你选择的集合的大小)。它在Python中是为了方便,因为它使用了一个hashtable:

def random_elements(num_elements, set_size):
    state = {}
    for i in range(num_elements):
        # Swap state[i] with a random element
        swap_with = random.randint(i, set_size - 1)
        state[i], state[swap_with] = state.get(swap_with, swap_with), state.get(i, i)
    return [state[i] for i in range(num_elements) # effectively state[:num_elements] if it were a list/array.

This is just a partial fisher-yates shuffle, with the array being shuffled implemented as a sparse hashtable - any element that is not present is equal to its index. We shuffle the first num_elements indices, and return those values. In the case that set_size = 1, this is equivalent to picking a random number in the range, and in the case that num_elements = set_size, this is equivalent to a standard fisher-yates shuffle.

这只是一个部分的fisher-yates洗牌,数组被拖拽实现为一个稀疏hashtable——任何不存在的元素都等于它的索引。我们对第一个num_elements索引进行洗牌,并返回这些值。在set_size = 1的情况下,这相当于在range中选择一个随机数,在num_elements = set_size的情况下,这相当于一个标准的fisher-yates洗牌。

It's trivial to observe that this is O(n) time, and because each iteration of the loop initializes at most two new indices in the hashtable, it's O(n) space, too.

观察到这是O(n)时间是很简单的,因为循环的每一次迭代都在哈希表中的两个新索引中初始化,也就是O(n)空间。

#4


5  

Personally, I would use Method 1, and then if M > N/2, choose N-M values, and then invert the array (return the numbers that were not picked). So for example, if N is 1000 and you want 950 of them, chose 50 values using Method 1, and then return the other 950.

我个人将使用方法1,然后如果M > N/2,选择N-M值,然后反转数组(返回未选中的数字)。例如,如果N是1000,而你想要950个,用方法1选择50个值,然后返回另一个950。

Edit: Though, if consistent performance is your goal, I would use a modified method 2, which doesn't do the full shuffle, but only shuffles the first M elements of your N length array.

编辑:虽然,如果你的目标是一致的性能,我将使用一个修改过的方法2,它不会进行完全的洗牌,但只会打乱N长度数组的第一个M元素。

int arr[n];
for(int i = 0; i 

#5


3  

What about a third method?

第三种方法呢?

int result[m];
for(i = 0; i 

Edit it should be <=. and it would actually additional logic to avoid collisions.

编辑它应该是<=。它实际上是额外的逻辑来避免碰撞。

This is better, an example using the Modern Method from Fisher-Yates

这是一个更好的例子,一个使用现代方法的Fisher-Yates。

//C++-ish pseudocode
int arr[n];
for(int i = 0; i 

#6


2  

Talking about mathematical expectation, it's pretty useless but I will post it anyway :D

说到数学期望,它是无用的,但我还是会把它贴出来:D。

Shuffle is simple O(m).

洗牌是简单的O(m)。

Now the other algorithm is a bit more complex. The number of steps needed to generate the next number is the expected value of the number of trials, and the probability of the trial length is a geomtric distribution. So...

另一种算法有点复杂。生成下一个数字所需的步骤数是试验次数的期望值,而试验长度的概率是一个地轴分布。所以…

p=1          E[X1]=1            = 1           = 1
p=1-1/n      E[x2]=1/(1-1/n)    = 1 + 1/(n-1) = 1 + 1/(n-1) 
p=1-2/n      E[x3]=1/(1-1/n)    = 1 + 2/(n-2) = 1 + 1/(n-2) + 1/(n-2)
p=1-3/n      E[X4]=1/(1-2/n)    = 1 + 3/(n-3) = 1 + 1/(n-3) + 1/(n-3) + 1(n-3)
....
p=1-(m-1)/n) E[Xm]=1/(1-(m-1)/n))

Note that the sum can be split up into a triangle shape, see right hand side.

注意,这个和可以被分割成一个三角形的形状,看右边。

Let's use the formula for the harmonic series: H_n = Sum k=0->n (1/k) = approx ln(k)

让我们用调和级数的公式:H_n = Sum k=0->n (1/k) =约ln(k)

Sum(E[Xk]) = m + ln(n-1)-ln(n-m-1) + ln(n-2)-ln(n-m-1) + ... = m + ln(n-1) + ln(n-2) + ... - (m-1)*ln(n-m-1) ..

And there is some forumla for the sum of harmonic series, if you are still interested I will look it up...

对于和声级数的和有一些公式,如果你们还感兴趣的话,我会查一下。

Update: actually it's quite nice formula (thanks to the brilliant Concrete Mathematics book)

更新:其实这是一个很好的公式(多亏了那本精彩的混凝土数学书)

Sum(H_k) k=0->n = n*H_n - n

So the expected number of steps:

所以期望的步骤数:

Sum(E[Xk]) = m + (n-1)*ln(n-1) - (n-1) - (n-m-1)*ln(n-m-1) - (n-m-1)) - (m-1)*ln(n-m-1).

Note: I haven't verified it.

注:我还没有核实。

#7


1  

This is a bit of a long shot, but it could work, depending on your system.

这是一个很长的镜头,但它可以工作,取决于你的系统。

  1. Start with some reasonable ratio, like 0.5.
  2. 从一些合理的比率开始,比如0.5。
  3. When a request comes in, process it with whichever method you get from the current value of the threshold ratio.
  4. 当一个请求进来时,用您从阈值比的当前值获得的任何方法处理它。
  5. Record the time it takes and when you have "empty" time, perform the same task with the other method.
  6. 记录下它所花费的时间,当你有“空”时间时,用另一种方法执行相同的任务。
  7. If the alternative solution is much faster than the original one, adjust the threshold up or down.
  8. 如果替代解决方案比原来的解决方案快得多,调整阈值向上或向下。

The obvious flaw in this method is that on highly variable load systems your "offline" test won't be too reliable.

这种方法的明显缺点是,在高度可变的负载系统上,您的“脱机”测试不会太可靠。

#8


0  

There was suggested Fisher-Yates shuffle. Don't know if the next code generates equally distributed integers, but it is at least compact and one-pass:

有人建议说要进行“费雪”的洗牌。不知道下一个代码是否生成同样的分布式整数,但它至少是紧凑的和一次性的:

std::random_device rd;
std::mt19937 g(rd());
for (size_type i = 1; i 

#9


-1  

Quite possibly it would be more simple to start it in debug mode (and keep one method as a note) for a couple of times to get an average, then use the other method to get an average from that

很有可能在调试模式下启动它(并将一个方法作为一个注释),以获得平均值,然后使用另一种方法来获得平均值。


推荐阅读
  • 本文详细介绍如何使用Python进行配置文件的读写操作,涵盖常见的配置文件格式(如INI、JSON、TOML和YAML),并提供具体的代码示例。 ... [详细]
  • C++实现经典排序算法
    本文详细介绍了七种经典的排序算法及其性能分析。每种算法的平均、最坏和最好情况的时间复杂度、辅助空间需求以及稳定性都被列出,帮助读者全面了解这些排序方法的特点。 ... [详细]
  • 1.如何在运行状态查看源代码?查看函数的源代码,我们通常会使用IDE来完成。比如在PyCharm中,你可以Ctrl+鼠标点击进入函数的源代码。那如果没有IDE呢?当我们想使用一个函 ... [详细]
  • 深入理解C++中的KMP算法:高效字符串匹配的利器
    本文详细介绍C++中实现KMP算法的方法,探讨其在字符串匹配问题上的优势。通过对比暴力匹配(BF)算法,展示KMP算法如何利用前缀表优化匹配过程,显著提升效率。 ... [详细]
  • DNN Community 和 Professional 版本的主要差异
    本文详细解析了 DotNetNuke (DNN) 的两种主要版本:Community 和 Professional。通过对比两者的功能和附加组件,帮助用户选择最适合其需求的版本。 ... [详细]
  • 本文详细介绍 Go+ 编程语言中的上下文处理机制,涵盖其基本概念、关键方法及应用场景。Go+ 是一门结合了 Go 的高效工程开发特性和 Python 数据科学功能的编程语言。 ... [详细]
  • 技术分享:从动态网站提取站点密钥的解决方案
    本文探讨了如何从动态网站中提取站点密钥,特别是针对验证码(reCAPTCHA)的处理方法。通过结合Selenium和requests库,提供了详细的代码示例和优化建议。 ... [详细]
  • 1:有如下一段程序:packagea.b.c;publicclassTest{privatestaticinti0;publicintgetNext(){return ... [详细]
  • 本文基于刘洪波老师的《英文词根词缀精讲》,深入探讨了多个重要词根词缀的起源及其相关词汇,帮助读者更好地理解和记忆英语单词。 ... [详细]
  • IneedtofocusTextCellsonebyoneviaabuttonclick.ItriedlistView.ScrollTo.我需要通过点击按钮逐个关注Tex ... [详细]
  • 本文详细介绍了Akka中的BackoffSupervisor机制,探讨其在处理持久化失败和Actor重启时的应用。通过具体示例,展示了如何配置和使用BackoffSupervisor以实现更细粒度的异常处理。 ... [详细]
  • UNP 第9章:主机名与地址转换
    本章探讨了用于在主机名和数值地址之间进行转换的函数,如gethostbyname和gethostbyaddr。此外,还介绍了getservbyname和getservbyport函数,用于在服务器名和端口号之间进行转换。 ... [详细]
  • ImmutableX Poised to Pioneer Web3 Gaming Revolution
    ImmutableX is set to spearhead the evolution of Web3 gaming, with its innovative technologies and strategic partnerships driving significant advancements in the industry. ... [详细]
  • 本文介绍了如何在C#中启动一个应用程序,并通过枚举窗口来获取其主窗口句柄。当使用Process类启动程序时,我们通常只能获得进程的句柄,而主窗口句柄可能为0。因此,我们需要使用API函数和回调机制来准确获取主窗口句柄。 ... [详细]
  • 本文详细解析了Python中的os和sys模块,介绍了它们的功能、常用方法及其在实际编程中的应用。 ... [详细]
author-avatar
连向明
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有