热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

在范围[0..n-1]中产生m个不同的随机数-Generatingmdistinctrandomnumbersintherange[0..n-1]

Ihavetwomethodsofgeneratingmdistinctrandomnumbersintherange[0..n-1]我有两种方法在范围[0.n-1]中生

I have two methods of generating m distinct random numbers in the range [0..n-1]

我有两种方法在范围[0. n-1]中生成m个不同的随机数。

Method 1:

方法1:

//C++-ish pseudocode
int result[m];
for(i = 0; i 

Method 2:

方法2:

//C++-ish pseudocode
int arr[n];
for(int i = 0; i 

The first method is more efficient when n is much larger than m, whereas the second is more efficient otherwise. But "much larger" isn't that strict a notion, is it? :)

当n大于m时,第一种方法更有效,而第二种方法则更有效。但是“大得多”并不是一个严格的概念,不是吗?:)

Question: What formula of n and m should I use to determine whether method1 or method2 will be more efficient? (in terms of mathematical expectation of the running time)

问:我应该用n和m的什么公式来确定method1或method2是否会更有效?(关于运行时间的数学期望)

9 个解决方案

#1


15  

Pure mathematics:
Let's calculate the quantity of rand() function calls in both cases and compare the results:

纯数学:让我们计算两种情况下rand()函数调用的数量,并比较结果:

Case 1: let's see the mathematical expectation of calls on step i = k, when you already have k numbers chosen. The probability to get a number with one rand() call is equal to p = (n-k)/n. We need to know the mathematical expectation of such calls quantity which leads to obtaining a number we don't have yet.

情形1:让我们看一下,当你已经选择了k个数字的时候,关于step i = k的调用的数学期望。用一个rand()调用得到一个数字的概率等于p = (n-k)/n。我们需要知道这种调用量的数学期望,从而得到一个我们还没有得到的数字。

The probability to get it using 1 call is p. Using 2 calls - q * p, where q = 1 - p. In general case, the probability to get it exactly after n calls is (q^(n-1))*p. Thus, the mathematical expectation is
Sum[ n * q^(n-1) * p ], n = 1 --> INF. This sum is equal to 1/p (proved by wolfram alpha).

用1次调用得到它的概率是p,使用2个调用- q *p,其中q = 1 - p,一般情况下,在n次调用后得到它的概率是(q (n-1))*p。因此,数学期望是和[n * q (n-1) * p], n = 1——> INF,这个和等于1/p(由wolfram alpha证明)。

So, on the step i = k you will perform 1/p = n/(n-k) calls of the rand() function.

因此,在步骤i = k时,您将执行rand()函数的1/p = n/(n-k)调用。

Now let's sum it overall:

现在我们总结一下:

Sum[ n/(n - k) ], k = 0 --> m - 1 = n * T - the number of rand calls in method 1.
Here T = Sum[ 1/(n - k) ], k = 0 --> m - 1

Sum[n/(n - k)], k = 0 -> m - 1 = n * T -方法1中的rand调用数。这里T =和[1/(n - k)], k = 0 -> m - 1。

Case 2:

案例2:

Here rand() is called inside random_shuffle n - 1 times (in most implementations).

在这里,rand()被调用在random_shuffle n - 1次(在大多数实现中)。

Now, to choose the method, we have to compare these two values: n * T ? n - 1.
So, to choose the appropriate method, calculate T as described above. If T <(n - 1)/n it's better to use the first method. Use the second method otherwise.

现在,要选择这个方法,我们需要比较这两个值:n * T ?n - 1。因此,选择合适的方法,计算T如上所述。如果T <(n - 1)/n,最好使用第一种方法。否则,使用第二种方法。

#2


9  

Check the Wikipedia description of the original Fisher-Yates algorithm. It advocates using essentially your method 1 for up to n/2, and your method 2 for the remainder.

检查维基百科对原始Fisher-Yates算法的描述。它提倡使用基本的方法1到n/2,剩下的方法是2。

#3


6  

Here's an algorithm that will work in O(n) memory and O(n) time (where n is the number of returned results, not the size of the set you're selecting from) for any result set. It's in Python for convenience because it uses a hashtable:

这是一个在O(n)内存和O(n)时间内工作的算法(其中n是返回结果的数目,而不是你选择的集合的大小)。它在Python中是为了方便,因为它使用了一个hashtable:

def random_elements(num_elements, set_size):
    state = {}
    for i in range(num_elements):
        # Swap state[i] with a random element
        swap_with = random.randint(i, set_size - 1)
        state[i], state[swap_with] = state.get(swap_with, swap_with), state.get(i, i)
    return [state[i] for i in range(num_elements) # effectively state[:num_elements] if it were a list/array.

This is just a partial fisher-yates shuffle, with the array being shuffled implemented as a sparse hashtable - any element that is not present is equal to its index. We shuffle the first num_elements indices, and return those values. In the case that set_size = 1, this is equivalent to picking a random number in the range, and in the case that num_elements = set_size, this is equivalent to a standard fisher-yates shuffle.

这只是一个部分的fisher-yates洗牌,数组被拖拽实现为一个稀疏hashtable——任何不存在的元素都等于它的索引。我们对第一个num_elements索引进行洗牌,并返回这些值。在set_size = 1的情况下,这相当于在range中选择一个随机数,在num_elements = set_size的情况下,这相当于一个标准的fisher-yates洗牌。

It's trivial to observe that this is O(n) time, and because each iteration of the loop initializes at most two new indices in the hashtable, it's O(n) space, too.

观察到这是O(n)时间是很简单的,因为循环的每一次迭代都在哈希表中的两个新索引中初始化,也就是O(n)空间。

#4


5  

Personally, I would use Method 1, and then if M > N/2, choose N-M values, and then invert the array (return the numbers that were not picked). So for example, if N is 1000 and you want 950 of them, chose 50 values using Method 1, and then return the other 950.

我个人将使用方法1,然后如果M > N/2,选择N-M值,然后反转数组(返回未选中的数字)。例如,如果N是1000,而你想要950个,用方法1选择50个值,然后返回另一个950。

Edit: Though, if consistent performance is your goal, I would use a modified method 2, which doesn't do the full shuffle, but only shuffles the first M elements of your N length array.

编辑:虽然,如果你的目标是一致的性能,我将使用一个修改过的方法2,它不会进行完全的洗牌,但只会打乱N长度数组的第一个M元素。

int arr[n];
for(int i = 0; i 

#5


3  

What about a third method?

第三种方法呢?

int result[m];
for(i = 0; i 

Edit it should be <=. and it would actually additional logic to avoid collisions.

编辑它应该是<=。它实际上是额外的逻辑来避免碰撞。

This is better, an example using the Modern Method from Fisher-Yates

这是一个更好的例子,一个使用现代方法的Fisher-Yates。

//C++-ish pseudocode
int arr[n];
for(int i = 0; i 

#6


2  

Talking about mathematical expectation, it's pretty useless but I will post it anyway :D

说到数学期望,它是无用的,但我还是会把它贴出来:D。

Shuffle is simple O(m).

洗牌是简单的O(m)。

Now the other algorithm is a bit more complex. The number of steps needed to generate the next number is the expected value of the number of trials, and the probability of the trial length is a geomtric distribution. So...

另一种算法有点复杂。生成下一个数字所需的步骤数是试验次数的期望值,而试验长度的概率是一个地轴分布。所以…

p=1          E[X1]=1            = 1           = 1
p=1-1/n      E[x2]=1/(1-1/n)    = 1 + 1/(n-1) = 1 + 1/(n-1) 
p=1-2/n      E[x3]=1/(1-1/n)    = 1 + 2/(n-2) = 1 + 1/(n-2) + 1/(n-2)
p=1-3/n      E[X4]=1/(1-2/n)    = 1 + 3/(n-3) = 1 + 1/(n-3) + 1/(n-3) + 1(n-3)
....
p=1-(m-1)/n) E[Xm]=1/(1-(m-1)/n))

Note that the sum can be split up into a triangle shape, see right hand side.

注意,这个和可以被分割成一个三角形的形状,看右边。

Let's use the formula for the harmonic series: H_n = Sum k=0->n (1/k) = approx ln(k)

让我们用调和级数的公式:H_n = Sum k=0->n (1/k) =约ln(k)

Sum(E[Xk]) = m + ln(n-1)-ln(n-m-1) + ln(n-2)-ln(n-m-1) + ... = m + ln(n-1) + ln(n-2) + ... - (m-1)*ln(n-m-1) ..

And there is some forumla for the sum of harmonic series, if you are still interested I will look it up...

对于和声级数的和有一些公式,如果你们还感兴趣的话,我会查一下。

Update: actually it's quite nice formula (thanks to the brilliant Concrete Mathematics book)

更新:其实这是一个很好的公式(多亏了那本精彩的混凝土数学书)

Sum(H_k) k=0->n = n*H_n - n

So the expected number of steps:

所以期望的步骤数:

Sum(E[Xk]) = m + (n-1)*ln(n-1) - (n-1) - (n-m-1)*ln(n-m-1) - (n-m-1)) - (m-1)*ln(n-m-1).

Note: I haven't verified it.

注:我还没有核实。

#7


1  

This is a bit of a long shot, but it could work, depending on your system.

这是一个很长的镜头,但它可以工作,取决于你的系统。

  1. Start with some reasonable ratio, like 0.5.
  2. 从一些合理的比率开始,比如0.5。
  3. When a request comes in, process it with whichever method you get from the current value of the threshold ratio.
  4. 当一个请求进来时,用您从阈值比的当前值获得的任何方法处理它。
  5. Record the time it takes and when you have "empty" time, perform the same task with the other method.
  6. 记录下它所花费的时间,当你有“空”时间时,用另一种方法执行相同的任务。
  7. If the alternative solution is much faster than the original one, adjust the threshold up or down.
  8. 如果替代解决方案比原来的解决方案快得多,调整阈值向上或向下。

The obvious flaw in this method is that on highly variable load systems your "offline" test won't be too reliable.

这种方法的明显缺点是,在高度可变的负载系统上,您的“脱机”测试不会太可靠。

#8


0  

There was suggested Fisher-Yates shuffle. Don't know if the next code generates equally distributed integers, but it is at least compact and one-pass:

有人建议说要进行“费雪”的洗牌。不知道下一个代码是否生成同样的分布式整数,但它至少是紧凑的和一次性的:

std::random_device rd;
std::mt19937 g(rd());
for (size_type i = 1; i 

#9


-1  

Quite possibly it would be more simple to start it in debug mode (and keep one method as a note) for a couple of times to get an average, then use the other method to get an average from that

很有可能在调试模式下启动它(并将一个方法作为一个注释),以获得平均值,然后使用另一种方法来获得平均值。


推荐阅读
  • 由二叉树到贪心算法
    二叉树很重要树是数据结构中的重中之重,尤其以各类二叉树为学习的难点。单就面试而言,在 ... [详细]
  • 本文探讨了如何通过预处理器开关选择不同的类实现,并解决在特定情况下遇到的链接器错误。 ... [详细]
  • 本文介绍如何使用MFC和ADO技术调用SQL Server中的存储过程,以查询指定小区在特定时间段内的通话统计数据。通过用户界面选择小区ID、开始时间和结束时间,系统将计算并展示小时级的通话量、拥塞率及半速率通话比例。 ... [详细]
  • 1.执行sqlsever存储过程,消息:SQLServer阻止了对组件“AdHocDistributedQueries”的STATEMENT“OpenRowsetOpenDatas ... [详细]
  • 探讨 HDU 1536 题目,即 S-Nim 游戏的博弈策略。通过 SG 函数分析游戏胜负的关键,并介绍如何编程实现解决方案。 ... [详细]
  • 本文介绍了如何在 C# 和 XNA 框架中实现一个自定义的 3x3 矩阵类(MMatrix33),旨在深入理解矩阵运算及其应用场景。该类参考了 AS3 Starling 和其他相关资源,以确保算法的准确性和高效性。 ... [详细]
  • 深入解析ESFramework中的AgileTcp组件
    本文详细介绍了ESFramework框架中AgileTcp组件的设计与实现。AgileTcp是ESFramework提供的ITcp接口的高效实现,旨在优化TCP通信的性能和结构清晰度。 ... [详细]
  • 2018-2019学年第六周《Java数据结构与算法》学习总结
    本文总结了2018-2019学年第六周在《Java数据结构与算法》课程中的学习内容,重点介绍了非线性数据结构——树的相关知识及其应用。 ... [详细]
  • 本文详细介绍了优化DB2数据库性能的多种方法,涵盖统计信息更新、缓冲池调整、日志缓冲区配置、应用程序堆大小设置、排序堆参数调整、代理程序管理、锁机制优化、活动应用程序限制、页清除程序配置、I/O服务器数量设定以及编入组提交数调整等方面。通过这些技术手段,可以显著提升数据库的运行效率和响应速度。 ... [详细]
  • Coursera ML 机器学习
    2019独角兽企业重金招聘Python工程师标准线性回归算法计算过程CostFunction梯度下降算法多变量回归![选择特征](https:static.oschina.n ... [详细]
  • Java 实现二维极点算法
    本文介绍了一种使用 Java 编程语言实现的二维极点算法。该算法用于从一组二维坐标中筛选出极点,适用于需要处理几何图形和空间数据的应用场景。文章不仅详细解释了算法的工作原理,还提供了完整的代码示例。 ... [详细]
  • Oracle中NULL、空字符串和空格的处理与区别
    本文探讨了在Oracle数据库中使用NULL、空字符串('')和空格('_')时可能遇到的问题及解决方案。重点解释了它们之间的区别,以及在查询和函数中的行为。 ... [详细]
  • 本文介绍如何利用栈数据结构在C++中判断字符串中的括号是否匹配。通过顺序栈和链栈两种方式实现,并详细解释了算法的核心思想和具体实现步骤。 ... [详细]
  • 如何使用Ping命令来测试网络连接?当网卡安装和有关参数配置完成后,可以使用ping命令来测试一下网络是否连接成功。以winXP为例1、打开XP下DOS窗口具体操作是点击“开始”菜 ... [详细]
  • 本文详细介绍了Hive中用于日期和字符串相互转换的多种函数,包括从时间戳到日期格式的转换、日期到时间戳的转换,以及如何处理不同格式的日期字符串。通过这些函数,用户可以轻松实现日期和字符串之间的灵活转换,满足数据处理中的各种需求。 ... [详细]
author-avatar
连向明
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有