热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

为什么在hashCode中使用素数?-WhyuseaprimenumberinhashCode?

IwasjustwonderingwhyisthatprimesareusedinaclassshashCode()method?Forexample,whenu

I was just wondering why is that primes are used in a class's hashCode() method? For example, when using Eclipse to generate my hashCode() method there is always the prime number 31 used:

我想知道为什么在类的hashCode()方法中使用了质数?例如,当使用Eclipse生成hashCode()方法时,总是使用质数31:

public int hashCode() {
     final int prime = 31;
     //...
}

References:

引用:

Here is a good primer on Hashcode and article on how hashing works that I found (C# but the concepts are transferrable): Eric Lippert's Guidelines and rules for GetHashCode()

这里有一篇关于哈希代码的入门文章和关于哈希如何工作的文章(c#但是概念是可转移的):Eric Lippert的GetHashCode()的指导方针和规则

8 个解决方案

#1


83  

Because you want the number you are multiplying by and the number of buckets you are inserting into to have orthogonal prime factorizations.

因为你想要你乘以的数和你要插入的桶数来得到正交的质因数分解。

Suppose there are 8 buckets to insert into. If the number you are using to multiply by is some multiple of 8, then the bucket inserted into will only be determined by the least significant entry (the one not multiplied at all). Similar entries will collide. Not good for a hash function.

假设有8个桶要插入。如果要相乘的数字是8的某个倍数,那么插入的桶将只由最不重要的条目(根本没有相乘的条目)决定。类似的条目将碰撞。不适合哈希函数。

31 is a large enough prime that the number of buckets is unlikely to be divisible by it (and in fact, modern java HashMap implementations keep the number of buckets to a power of 2).

31是一个足够大的素数,所以桶的数量不太可能被它整除(实际上,现代java HashMap实现将桶的数量保持为2的幂)。

#2


113  

Prime numbers are chosen to best distribute data among hash buckets. If the distribution of inputs is random and evenly spread, then the choice of the hash code/modulus does not matter. It only has an impact when there is a certain pattern to the inputs.

选择素数以便在哈希桶之间最好地分配数据。如果输入的分布是随机的、均匀分布的,那么选择散列码/模数并不重要。只有当输入有特定的模式时才会产生影响。

This is often the case when dealing with memory locations. For example, all 32-bit integers are aligned to addresses divisible by 4. Check out the table below to visualize the effects of using a prime vs. non-prime modulus:

在处理内存位置时通常是这样。例如,所有32位整数都对齐到可被4整除的地址。查看下面的表格,以可视化使用prime和非prime模数的效果:

Input       Modulo 8    Modulo 7
0           0           0
4           4           4
8           0           1
12          4           5
16          0           2
20          4           6
24          0           3
28          4           0

Notice the almost-perfect distribution when using a prime modulus vs. a non-prime modulus.

注意,当使用素数模和非素数模时,几乎完美的分布。

However, although the above example is largely contrived, the general principle is that when dealing with a pattern of inputs, using a prime number modulus will yield the best distribution.

然而,尽管上面的例子很大程度上是人为设计的,但一般的原则是,在处理输入模式时,使用素数模量将产生最佳的分布。

#3


21  

For what it's worth, Effective Java 2nd Edition hand-waives around the mathematics issue and just say that the reason to choose 31 is:

值得注意的是,有效的Java第二版围绕着数学问题手把手地说,选择31的原因是:

  • Because it's an odd prime, and it's "traditional" to use primes
  • 因为它是一个奇数素数,使用素数是“传统的”
  • It's also one less than a power of two, which permits for bitwise optimization
  • 它也比2的幂小1,这允许位优化

Here's the full quote, from Item 9: Always override hashCode when you override equals:

下面是第9项的完整引用:当你重写=时,总是重写hashCode:

The value 31 was chosen because it's an odd prime. If it were even and multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.

之所以选择31是因为它是一个奇数素数。如果它是偶数并且乘法溢出,信息就会丢失,因为乘法2等同于移位。使用素数的优势不那么明显,但它是传统的。

A nice property of 31 is that the multiplication can be replaced by a shift (§15.19) and subtraction for better performance:

31日是一个很好的属性,乘法可以被转变为更好的性能(§15.19)和减法:

 31 * i == (i <<5) - i

Modern VMs do this sort of optimization automatically.

现代虚拟机自动执行这种优化。


While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and theoretical computer scientists.

虽然本项目中的菜谱产生了相当好的散列函数,但它并没有生成最先进的散列函数,Java平台库也没有提供1.6版那样的散列函数。写这样的哈希函数是一个研究课题,最好留给数学家和理论计算机科学家。

Perhaps a later release of the platform will provide state-of-the-art hash functions for its classes and utility methods to allow average programmers to construct such hash functions. In the meantime, the techniques described in this item should be adequate for most applications.

平台的后续版本可能会为它的类和实用方法提供最先进的哈希函数,以允许普通程序员构建此类哈希函数。与此同时,本项目中所描述的技术应该适用于大多数应用程序。

Rather simplistically, it can be said that using a multiplier with numerous divisors will result in more hash collisions. Since for effective hashing we want to minimize the number of collisions, we try to use a multiplier that has fewer divisors. A prime number by definition has exactly two distinct, positive divisors.

简单地说,使用带有多个除数的乘数会导致更多的散列冲突。因为为了有效的散列,我们希望最小化冲突的数量,所以我们尝试使用一个具有更少的因数的乘数。一个素数根据定义有两个截然不同的正因数。

Related questions

  • Java hashCode from one field - the recipe, plus example of using Apache Commons Lang's builders
  • 来自一个字段的Java hashCode -食谱,加上使用Apache Commons Lang的生成器的示例
  • is it incorrect to define an hashcode of an object as the sum, multiplication, whatever, of all class variables hashcodes?
  • 将一个对象的hashcode定义为所有类变量hashcodes的和、乘法、等等,是不正确的吗?
  • Absolute Beginner's Guide to Bit Shifting?
  • 绝对初学者的钻头移位指南?

#4


4  

I heard that 31 was chosen so that the compiler can optimize the multiplication to left-shift 5 bits then subtract the value.

我听说31被选择了,这样编译器就可以将乘法优化为左移5位,然后减去这个值。

#5


2  

Here's a citation a little closer to the source.

这里有一个引文更接近源。

It boils down to:

它可以归结为:

  • 31 is prime, which reduces collisions
  • 31是质数,可以减少碰撞
  • 31 produces a good distribution, with
  • 31产生一个良好的分布,与
  • a reasonable tradeoff in speed
  • 在速度上的合理权衡

#6


2  

First you compute the hash value modulo 2^32 (the size of an int), so you want something relatively prime to 2^32 (relatively prime means that there are no common divisors). Any odd number would do for that.

首先计算出散列值模2 ^ 32(int)的大小,所以你想要相对' 2 ^ 32(互质意味着没有公共因子)。任何奇数都可以。

Then for a given hash table the index is usually computed from the hash value modulo the size of the hash table, so you want something that is relatively prime to the size of the hash table. Often the sizes of hash tables are chosen as prime numbers for that reason. In the case of Java the Sun implementation makes sure that the size is always a power of two, so an odd number would suffice here, too. There is also some additional massaging of the hash keys to limit collisions further.

然后对于给定的哈希表,索引通常由哈希值模块计算哈希表的大小,因此您需要一个相对于哈希表大小的素数。通常由于这个原因,哈希表的大小被选择为素数。在Java的例子中,Sun实现确保大小总是2的乘方,所以在这里一个奇数也足够了。还有一些哈希键的附加修改,以进一步限制冲突。

The bad effect if the hash table and the multiplier had a common factor n could be that in certain circumstances only 1/n entries in the hash table would be used.

如果哈希表和乘数有一个公因式n,那么不好的影响可能是在某些情况下,在哈希表中只使用1/n项。

#7


0  

It generally helps achieve a more even spread of your data among the hash buckets, especially for low-entropy keys.

它通常有助于在hash bucket中实现更均匀的数据分布,特别是对于低熵的键。

#8


0  

31 is also specific to Java HashMap which uses a int as hash data type. Thus the max capacity of 2^32. There is no point in using larger Fermat or Mersenne primes.

31也是特定于Java HashMap的,它使用int作为散列数据类型。因此,最大容量为2 ^ 32。使用更大的Fermat或Mersenne素数是没有意义的。


推荐阅读
  • 深入解析JVM垃圾收集器
    本文基于《深入理解Java虚拟机:JVM高级特性与最佳实践》第二版,详细探讨了JVM中不同类型的垃圾收集器及其工作原理。通过介绍各种垃圾收集器的特性和应用场景,帮助读者更好地理解和优化JVM内存管理。 ... [详细]
  • Docker的安全基准
    nsitionalENhttp:www.w3.orgTRxhtml1DTDxhtml1-transitional.dtd ... [详细]
  • 本文将介绍如何编写一些有趣的VBScript脚本,这些脚本可以在朋友之间进行无害的恶作剧。通过简单的代码示例,帮助您了解VBScript的基本语法和功能。 ... [详细]
  • Explore how Matterverse is redefining the metaverse experience, creating immersive and meaningful virtual environments that foster genuine connections and economic opportunities. ... [详细]
  • Explore a common issue encountered when implementing an OAuth 1.0a API, specifically the inability to encode null objects and how to resolve it. ... [详细]
  • 技术分享:从动态网站提取站点密钥的解决方案
    本文探讨了如何从动态网站中提取站点密钥,特别是针对验证码(reCAPTCHA)的处理方法。通过结合Selenium和requests库,提供了详细的代码示例和优化建议。 ... [详细]
  • 深入理解Java中的volatile、内存屏障与CPU指令
    本文详细探讨了Java中volatile关键字的作用机制,以及其与内存屏障和CPU指令之间的关系。通过具体示例和专业解析,帮助读者更好地理解多线程编程中的同步问题。 ... [详细]
  • 本文详细介绍了Java中org.eclipse.ui.forms.widgets.ExpandableComposite类的addExpansionListener()方法,并提供了多个实际代码示例,帮助开发者更好地理解和使用该方法。这些示例来源于多个知名开源项目,具有很高的参考价值。 ... [详细]
  • 本文深入探讨了 Java 中的 Serializable 接口,解释了其实现机制、用途及注意事项,帮助开发者更好地理解和使用序列化功能。 ... [详细]
  • [论文笔记] Crowdsourcing Translation: Professional Quality from Non-Professionals (ACL, 2011)
    Time:4hoursTimespan:Apr15–May3,2012OmarZaidan,ChrisCallison-Burch:CrowdsourcingTra ... [详细]
  • 本文详细介绍了 GWT 中 PopupPanel 类的 onKeyDownPreview 方法,提供了多个代码示例及应用场景,帮助开发者更好地理解和使用该方法。 ... [详细]
  • 本文介绍了Java并发库中的阻塞队列(BlockingQueue)及其典型应用场景。通过具体实例,展示了如何利用LinkedBlockingQueue实现线程间高效、安全的数据传递,并结合线程池和原子类优化性能。 ... [详细]
  • 1.如何在运行状态查看源代码?查看函数的源代码,我们通常会使用IDE来完成。比如在PyCharm中,你可以Ctrl+鼠标点击进入函数的源代码。那如果没有IDE呢?当我们想使用一个函 ... [详细]
  • 本文详细介绍了Akka中的BackoffSupervisor机制,探讨其在处理持久化失败和Actor重启时的应用。通过具体示例,展示了如何配置和使用BackoffSupervisor以实现更细粒度的异常处理。 ... [详细]
  • 本文详细介绍了Java编程语言中的核心概念和常见面试问题,包括集合类、数据结构、线程处理、Java虚拟机(JVM)、HTTP协议以及Git操作等方面的内容。通过深入分析每个主题,帮助读者更好地理解Java的关键特性和最佳实践。 ... [详细]
author-avatar
hcl春丽
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有