
Is a Java hashmap really O(1)?


I've seen some interesting claims on SO regarding Java hashmaps and their O(1) lookup time. Can someone explain why this is so? Unless these hashmaps are vastly different from any of the hashing algorithms I was brought up on, there must always exist a dataset that contains collisions.


In which case, the lookup would be O(n) rather than O(1).


Can someone explain whether they are O(1) and, if so, how they achieve this?


15 Answers

#1


106  

A particular feature of a HashMap is that unlike, say, balanced trees, its behavior is probabilistic. In these cases it's usually most helpful to talk about complexity in terms of the probability of a worst-case event occurring. For a hash map, that of course is a collision, with respect to how full the map happens to be. A collision is pretty easy to estimate.


p_collision = n / capacity


So a hash map with even a modest number of elements is pretty likely to experience at least one collision. Big O notation allows us to do something more compelling. Observe that for any arbitrary, fixed constant k:


O(n) = O(k * n)


We can use this feature to improve the performance of the hash map. We could instead think about the probability of at most 2 collisions.


p_(collision x 2) = (n / capacity)^2


This is much lower. Since the cost of handling one extra collision is irrelevant to Big O performance, we've found a way to improve performance without actually changing the algorithm! We can generalize this to


p_(collision x k) = (n / capacity)^k


And now we can disregard some arbitrary number of collisions and end up with a vanishingly tiny likelihood of more collisions than we are accounting for. You could get the probability to an arbitrarily tiny level by choosing the correct k, all without altering the actual implementation of the algorithm.
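
As a rough worked example (an illustration of the argument above, not part of the original answer): take a map at HashMap's default load factor, so n / capacity = 0.75, and watch (n / capacity)^k shrink as k grows.

    public class CollisionOdds {
        public static void main(String[] args) {
            double loadFactor = 0.75; // n / capacity, HashMap's default resize threshold
            for (int k = 1; k <= 12; k++) {
                // Probability of at least k collisions under the model above
                System.out.printf("k = %2d   p = %.6f%n", k, Math.pow(loadFactor, k));
            }
        }
    }

By k = 12 the probability is already about 0.03 and still falling geometrically, which is why a fixed k can be chosen to make the neglected cases as unlikely as desired.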


We talk about this by saying that the hash map has O(1) access with high probability.


#2


34  

You seem to mix up worst-case behaviour with average-case (expected) runtime. The former is indeed O(n) for hash tables in general (i.e. not using perfect hashing), but this is rarely relevant in practice.


Any dependable hash table implementation, coupled with a half decent hash, has a retrieval performance of O(1) with a very small factor (2, in fact) in the expected case, within a very narrow margin of variance.


#3


26  

In Java, HashMap works by using hashCode to locate a bucket. Each bucket is a list of items residing in that bucket. The items are scanned, using equals for comparison. When adding items, the HashMap is resized once a certain load percentage is reached.
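
A minimal sketch of that lookup path (a simplification for illustration, not the real java.util.HashMap source, which also spreads the hash bits and, since Java 8, treeifies long chains):

    import java.util.List;
    import java.util.Map;

    class BucketLookupSketch {
        // hashCode picks the bucket, equals scans the chain inside it.
        // Assumes a power-of-two table where every bucket holds a (possibly empty) list.
        static <K, V> V get(List<Map.Entry<K, V>>[] buckets, K key) {
            int index = (buckets.length - 1) & key.hashCode();
            for (Map.Entry<K, V> entry : buckets[index]) {
                if (entry.getKey().equals(key)) {
                    return entry.getValue();
                }
            }
            return null; // no mapping for this key
        }
    }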


So, sometimes it will have to compare against a few items, but generally it's much closer to O(1) than O(n). For practical purposes, that's all you should need to know.


#4


23  

Remember that O(1) does not mean that each lookup only examines a single item - it means that the average number of items checked remains constant w.r.t. the number of items in the container. So if it takes on average 4 comparisons to find an item in a container with 100 items, it should also take an average of 4 comparisons to find an item in a container with 10000 items, and for any other number of items (there's always a bit of variance, especially around the points at which the hash table rehashes, and when there's a very small number of items).


So collisions don't prevent the container from having O(1) operations, as long as the average number of keys per bucket remains within a fixed bound.
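
A quick simulation (a hypothetical demo, assuming a reasonably uniform hash) makes this concrete: as long as the table grows with n, the average chain length stays flat and only the longest chain creeps up.

    import java.util.Random;

    public class ChainLengthDemo {
        public static void main(String[] args) {
            Random random = new Random(42);
            for (int n : new int[] {100, 10_000, 1_000_000}) {
                int capacity = Integer.highestOneBit(n) * 2; // load factor between 0.5 and 1
                int[] chains = new int[capacity];
                for (int i = 0; i < n; i++) {
                    chains[(capacity - 1) & random.nextInt()]++; // random hash -> bucket
                }
                int occupied = 0, longest = 0;
                for (int len : chains) {
                    if (len > 0) occupied++;
                    longest = Math.max(longest, len);
                }
                System.out.printf("n=%9d  avg chain=%.2f  longest=%d%n",
                        n, (double) n / occupied, longest);
            }
        }
    }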


#5


8  

I know this is an old question, but there's actually a new answer to it.


You're right that a hash map isn't really O(1), strictly speaking, because as the number of elements gets arbitrarily large, eventually you will not be able to search in constant time (and O-notation is defined in terms of numbers that can get arbitrarily large).


But it doesn't follow that the real time complexity is O(n)--because there's no rule that says that the buckets have to be implemented as a linear list.


In fact, Java 8 converts a bucket's linked list into a balanced red-black tree once it exceeds a threshold, which makes the actual worst-case time O(log n).
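
A small demo of that behavior (hypothetical worst-case keys, assuming Java 8 or later; the key type must be Comparable for the tree to order entries):

    import java.util.HashMap;
    import java.util.Map;

    public class TreeifiedBucketDemo {
        // Deliberately bad key: every instance has the same hash code.
        static final class BadKey implements Comparable<BadKey> {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public int hashCode() { return 42; } // all keys collide
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
            @Override public int compareTo(BadKey other) {
                return Integer.compare(id, other.id);
            }
        }

        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 100_000; i++) {
                map.put(new BadKey(i), i);
            }
            // Walks a balanced tree (O(log n)) rather than a 100,000-entry chain.
            System.out.println(map.get(new BadKey(54_321))); // 54321
        }
    }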


#6


4  

If the number of buckets (call it b) is held constant (the usual case), then lookup is actually O(n).
As n gets large, the number of elements in each bucket averages n/b. If collision resolution is done in one of the usual ways (linked list for example), then lookup is O(n/b) = O(n).
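
Spelled out with the standard expected-cost formula (a textbook fact, not stated in the original answer), an unsuccessful lookup under simple uniform hashing costs:

    % n keys chained across a fixed number b of buckets
    E[\text{probes}] = 1 + \alpha = 1 + \frac{n}{b} = \Theta(n) \quad \text{for fixed } b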


The O notation is about what happens when n gets larger and larger. It can be misleading when applied to certain algorithms, and hash tables are a case in point. We choose the number of buckets based on how many elements we're expecting to deal with. When n is about the same size as b, then lookup is roughly constant-time, but we can't call it O(1) because O is defined in terms of a limit as n → ∞.


#7


4  

O(1+n/k) where k is the number of buckets.


If the implementation sets k = n/alpha then lookup is O(1+alpha) = O(1), since alpha is a constant.
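
This is effectively what java.util.HashMap does through its load factor; a sketch of the policy (the constructor shown is the real API, the commentary is an interpretation):

    import java.util.HashMap;
    import java.util.Map;

    public class LoadFactorDemo {
        public static void main(String[] args) {
            // With load factor alpha = 0.75, the map doubles its table whenever
            // size > 0.75 * capacity, i.e. it maintains k = n / alpha buckets.
            Map<String, Integer> map = new HashMap<>(16, 0.75f);
            for (int i = 0; i < 1_000; i++) {
                map.put("key-" + i, i); // triggers several doublings along the way
            }
            // Expected lookup cost stays O(1 + alpha) = O(1) throughout.
            System.out.println(map.get("key-500")); // 500
        }
    }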


#8


2  

We've established that the standard description of hash table lookups being O(1) refers to the average-case expected time, not the strict worst-case performance. For a hash table resolving collisions with chaining (like Java's hashmap) this is technically O(1+α) with a good hash function, where α is the table's load factor. Still constant as long as the number of objects you're storing is no more than a constant factor larger than the table size.


It's also been explained that strictly speaking it's possible to construct input that requires O(n) lookups for any deterministic hash function. But it's also interesting to consider the worst-case expected time, which is different than average search time. Using chaining this is O(1 + the length of the longest chain), for example Θ(log n / log log n) when α=1.


If you're interested in theoretical ways to achieve constant time expected worst-case lookups, you can read about dynamic perfect hashing which resolves collisions recursively with another hash table!


#9


2  

It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions.


Whether you need to grow the table when you add items or not is not relevant to the question because it is about lookup time.


#10


1  

This basically goes for most hash table implementations in most programming languages, as the algorithm itself doesn't really change.


If there are no collisions present in the table, you only have to do a single look-up, therefore the running time is O(1). If there are collisions present, you have to do more than one look-up, which drives down the performance towards O(n).


#11


1  

It depends on the algorithm you choose to avoid collisions. If your implementation uses separate chaining then the worst-case scenario happens where every data element is hashed to the same value (a poor choice of hash function, for example). In that case, data lookup is no different from a linear search on a linked list, i.e. O(n). However, the probability of that happening is negligible, and the lookup's best and average cases remain constant, i.e. O(1).
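
Such a degenerate dataset is easy to construct even against a sensible library hash. For instance, Long.hashCode() is defined as value ^ (value >>> 32), so the following keys (an illustration, not from the original answer) all hash to 0 and share one bucket:

    public class CollidingLongs {
        public static void main(String[] args) {
            // Every value of the form (i << 32) | i hashes to i ^ i == 0.
            for (int i = 1; i <= 5; i++) {
                long key = ((long) i << 32) | i;
                System.out.println(key + " -> hashCode " + Long.hashCode(key));
            }
        }
    }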


#12


1  

Academics aside, from a practical perspective, HashMaps should be accepted as having an inconsequential performance impact (unless your profiler tells you otherwise.)


#13


1  

Only in the theoretical case, when hash codes are always different and the bucket for every hash code is also different, will exact O(1) hold. Otherwise, it is of constant order, i.e. as the hashmap grows, its order of search remains constant.


#14


1  

Elements inside the HashMap are stored as an array of linked lists (nodes); each linked list in the array represents a bucket for the unique hash value of one or more keys.
While adding an entry to the HashMap, the hashcode of the key is used to determine the location of the bucket in the array, something like:


location = (arraylength - 1) & keyhashcode

Here & represents the bitwise AND operator.


For example: 100 & "ABC".hashCode() = 64 (location of the bucket for the key "ABC")
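
That arithmetic is easy to check directly (a small demo; note that a real HashMap table length is a power of two, so a mask like 100 would not occur in practice):

    public class BucketIndexDemo {
        public static void main(String[] args) {
            int hash = "ABC".hashCode();          // 64578
            System.out.println(100 & hash);       // 64, as in the example above
            // With a realistic power-of-two table of length 128:
            System.out.println((128 - 1) & hash); // 66, i.e. 64578 % 128
        }
    }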


During a get operation it uses the same way to determine the location of the bucket for the key. In the best case each hashcode is unique and results in a unique bucket for each key; in this case the get method spends time only to determine the bucket location and to retrieve the value, which is constant, O(1).


In the worst case, all the keys have the same hashcode and are stored in the same bucket; this results in traversing the entire list, which leads to O(n).


In the case of Java 8, the linked-list bucket is replaced with a balanced red-black tree if its size grows beyond 8 entries; this reduces the worst-case search efficiency to O(log n).


#15


0  

Of course the performance of the hashmap will depend based on the quality of the hashCode() function for the given object. However, if the function is implemented such that the possibility of collisions is very low, it will have a very good performance (this is not strictly O(1) in every possible case but it is in most cases).


For example, the default implementation in the Oracle JRE is to use a random number (which is stored in the object header so that it doesn't change; it also disables biased locking, but that's another discussion), so the chance of collisions is very low.
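
A tiny check of that default (the identity hash code is what Object.hashCode() returns when it isn't overridden):

    public class IdentityHashDemo {
        public static void main(String[] args) {
            Object a = new Object();
            Object b = new Object();
            // Two distinct instances: effectively random values, cached per object.
            System.out.println(a.hashCode() + " vs " + b.hashCode());
            System.out.println(a.hashCode() == System.identityHashCode(a)); // true
        }
    }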


