I've seen some interesting claims on SO regarding Java HashMaps and their O(1) lookup time. Can someone explain why this is so? Unless these hash maps are vastly different from any of the hashing algorithms I was brought up on, there must always exist a dataset that contains collisions.


In which case, the lookup would be O(n) rather than O(1).


Can someone explain whether they are O(1) and, if so, how they achieve this?


15 answers



A particular feature of a HashMap is that, unlike, say, balanced trees, its behavior is probabilistic. In these cases it's usually most helpful to talk about complexity in terms of the probability of a worst-case event occurring. For a hash map, that event is of course a collision, and its probability depends on how full the map happens to be. A collision is pretty easy to estimate:


$p_{\text{collision}} = n / \text{capacity}$


So a hash map with even a modest number of elements is pretty likely to experience at least one collision. Big O notation allows us to do something more compelling. Observe that for any arbitrary, fixed constant k:


O(n) = O(k * n)


We can use this feature to improve the performance of the hash map. We could instead think about the probability of at most 2 collisions.


$p_{\text{collision} \times 2} = (n / \text{capacity})^2$


This is much lower. Since the cost of handling one extra collision is irrelevant to Big O performance, we've found a way to improve performance without actually changing the algorithm! We can generalize this to:


$p_{\text{collision} \times k} = (n / \text{capacity})^k$


And now we can disregard some arbitrary number of collisions and end up with vanishingly tiny likelihood of more collisions than we are accounting for. You could get the probability to an arbitrarily tiny level by choosing the correct k, all without altering the actual implementation of the algorithm.


We talk about this by saying that the hash map has O(1) access with high probability.
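To make the estimate above concrete, here is a minimal sketch that evaluates (n / capacity)^k for a few values of k. The class and method names are made up for illustration, and the formula is the rough estimate from this answer, not an exact collision probability.

```java
final class CollisionBound {
    // Rough estimate from the answer: probability of k collisions ~ (n / capacity)^k.
    static double probabilityOfKCollisions(int n, int capacity, int k) {
        double p = (double) n / capacity;
        return Math.pow(p, k);
    }

    public static void main(String[] args) {
        int n = 12;
        int capacity = 16;
        // The estimate shrinks geometrically as k grows.
        for (int k = 1; k <= 8; k *= 2) {
            System.out.printf("k=%d -> %.6f%n", k, probabilityOfKCollisions(n, capacity, k));
        }
    }
}
```

With n = 12 and capacity = 16 the base is 0.75, so each doubling of k squares an already small number.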




You seem to mix up worst-case behaviour with average-case (expected) runtime. The former is indeed O(n) for hash tables in general (i.e. those not using perfect hashing), but this is rarely relevant in practice.


Any dependable hash table implementation, coupled with a half decent hash, has a retrieval performance of O(1) with a very small factor (2, in fact) in the expected case, within a very narrow margin of variance.




In Java, HashMap works by using hashCode to locate a bucket. Each bucket is a list of items residing in that bucket. The items are scanned, using equals for comparison. When adding items, the HashMap is resized once a certain load percentage is reached.


So, sometimes it will have to compare against a few items, but generally it's much closer to O(1) than O(n). For practical purposes, that's all you should need to know.
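The mechanism described in this answer can be sketched in a few dozen lines. This is a toy illustration, not the real java.util.HashMap: the class name TinyChainedMap, the initial size of 16, and the use of LinkedList and floorMod are all assumptions chosen for readability.

```java
import java.util.LinkedList;
import java.util.Objects;

final class TinyChainedMap<K, V> {
    // Assumed resize threshold; 0.75 mirrors java.util.HashMap's documented default.
    private static final double MAX_LOAD = 0.75;

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry<K, V>>[] buckets = newTable(16);
    private int size;

    @SuppressWarnings({"unchecked", "rawtypes"})
    private static <K, V> LinkedList<Entry<K, V>>[] newTable(int length) {
        return (LinkedList<Entry<K, V>>[]) new LinkedList[length];
    }

    // hashCode() picks the bucket.
    private static int indexFor(Object key, int length) {
        return Math.floorMod(Objects.hashCode(key), length);
    }

    V get(K key) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key, buckets.length)];
        if (bucket == null) return null;
        // equals() is used to scan the items that share the bucket.
        for (Entry<K, V> e : bucket) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    void put(K key, V value) {
        int i = indexFor(key, buckets.length);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Entry<K, V> e : buckets[i]) {
            if (Objects.equals(e.key, key)) { e.value = value; return; }
        }
        buckets[i].add(new Entry<>(key, value));
        // Grow once the load threshold is crossed, so chains stay short.
        if (++size > MAX_LOAD * buckets.length) resize();
    }

    private void resize() {
        LinkedList<Entry<K, V>>[] old = buckets;
        buckets = newTable(old.length * 2);
        for (LinkedList<Entry<K, V>> bucket : old) {
            if (bucket == null) continue;
            for (Entry<K, V> e : bucket) {
                int i = indexFor(e.key, buckets.length);
                if (buckets[i] == null) buckets[i] = new LinkedList<>();
                buckets[i].add(e);
            }
        }
    }

    int size() { return size; }
    int bucketCount() { return buckets.length; }
}
```

Because the table doubles whenever it gets too full, the ratio of items to buckets never exceeds the threshold, which is why a lookup only has to compare against a few items.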




Remember that O(1) does not mean that each lookup only examines a single item. It means that the average number of items checked remains constant with respect to the number of items in the container. So if it takes on average 4 comparisons to find an item in a container with 100 items, it should also take an average of 4 comparisons to find an item in a container with 10000 items, and for any other number of items (there's always a bit of variance, especially around the points at which the hash table rehashes, and when there's a very small number of items).


So collisions don't prevent the container from having O(1) operations, as long as the average number of keys per bucket remains within a fixed bound.
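A small simulation can illustrate the point. The sketch below assumes uniformly random hash codes and a fixed load factor, and counts the average number of comparisons a successful lookup would need in a chained table; the class name, the seed parameter, and the modulo bucket choice are illustrative assumptions.

```java
import java.util.Random;

final class AverageProbes {
    // Average key comparisons per successful lookup for n items stored at a fixed
    // load factor, assuming hash codes behave like uniform random integers.
    static double averageComparisons(int n, double loadFactor, long seed) {
        int bucketCount = (int) Math.ceil(n / loadFactor);
        int[] chainLength = new int[bucketCount];
        Random random = new Random(seed);
        for (int i = 0; i < n; i++) {
            chainLength[Math.floorMod(random.nextInt(), bucketCount)]++;
        }
        long comparisons = 0;
        for (int len : chainLength) {
            // Finding the j-th item of a chain costs j comparisons: 1 + 2 + ... + len.
            comparisons += (long) len * (len + 1) / 2;
        }
        return (double) comparisons / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 10_000, 1_000_000}) {
            System.out.printf("n=%d -> %.3f comparisons on average%n",
                    n, averageComparisons(n, 0.75, 1L));
        }
    }
}
```

The average should hover around the same small value for every n, because the number of buckets grows along with the number of items.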




I know this is an old question, but there's actually a new answer to it.


You're right that a hash map isn't really O(1), strictly speaking, because as the number of elements gets arbitrarily large, eventually you will not be able to search in constant time (and O-notation is defined in terms of numbers that can get arbitrarily large).


But it doesn't follow that the real time complexity is O(n)--because there's no rule that says that the buckets have to be implemented as a linear list.


In fact, Java 8 converts a bucket into a balanced (red-black) tree, much like the one behind TreeMap, once the bucket exceeds a threshold, which makes the actual worst-case time O(log n).




If the number of buckets (call it b) is held constant (the usual case), then lookup is actually O(n).
As n gets large, the number of elements in each bucket averages n/b. If collision resolution is done in one of the usual ways (linked list for example), then lookup is O(n/b) = O(n).


The O notation is about what happens when n gets larger and larger. It can be misleading when applied to certain algorithms, and hash tables are a case in point. We choose the number of buckets based on how many elements we're expecting to deal with. When n is about the same size as b, then lookup is roughly constant-time, but we can't call it O(1) because O is defined in terms of a limit as n → ∞.
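The fixed-bucket argument in this answer can be checked with a short sketch. It assumes the most favourable distribution (keys 0..n-1 spread perfectly evenly by key % bucketCount) and plain list chains; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedBucketTable {
    // Average comparisons for a successful lookup of keys 0..n-1 in a table whose
    // bucket count never changes. Keys are spread evenly via key % bucketCount.
    static double averageComparisons(int n, int bucketCount) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
        for (int key = 0; key < n; key++) buckets.get(key % bucketCount).add(key);

        long comparisons = 0;
        for (int key = 0; key < n; key++) {
            // Linear scan: count the entries examined until the key is found.
            for (int candidate : buckets.get(key % bucketCount)) {
                comparisons++;
                if (candidate == key) break;
            }
        }
        return (double) comparisons / n;
    }

    public static void main(String[] args) {
        System.out.println(averageComparisons(160, 16));   // chains of 10 items
        System.out.println(averageComparisons(1600, 16));  // chains of 100 items
    }
}
```

Even with a perfectly even spread, multiplying n by ten multiplies the chain length by ten when b is fixed, so the lookup cost grows linearly with n.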




O(1+n/k) where k is the number of buckets.


If the implementation sets k = n/alpha then it is O(1+alpha) = O(1), since alpha is a constant.
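One way to keep n/k bounded by a constant is to double the number of buckets whenever the entry count exceeds capacity times a maximum load factor, which is the rule the java.util.HashMap documentation describes. The sketch below only models that growth rule; the class and method names are assumptions for illustration.

```java
final class LoadFactorBound {
    // Capacity a table reaches after n insertions if it doubles whenever the
    // number of entries exceeds capacity * maxLoad.
    static int capacityAfter(int n, int initialCapacity, double maxLoad) {
        int capacity = initialCapacity;
        while (n > capacity * maxLoad) capacity *= 2;
        return capacity;
    }

    // Resulting load factor alpha = n / k, which stays at or below maxLoad.
    static double loadAfter(int n, int initialCapacity, double maxLoad) {
        return (double) n / capacityAfter(n, initialCapacity, maxLoad);
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 1_000, 1_000_000}) {
            System.out.printf("n=%d -> k=%d, alpha=%.3f%n",
                    n, capacityAfter(n, 16, 0.75), loadAfter(n, 16, 0.75));
        }
    }
}
```

Under this rule alpha never exceeds the chosen maximum, so O(1 + alpha) collapses to O(1) regardless of n.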




We've established that the standard description of hash table lookups being O(1) refers to the average-case expected time, not the strict worst-case performance. For a hash table resolving collisions with chaining (like Java's HashMap) this is technically O(1+α) with a good hash function, where α is the table's load factor. That is still constant as long as the number of objects you're storing is no more than a constant factor larger than the table size.


It's also been explained that strictly speaking it's possible to construct input that requires O(n) lookups for any deterministic hash function. But it's also interesting to consider the worst-case expected time, which is different than average search time. Using chaining this is O(1 + the length of the longest chain), for example Θ(log n / log log n) when α=1.
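To get a feel for the longest-chain figure, the following sketch throws n keys into n buckets (so α = 1) using uniformly random bucket choices and reports the longest chain. It is only a simulation under that uniform-hashing assumption, with hypothetical names, and it does not prove the Θ(log n / log log n) bound.

```java
import java.util.Random;

final class LongestChain {
    // Longest chain after placing n keys into n buckets (alpha = 1),
    // assuming each key lands in a uniformly random bucket.
    static int longestChain(int n, long seed) {
        int[] chainLength = new int[n];
        Random random = new Random(seed);
        int longest = 0;
        for (int i = 0; i < n; i++) {
            int bucket = random.nextInt(n);
            longest = Math.max(longest, ++chainLength[bucket]);
        }
        return longest;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 100_000, 1_000_000}) {
            System.out.printf("n=%d -> longest chain %d%n", n, longestChain(n, 42L));
        }
    }
}
```

The longest chain should grow very slowly as n increases, while the average chain length stays at 1.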


If you're interested in theoretical ways to achieve constant time expected worst-case lookups, you can read about dynamic perfect hashing which resolves collisions recursively with another hash table!




It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions.


Whether you need to grow the table when you add items or not is not relevant to the question because it is about lookup time.
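As an illustration of a bad hash function, the sketch below defines a hypothetical key type whose hashCode() always returns the same value, so every entry of a java.util.HashMap lands in one bucket. Lookups stay correct; only their cost degrades. Implementing Comparable is an optional extra here, on the assumption that a Java 8+ runtime can use it to order colliding keys inside a tree bin.

```java
import java.util.HashMap;
import java.util.Map;

final class BadHashDemo {
    // Hypothetical key type: legal hashCode(), but every instance collides.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Builds a map in which all n keys share a single bucket.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new BadKey(i), i);
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(2_000);
        System.out.println(map.get(new BadKey(1_234))); // still found, just not in O(1)
    }
}
```

Timing this against a key type with a well-spread hashCode() is the easiest way to see the difference on a particular JVM.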




This basically goes for most hash table implementations in most programming languages, as the algorithm itself doesn't really change.


If there are no collisions present in the table, you only have to do a single look-up, therefore the running time is O(1). If there are collisions present, you have to do more than one look-up, which drives down the performance towards O(n).




It depends on the algorithm you choose to resolve collisions. If your implementation uses separate chaining, then the worst-case scenario happens when every data element is hashed to the same value (because of a poor choice of hash function, for example). In that case, data lookup is no different from a linear search on a linked list, i.e. O(n). However, the probability of that happening is negligible, and the best and average cases of a lookup remain constant, i.e. O(1).




Academics aside, from a practical perspective, HashMaps should be accepted as having an inconsequential performance impact (unless your profiler tells you otherwise.)




Only in the theoretical case where every key has a different hash code, and every hash code maps to its own bucket, is a lookup exactly O(1). Otherwise it is still of constant order, i.e. as the HashMap grows, the order of its search cost remains constant.




Elements inside the HashMap are stored in an array of linked lists (nodes). Each linked list in the array represents a bucket for the hash values of one or more keys.
While adding an entry to the HashMap, the hash code of the key is used to determine the location of the bucket in the array, something like:


location = (arraylength - 1) & keyhashcode

Here & is the bitwise AND operator.


For example: 100 & "ABC".hashCode() = 64 (location of the bucket for the key "ABC")
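The arithmetic in the example can be reproduced directly. The sketch below shows the simplified mask from this answer next to a version that first XORs the high 16 bits of the hash into the low bits, which is how OpenJDK's HashMap (Java 8 and later) prepares the hash before masking; the class and method names are made up, and the table length of 16 is just an assumed example (real table lengths are powers of two, so the mask is tableLength - 1).

```java
final class BucketIndex {
    // Simplified form quoted in the answer: mask the raw hash code.
    static int simpleIndex(Object key, int mask) {
        return mask & key.hashCode();
    }

    // Closer to OpenJDK's HashMap: spread the high bits, then mask with
    // (tableLength - 1). tableLength is assumed to be a power of two.
    static int spreadIndex(Object key, int tableLength) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return (tableLength - 1) & h;
    }

    public static void main(String[] args) {
        System.out.println("ABC".hashCode());          // 64578
        System.out.println(simpleIndex("ABC", 100));   // 64
        System.out.println(spreadIndex("ABC", 16));    // 2
    }
}
```

For a short string like "ABC" the hash fits in 16 bits, so the spreading step changes nothing; it matters for keys whose hash codes differ only in their upper bits.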


During a get operation, it determines the location of the bucket for the key in the same way. In the best case, each hash code is unique and results in a unique bucket for each key. The get method then only spends time determining the bucket location and retrieving the value, which is constant, O(1).


In the worst case, all the keys have the same hash code and are stored in the same bucket. This results in traversing the entire list, which leads to O(n).


In Java 8, a linked-list bucket is replaced with a balanced tree (similar to the one used by TreeMap) if its size grows to more than 8, which improves the worst-case search cost to O(log n).




Of course, the performance of the HashMap will depend on the quality of the hashCode() function for the given object. However, if the function is implemented such that the possibility of collisions is very low, it will perform very well (this is not strictly O(1) in every possible case, but it is in most cases).


For example, the default hashCode() implementation in the Oracle JRE uses a random number (which is stored in the object instance so that it doesn't change; it also disables biased locking, but that's another discussion), so the chance of collisions is very low.
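The stability of the default hash code is easy to observe. The sketch below only checks what the Java API documents, namely that an object which does not override hashCode() reports the same value every time and that this value matches System.identityHashCode; it says nothing about how a particular JVM generates the number. The class name is hypothetical.

```java
final class IdentityHashDemo {
    // True if the default hash code is stable across calls and equals the identity hash.
    static boolean isStable(Object o) {
        int first = o.hashCode();
        return first == o.hashCode() && first == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        System.out.println(isStable(a) && isStable(b));
        // Distinct objects usually, though not necessarily, get different values.
        System.out.println(a.hashCode() + " vs " + b.hashCode());
    }
}
```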


