热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

如果哈希码不同,为什么HashSet允许相等的项?-WhydoesHashSetallowequalitemsifhashcodesaredifferent?

TheHashSetclasshasanadd(Objecto)method,whichisnotinheritedfromanotherclass.TheJavado

The HashSet class has an add(Object o) method, which is not inherited from another class. The Javadoc for that method says the following:

HashSet类有一个add(Object o)方法,该方法不是从另一个类继承的。该方法的Javadoc说明如下:

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

如果指定的元素尚不存在,则将其添加到此集合中。更正式地,如果此集合不包含元素e2(e == null?e2 == null:e.equals(e2)),则将指定元素e添加到此集合。如果此set已包含该元素,则调用将保持set不变并返回false。

In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2). Here is a simple example:

换句话说,如果两个对象相等,则不会添加第二个对象,并且HashSet将保持不变。但是,我发现如果对象e和e2具有不同的哈希码,则不是这样,尽管事实上是e.equals(e2)。这是一个简单的例子:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;

public class BadHashCodeClass {

    /**
     * A hashcode that will randomly return an integer, so it is unlikely to be the same
     */
    @Override
    public int hashCode(){
        return new Random().nextInt();
    }

    /**
     * An equal method that will always return true
     */
    @Override
    public boolean equals(Object o){
        return true;
    }

    public static void main(String... args){
        HashSet hashSet = new HashSet<>();
        BadHashCodeClass instance = new BadHashCodeClass();
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Elements in hashSet: " + hashSet.size());

        Iterator iterator = hashSet.iterator();
        BadHashCodeClass e = iterator.next();
        BadHashCodeClass e2 = iterator.next();
        System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): " + (e==null ? e2==null : e.equals(e2)));
    }

The results from the main method are:

主要方法的结果是:

Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true

As the example above clearly shows, HashSet was able to add two elements where e.equals(e2).

如上面的例子清楚地显示,HashSet能够在e.equals(e2)中添加两个元素。

I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. But I can't figure out what exactly. What am I missing?

我将假设这不是Java中的错误,实际上有一些完全合理的解释为什么会这样。但我无法弄清楚到底是什么。我错过了什么?

5 个解决方案

#1


12  

I think what you're really trying to ask is:

我想你真正要问的是:

"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"

“为什么HashSet添加具有不等哈希码的对象,即使它们声称是相等的?”

The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.

我的问题和你发布的问题之间的区别在于你假设这种行为是一个错误,因此从这个角度来看你会感到悲伤。我认为其他海报已经完全解释了为什么这不是一个错误,但是他们没有解决潜在的问题。

I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.

我会在这里尝试这样做;我建议改写你的问题,以消除Java中糟糕的文档/错误的指控,这样你就可以更直接地探究为什么你遇到了你所看到的行为。


The equals() documentations states (emphasis added):

equals()文档说明(强调添加):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

请注意,通常需要在重写此方法时覆盖hashCode方法,以便维护hashCode方法的常规协定,该方法声明相等的对象必须具有相等的哈希代码。

The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.

equals()和hashCode()之间的契约不仅仅是Java规范中令人讨厌的怪癖。它在算法优化方面提供了一些非常有价值的好处。通过假设a.equals(b)暗示a.hashCode()== b.hashCode(),我们可以做一些基本的等价测试而无需直接调用equals()。特别是,上面的不变量可以转换 - a.hashCode()!= b.hashCode()意味着a.equals(b)将为false。

If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:

如果你看一下HashMap的代码(HashSet在内部使用),你会注意到一个内部静态类Entry,定义如下:

static class Entry implements Map.Entry {
  final K key;
  V value;
  Entry next;
  int hash;
  ...
}

HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.

HashMap存储密钥的哈希码以及密钥和值。由于哈希码在密钥存储在地图中的时间内不会发生变化(请参阅Map的文档,“如果对象的值以影响等于比较的方式更改,则不会指定地图的行为。 object是map中的一个键。“)HashMap可以安全地缓存这个值。通过这样做,它只需要为地图中的每个键调用一次hashCode(),而不是每次检查键时。

Now lets look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:

现在让我们看一下put()的实现,我们看到这些缓存的哈希被利用,以及上面的不变量:

public V put(K key, V value) {
  ...
  int hash = hash(key);
  int i = indexFor(hash, table.length);
  for (Entry e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
      // Replace existing element and return
    }
  }
  // Insert new element
}

In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.

特别注意,由于短路评估,如果哈希码相等且密钥不是完全相同的对象,则条件只调用key.equals(k)。通过这些方法的契约,HashMap跳过此调用应该是安全的。如果你的对象被错误地实现,那么HashMap做出的这些假设就不再正确,你将得到不可用的结果,包括你的集合中的“重复”。


Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.

请注意,您的声明“HashSet ...有一个add(Object o)方法,它不是从另一个类继承的”,这是不正确的。虽然它的父类AbstractSet没有实现此方法,但父接口Set确实指定了方法的契约。 Set接口不关心哈希,只关注相等,因此它根据等式(e == null?e2 == null:e.equals(e2))指定此方法的行为。只要您遵循合同,HashSet就会按照文档记录,但尽可能避免实际浪费的工作。但是,一旦违反规则,就不能指望HashSet以任何有用的方式运行。

Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?

还要考虑如果您尝试使用错误实现的Comparator将对象存储在TreeSet中,您同样会看到无意义的结果。我记录了一些TreeSet在另一个问题中使用不可靠的Comparator时的行为示例:如何在Java中为StringBuffer类实现比较器以便在TreeSet中使用?

#2


8  

You've violated the contract of equals/hashCode basically:

你基本违反了equals / hashCode的契约:

From the hashCode() docs:

来自hashCode()文档:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

如果两个对象根据equals(Object)方法相等,则对两个对象中的每一个调用hashCode方法必须生成相同的整数结果。

and from equals:

从平等:

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

请注意,通常需要在重写此方法时覆盖hashCode方法,以便维护hashCode方法的常规协定,该方法声明相等的对象必须具有相等的哈希代码。

HashSet relies on equals and hashCode being implemented consistently - the Hash part of the name HashSet basically implies "This class uses hashCode for efficiency purposes." If the two methods are not implemented consistently, all bets are off.

HashSet依赖于equals和hashCode一致地实现 - 名称HashSet的Hash部分基本上暗示“此类使用hashCode来提高效率”。如果两种方法没有一致地实施,那么所有的赌注都是关闭的。

This shouldn't happen in real code, because you shouldn't be violating the contract in real code...

这不应该在实际代码中发生,因为你不应该违反实际代码中的合同......

#3


2  

@Override
public int hashCode(){
    return new Random().nextInt();
}

You are returning different has codes for same object every time it is evaluated. Obviously you will get wrong results.

每次评估时,您都会返回相同对象的不同代码。显然你会得到错误的结果。


add() function is as follows

add()函数如下

public boolean add(E e) {
return map.put(e, PRESENT)==null;
}

and put() is

和put()是

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

If you notice first has is calculated which is different in your case which is why object is added. equals() comes into picture only if hash are same for objects i.e collision has occured. Since in case hash are different equals() is never executed

如果您注意到第一个已计算出的情况与您的情况不同,这就是添加对象的原因。只有当对象的哈希值相同,即碰撞发生时,equals()才会出现。由于哈希值不同,因此从不执行equals()

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

Read more on what short circuiting is. since e.hash == hash is false nothing else is evaluated.

阅读有关短路的更多信息。由于e.hash == hash为false,因此不会评估任何其他内容。

I hope this helps.

我希望这有帮助。

#4


0  

because hashcode() is really implemented very badly,

因为hashcode()实际上非常糟糕,

it will try to equate in each random bucket on each add(), if you return constant value from hashcode() it wouldn't let you enter any

它将尝试在每个add()上的每个随机桶中等同,如果从hashcode()返回常量值,它将不允许您输入任何

#5


0  

It is not required that hash codes be different for all elements! It is only required that two elements are not equal.

并不要求所有元素的哈希码都不同!只需要两个元素不相等。

HashCode is used first to find the hash bucket the object should occupy. If hadhcodes are different, objects are assumed to be not equal. If hashcodes are equal, then the equals() method is used to determine equality. The use of hashCode is an efficiency mechanism.

首先使用HashCode来查找对象应占用的哈希桶。如果hadhcodes不同,则假定对象不相等。如果哈希码相等,则使用equals()方法确定相等性。 hashCode的使用是一种效率机制。

And...
Your hash code implementation violates the contract that it should not change unless the objects identifying fields change.

并且...您的哈希代码实现违反了它不应该更改的合同,除非标识字段的对象发生更改。


推荐阅读
author-avatar
LordHo
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有