热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

如果哈希码不同,为什么HashSet允许相等的项?-WhydoesHashSetallowequalitemsifhashcodesaredifferent?

TheHashSetclasshasanadd(Objecto)method,whichisnotinheritedfromanotherclass.TheJavado

The HashSet class has an add(Object o) method, which is not inherited from another class. The Javadoc for that method says the following:

HashSet类有一个add(Object o)方法,该方法不是从另一个类继承的。该方法的Javadoc说明如下:

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

如果指定的元素尚不存在,则将其添加到此集合中。更正式地,如果此集合不包含元素e2(e == null?e2 == null:e.equals(e2)),则将指定元素e添加到此集合。如果此set已包含该元素,则调用将保持set不变并返回false。

In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2). Here is a simple example:

换句话说,如果两个对象相等,则不会添加第二个对象,并且HashSet将保持不变。但是,我发现如果对象e和e2具有不同的哈希码,则不是这样,尽管事实上是e.equals(e2)。这是一个简单的例子:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;

public class BadHashCodeClass {

    /**
     * A hashcode that will randomly return an integer, so it is unlikely to be the same
     */
    @Override
    public int hashCode(){
        return new Random().nextInt();
    }

    /**
     * An equal method that will always return true
     */
    @Override
    public boolean equals(Object o){
        return true;
    }

    public static void main(String... args){
        HashSet hashSet = new HashSet<>();
        BadHashCodeClass instance = new BadHashCodeClass();
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Elements in hashSet: " + hashSet.size());

        Iterator iterator = hashSet.iterator();
        BadHashCodeClass e = iterator.next();
        BadHashCodeClass e2 = iterator.next();
        System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): " + (e==null ? e2==null : e.equals(e2)));
    }

The results from the main method are:

主要方法的结果是:

Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true

As the example above clearly shows, HashSet was able to add two elements where e.equals(e2).

如上面的例子清楚地显示,HashSet能够在e.equals(e2)中添加两个元素。

I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. But I can't figure out what exactly. What am I missing?

我将假设这不是Java中的错误,实际上有一些完全合理的解释为什么会这样。但我无法弄清楚到底是什么。我错过了什么?

5 个解决方案

#1


12  

I think what you're really trying to ask is:

我想你真正要问的是:

"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"

“为什么HashSet添加具有不等哈希码的对象,即使它们声称是相等的?”

The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.

我的问题和你发布的问题之间的区别在于你假设这种行为是一个错误,因此从这个角度来看你会感到悲伤。我认为其他海报已经完全解释了为什么这不是一个错误,但是他们没有解决潜在的问题。

I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.

我会在这里尝试这样做;我建议改写你的问题,以消除Java中糟糕的文档/错误的指控,这样你就可以更直接地探究为什么你遇到了你所看到的行为。


The equals() documentations states (emphasis added):

equals()文档说明(强调添加):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

请注意,通常需要在重写此方法时覆盖hashCode方法,以便维护hashCode方法的常规协定,该方法声明相等的对象必须具有相等的哈希代码。

The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.

equals()和hashCode()之间的契约不仅仅是Java规范中令人讨厌的怪癖。它在算法优化方面提供了一些非常有价值的好处。通过假设a.equals(b)暗示a.hashCode()== b.hashCode(),我们可以做一些基本的等价测试而无需直接调用equals()。特别是,上面的不变量可以转换 - a.hashCode()!= b.hashCode()意味着a.equals(b)将为false。

If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:

如果你看一下HashMap的代码(HashSet在内部使用),你会注意到一个内部静态类Entry,定义如下:

static class Entry implements Map.Entry {
  final K key;
  V value;
  Entry next;
  int hash;
  ...
}

HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.

HashMap存储密钥的哈希码以及密钥和值。由于哈希码在密钥存储在地图中的时间内不会发生变化(请参阅Map的文档,“如果对象的值以影响等于比较的方式更改,则不会指定地图的行为。 object是map中的一个键。“)HashMap可以安全地缓存这个值。通过这样做,它只需要为地图中的每个键调用一次hashCode(),而不是每次检查键时。

Now lets look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:

现在让我们看一下put()的实现,我们看到这些缓存的哈希被利用,以及上面的不变量:

public V put(K key, V value) {
  ...
  int hash = hash(key);
  int i = indexFor(hash, table.length);
  for (Entry e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
      // Replace existing element and return
    }
  }
  // Insert new element
}

In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.

特别注意,由于短路评估,如果哈希码相等且密钥不是完全相同的对象,则条件只调用key.equals(k)。通过这些方法的契约,HashMap跳过此调用应该是安全的。如果你的对象被错误地实现,那么HashMap做出的这些假设就不再正确,你将得到不可用的结果,包括你的集合中的“重复”。


Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.

请注意,您的声明“HashSet ...有一个add(Object o)方法,它不是从另一个类继承的”,这是不正确的。虽然它的父类AbstractSet没有实现此方法,但父接口Set确实指定了方法的契约。 Set接口不关心哈希,只关注相等,因此它根据等式(e == null?e2 == null:e.equals(e2))指定此方法的行为。只要您遵循合同,HashSet就会按照文档记录,但尽可能避免实际浪费的工作。但是,一旦违反规则,就不能指望HashSet以任何有用的方式运行。

Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?

还要考虑如果您尝试使用错误实现的Comparator将对象存储在TreeSet中,您同样会看到无意义的结果。我记录了一些TreeSet在另一个问题中使用不可靠的Comparator时的行为示例:如何在Java中为StringBuffer类实现比较器以便在TreeSet中使用?

#2


8  

You've violated the contract of equals/hashCode basically:

你基本违反了equals / hashCode的契约:

From the hashCode() docs:

来自hashCode()文档:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

如果两个对象根据equals(Object)方法相等,则对两个对象中的每一个调用hashCode方法必须生成相同的整数结果。

and from equals:

从平等:

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

请注意,通常需要在重写此方法时覆盖hashCode方法,以便维护hashCode方法的常规协定,该方法声明相等的对象必须具有相等的哈希代码。

HashSet relies on equals and hashCode being implemented consistently - the Hash part of the name HashSet basically implies "This class uses hashCode for efficiency purposes." If the two methods are not implemented consistently, all bets are off.

HashSet依赖于equals和hashCode一致地实现 - 名称HashSet的Hash部分基本上暗示“此类使用hashCode来提高效率”。如果两种方法没有一致地实施,那么所有的赌注都是关闭的。

This shouldn't happen in real code, because you shouldn't be violating the contract in real code...

这不应该在实际代码中发生,因为你不应该违反实际代码中的合同......

#3


2  

@Override
public int hashCode(){
    return new Random().nextInt();
}

You are returning different has codes for same object every time it is evaluated. Obviously you will get wrong results.

每次评估时,您都会返回相同对象的不同代码。显然你会得到错误的结果。


add() function is as follows

add()函数如下

public boolean add(E e) {
return map.put(e, PRESENT)==null;
}

and put() is

和put()是

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

If you notice first has is calculated which is different in your case which is why object is added. equals() comes into picture only if hash are same for objects i.e collision has occured. Since in case hash are different equals() is never executed

如果您注意到第一个已计算出的情况与您的情况不同,这就是添加对象的原因。只有当对象的哈希值相同,即碰撞发生时,equals()才会出现。由于哈希值不同,因此从不执行equals()

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

Read more on what short circuiting is. since e.hash == hash is false nothing else is evaluated.

阅读有关短路的更多信息。由于e.hash == hash为false,因此不会评估任何其他内容。

I hope this helps.

我希望这有帮助。

#4


0  

because hashcode() is really implemented very badly,

因为hashcode()实际上非常糟糕,

it will try to equate in each random bucket on each add(), if you return constant value from hashcode() it wouldn't let you enter any

它将尝试在每个add()上的每个随机桶中等同,如果从hashcode()返回常量值,它将不允许您输入任何

#5


0  

It is not required that hash codes be different for all elements! It is only required that two elements are not equal.

并不要求所有元素的哈希码都不同!只需要两个元素不相等。

HashCode is used first to find the hash bucket the object should occupy. If hadhcodes are different, objects are assumed to be not equal. If hashcodes are equal, then the equals() method is used to determine equality. The use of hashCode is an efficiency mechanism.

首先使用HashCode来查找对象应占用的哈希桶。如果hadhcodes不同,则假定对象不相等。如果哈希码相等,则使用equals()方法确定相等性。 hashCode的使用是一种效率机制。

And...
Your hash code implementation violates the contract that it should not change unless the objects identifying fields change.

并且...您的哈希代码实现违反了它不应该更改的合同,除非标识字段的对象发生更改。


推荐阅读
  • 在多线程并发环境中,普通变量的操作往往是线程不安全的。本文通过一个简单的例子,展示了如何使用 AtomicInteger 类及其核心的 CAS 无锁算法来保证线程安全。 ... [详细]
  • 本文详细介绍了Java反射机制的基本概念、获取Class对象的方法、反射的主要功能及其在实际开发中的应用。通过具体示例,帮助读者更好地理解和使用Java反射。 ... [详细]
  • 本文深入探讨了CGLIB BeanCopier在Bean对象复制中的应用及其优化技巧。相较于Spring的BeanUtils和Apache的BeanUtils,CGLIB BeanCopier在性能上具有显著优势。通过详细分析其内部机制和使用场景,本文提供了多种优化方法,帮助开发者在实际项目中更高效地利用这一工具。此外,文章还讨论了CGLIB BeanCopier在复杂对象结构和大规模数据处理中的表现,为读者提供了实用的参考和建议。 ... [详细]
  • Java设计模式详解:解释器模式的应用与实现
    本文详细介绍了Java设计模式中的解释器模式,包括其定义、应用场景、优缺点以及具体的实现示例。通过音乐解释器的例子,帮助读者更好地理解和应用这一模式。 ... [详细]
  • C语言编写线程池的简单实现方法
    2019独角兽企业重金招聘Python工程师标准好文章,一起分享——有时我们会需要大量线程来处理一些相互独立的任务,为了避免频繁的申请释放线程所带 ... [详细]
  • 题目描述:给定一个区间,支持两种操作:1. 将位置a的值修改为b;2. 查询区间[a, b]内的子序列的最大和,其中子序列中相邻的元素必须具有不同的奇偶性。 ... [详细]
  • 普通树(每个节点可以有任意数量的子节点)级序遍历 ... [详细]
  • 本文介绍了 Go 语言中的高性能、可扩展、轻量级 Web 框架 Echo。Echo 框架简单易用,仅需几行代码即可启动一个高性能 HTTP 服务。 ... [详细]
  • 本文介绍了如何将包含复杂对象的字典保存到文件,并从文件中读取这些字典。 ... [详细]
  • MySQL初级篇——字符串、日期时间、流程控制函数的相关应用
    文章目录:1.字符串函数2.日期时间函数2.1获取日期时间2.2日期与时间戳的转换2.3获取年月日、时分秒、星期数、天数等函数2.4时间和秒钟的转换2. ... [详细]
  • 本文节选自《NLTK基础教程——用NLTK和Python库构建机器学习应用》一书的第1章第1.2节,作者Nitin Hardeniya。本文将带领读者快速了解Python的基础知识,为后续的机器学习应用打下坚实的基础。 ... [详细]
  • JUC(三):深入解析AQS
    本文详细介绍了Java并发工具包中的核心类AQS(AbstractQueuedSynchronizer),包括其基本概念、数据结构、源码分析及核心方法的实现。 ... [详细]
  • 在C#中开发MP3播放器时,我正在考虑如何高效存储元数据以便快速检索。选择合适的数据结构,如字典或数组,对于优化性能至关重要。字典能够提供快速的键值对查找,而数组则在连续存储和遍历方面表现优异。根据具体需求,合理选择数据结构将显著提升应用的响应速度和用户体验。 ... [详细]
  • 如何高效启动大数据应用之旅?
    在前一篇文章中,我探讨了大数据的定义及其与数据挖掘的区别。本文将重点介绍如何高效启动大数据应用项目,涵盖关键步骤和最佳实践,帮助读者快速踏上大数据之旅。 ... [详细]
  • 探索聚类分析中的K-Means与DBSCAN算法及其应用
    聚类分析是一种用于解决样本或特征分类问题的统计分析方法,也是数据挖掘领域的重要算法之一。本文主要探讨了K-Means和DBSCAN两种聚类算法的原理及其应用场景。K-Means算法通过迭代优化簇中心来实现数据点的划分,适用于球形分布的数据集;而DBSCAN算法则基于密度进行聚类,能够有效识别任意形状的簇,并且对噪声数据具有较好的鲁棒性。通过对这两种算法的对比分析,本文旨在为实际应用中选择合适的聚类方法提供参考。 ... [详细]
author-avatar
LordHo
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有