python如何计算元组的哈希值-Howdoespythoncomputethehashofatuple

作者：记者刘方斌_304 | 来源：互联网 | 2023-07-10 12:48

Inpython,ifIhaveatuplewithmanyelements,isitshashcalculatedfromitselementsidsorit

In python, if I have a tuple with many elements, is its hash calculated from its elements' ids or its elements' content?

在python中,如果我有一个包含许多元素的元组,那么它的哈希值是根据元素的ids还是元素的内容来计算的?

In this example,

在这个例子中,

a = (1, [1,2])
hash(a)

It errors out saying list is unhashable. So I guess it's not computed by id, or probably there is a check on whether the element is mutable.

它错误地说列表是不可用的。所以我猜它不是由id计算的,或者可能是检查元素是否可变。

Now see this example

现在看这个例子

class A: pass
a0 = A()
ta = (1, a0)
hash(ta)  # -1122968024
a0.x = 20
hash(ta)  # -1122968024

Here it turns out the hash of ta does not change with the modification of its element, i.e., a0. So maybe a0's id is used for the hash calculation? Is a0 somehow considered as immutable? How does python know if a type is mutable?

在这里,事实证明ta的散列不随其元素的修改而改变,即a0。那么也许a0的id用于哈希计算? a0在某种程度上被认为是不可变的吗? python如何知道类型是否可变?

Now consider this case

现在考虑这个案例

b = (1, 2)
id(b)  # 3980742764
c = (1, 2)
id(c)  # 3980732588
tb = (1, b)
tc = (1, c) 
hash(tb)  # -1383040070
hash(tc)  # -1383040070

It seems the content of b and c are used for the hash calculation.

似乎b和c的内容用于哈希计算。

How should I understand these examples?

我该如何理解这些例子?

4 个解决方案

#1

Neither. It is calculated on the basis of the hashes of these elements, not the contents (values).

都不是。它是根据这些元素的散列计算的,而不是内容(值)。

Take a look at this paragraph in python's documentation glossary.

在python的文档词汇表中查看此段落。

Whether something is hashable or not, and how it is hashed, depends on the implementation of its .__hash__() method. Python itself has no idea about mutability of an object.

是否某些东西是可以清除的,以及它是如何散列的,取决于它的.__ hash __()方法的实现。 Python本身并不知道对象的可变性。

In your first example, tuple happens to hash itself on the basis of its elements, while a list doesn't have a hash at all - the .__hash__() method is not implemented for it (and for a good reason). That's why a tuple with a list object inside of it is not hashable.

在你的第一个例子中,元组恰好在其元素的基础上散列自身,而列表根本没有散列 - .__散列__()方法没有为它实现(并且有充分的理由)。这就是为什么在其中包含列表对象的元组不可清除的原因。

Now, having that in mind, let's have a look at python data model documentation, and what it has to say on the topic:

现在,考虑到这一点,让我们看看python数据模型文档,以及它对该主题的看法:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

用户定义的类默认具有__eq __()和__hash __()方法;与它们相比,所有对象都比较不相等(除了它们自己)和x .__ hash __()返回一个合适的值,使得x == y意味着x是y而hash(x)== hash(y)。

That's why you don't have to define .__hash__() for your classes - python does it for you in this case. The default implementation doesn't take instance fields into the account though. That's why you can change the values inside your object without changing its hash.

这就是为什么你不必为你的类定义.__ hash __() - 在这种情况下python为你做了。默认实现不会将实例字段带入帐户。这就是为什么您可以更改对象内部的值而不更改其哈希值的原因。

In this regard you're right - the default (CPython's) implementation of the hashing function for custom classes relies on the id() of an object, and not on the values inside of it. It is an implementation detail, and it differs between Python versions though. In more recent versions of Python the relation between hash() and id() involves some randomization.

在这方面你是对的 - 自定义类的哈希函数的默认(CPython)实现依赖于对象的id(),而不依赖于它内部的值。它是一个实现细节,但它在Python版本之间有所不同。在更新的Python版本中,hash()和id()之间的关系涉及一些随机化。

But how does it actually hash itself?

While the details are quite complicated and probably involve some advanced math, the implementation of the hash function for tuple objects is written in C, and can be seen here (see static Py_hash_t tuplehash(PyTupleObject *v).

虽然细节非常复杂并且可能涉及一些高级数学,但是元组对象的散列函数的实现是用C语言编写的,可以在这里看到(参见静态Py_hash_t tuplehash(PyTupleObject * v)。

The calculation involves XORing a constant with the hashes of each of the tuple's elements. The line responsible for hashing of the elements is this one:

计算涉及使用每个元组元素的哈希对一个常量进行异或运算。负责元素散列的行是这样的:

y = PyObject_Hash(*p++);

So, to answer your original question: it does a bunch of XOR hokus-pocus with the hashes of each of its elements. Whether or not the contents of these elements are used depends on their specific hash functions.

所以,回答你原来的问题:它有一堆XOR hokus-pocus和它的每个元素的哈希。是否使用这些元素的内容取决于它们的特定散列函数。

#2

The core contract of hashing is that equal objects have equal hashes. In particular, hashing does not directly care about mutability or mutation; it only cares about mutation that affects equality comparisons.

哈希的核心契约是等对象具有相等的哈希值。特别是,散列并不直接关注可变性或突变;它只关心影响平等比较的突变。

Your first tuple is unhashable because mutating the nested list would change how the tuple behaves in equality comparisons.

你的第一个元组是不可取的,因为改变嵌套列表会改变元组在相等比较中的行为方式。

Mutating a0 in your second example doesn't affect the hash of the tuple because it doesn't affect equality comparisons. a0 is still only equal to itself, and its hash is unchanged.

在第二个示例中对a0进行变换不会影响元组的散列,因为它不会影响相等比较。 a0仍然只等于它自己,并且它的哈希值不变。

tb and tc in your third example have equal hashes because they are equal tuples, regardless of whether their elements are the same objects.

第三个示例中的tb和tc具有相等的哈希值,因为它们是相等的元组,无论它们的元素是否是相同的对象。

This all means that tuples cannot (directly) use id for hashes. If they did, equal tuples with distinct but equal elements could hash differently, violating the contract of hashing. Without special-casing element types, the only things tuples can use to compute their own hashes are their elements' hashes, so tuples base their hashes on their elements' hashes.

这意味着元组不能(直接)使用id作为哈希。如果他们这样做,具有不同但相同元素的相等元组可能会以不同的方式散列,违反哈希合约。如果没有特殊的外壳元素类型,元组可以用来计算它们自己的哈希的唯一东西是它们的元素的哈希,所以元组将它们的哈希基于它们的元素的哈希。

#3

The answer to the question "Is the tuple's hash calculated based on the identity or the value?" is: Neither.

问题的答案“是否根据身份或价值计算元组的哈希?”是:都不是。

The correct answer is that the tuple's hash is calculated from the elements' hashes. How those hashes are calculated is (more or less) irrelevant.

正确的答案是元组的哈希值是从元素的哈希值计算出来的。如何计算这些哈希值(或多或少)无关紧要。

An easy way to prove this is to see what happens when you put a list into a tuple:

证明这一点的一个简单方法是查看将列表放入元组时会发生什么:

>>> hash( (1, 2) )
3713081631934410656
>>> hash( (1, []) )
Traceback (most recent call last):
  File "", line 1, in 
TypeError: unhashable type: 'list'

Because lists aren't hashable, a tuple containing a list isn't hashable either.

由于列表不可清除,因此包含列表的元组也不可清除。

Let's take a closer look at this example you brought:

让我们仔细看看你带来的这个例子:

class A: pass
a0 = A()
ta = (1, a0)
hash(ta)  # -1122968024
a0.x = 20
hash(ta)  # -1122968024

Why doesn't setting a0.x = 20 affect the tuple's hash? Well, if we modify this code to output the hash of a0, you'll see that setting a0.x = 20 has no effect on a0's hash value:

为什么不设置a0.x = 20会影响元组的哈希?好吧,如果我们修改此代码以输出a0的散列,您将看到设置a0.x = 20对a0的散列值没有影响:

a0 = A()
print(hash(a0))  # -9223363274645980307
a0.x = 20
print(hash(a0))  # -9223363274645980307

The reason for this is that python implements a default hash function for you. From the docs:

原因是python为你实现了一个默认的哈希函数。来自文档:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

用户定义的类默认具有__eq __()和__hash __()方法;与它们相比,所有对象都比较不相等(除了它们自己)和x .__ hash __()返回一个合适的值,使得x == y意味着x是y而hash(x)== hash(y)。

The default hash function ignores the object's attributes and calculates the hash based on the object's id. No matter what changes you make to a0, its hash will always stay the same. (Though it is possible to define a custom hash function for instances of your A class by implementing a custom __hash__ method.)

默认的哈希函数忽略对象的属性,并根据对象的id计算哈希值。无论你对a0做了什么改变,它的散列总是保持不变。 (尽管可以通过实现自定义__hash__方法为A类的实例定义自定义哈希函数。)

Addendum: The reason why lists aren't hashable is because they're mutable. From the docs:

附录:列表不可清除的原因是因为它们是可变的。来自文档:

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

如果一个类定义了可变对象并实现了一个__eq __()方法,那么它不应该实现__hash __(),因为hashable集合的实现要求一个键的哈希值是不可变的(如果对象的哈希值发生变化,那么它将是错误的哈希桶)。

Lists fall into this category.

列表属于此类别。

#4

the hash of a tuple is based on the contents, not on the _id_s of the tuples. And the hashes are computed recursively: if one element isn't hashable (like a list element), then the tuple itself isn't hashable.

元组的哈希基于内容,而不是基于元组的_id_s。并且以递归方式计算哈希:如果一个元素不可哈希(如列表元素),则元组本身不可哈希。

That's perfectly normal that if a and b are tuples and a == b, then hash(a) == hash(b) (if hashes can be computed of course), even if a is not b.

如果a和b是元组并且a == b,则哈希(a)==哈希(b)(如果哈希可以当然计算),即使a不是b,这是完全正常的。

(on the contrary hash(a) == hash(b) doesn't mean that a == b)

(相反,hash(a)== hash(b)并不代表a == b)

The information conveyed by is is often not very useful, because of python object interning for example.

由于例如python对象实习,由is传达的信息通常不是很有用。

推荐阅读

config
使用Shell脚本高效部署MHA高可用集群

本文介绍了如何利用Shell脚本高效地部署MHA（MySQL High Availability）高可用集群。通过详细的脚本编写和配置示例，展示了自动化部署过程中的关键步骤和注意事项。该方法不仅简化了集群的部署流程，还提高了系统的稳定性和可用性。 ... [详细]

蜡笔小新 2024-11-10 10:15:46
main
在范围[0..n-1]中产生m个不同的随机数 - Generating m distinct random numbers in the range [0..n-1]

Ihavetwomethodsofgeneratingmdistinctrandomnumbersintherange[0..n-1]我有两种方法在范围[0.n-1]中生 ... [详细]

蜡笔小新 2024-11-13 09:49:14
get
MATLAB字典学习工具箱SPAMS：稀疏与字典学习的详细介绍、配置及应用实例

SPAMS（Sparse Modeling Software）是一个强大的开源优化工具箱，专为解决多种稀疏估计问题而设计。该工具箱基于MATLAB，提供了丰富的算法和函数，适用于字典学习、信号处理和机器学习等领域。本文将详细介绍SPAMS的配置方法、核心功能及其在实际应用中的典型案例，帮助用户更好地理解和使用这一工具箱。 ... [详细]

蜡笔小新 2024-11-09 16:17:27
string
Java 15 发布，带来多项重要更新！

2020年9月15日，Oracle正式发布了最新的JDK 15版本。本次更新带来了许多新特性，包括隐藏类、EdDSA签名算法、模式匹配、记录类、封闭类和文本块等。 ... [详细]

蜡笔小新 2024-11-14 12:11:09
main
Java 并发编程：深入解析 AtomicInteger 和 CAS 无锁算法

在多线程并发环境中，普通变量的操作往往是线程不安全的。本文通过一个简单的例子，展示了如何使用 AtomicInteger 类及其核心的 CAS 无锁算法来保证线程安全。 ... [详细]

蜡笔小新 2024-11-12 16:40:04
char
检查在所有可能的“？”替换中，给定的二进制字符串中是否出现子字符串“10”带 1 或 0

检查在所有可能的“？”替换中，给定的二进制字符串中是否出现子字符串“10”带 1 或 0 ... [详细]

蜡笔小新 2024-11-12 11:35:01
char
开机自启动的几种方式

0x01快速自启动目录快速启动目录自启动方式源于Windows中的一个目录，这个目录一般叫启动或者Startup。位于该目录下的PE文件会在开机后进行自启动 ... [详细]

蜡笔小新 2024-11-12 11:16:30
object
Python中判断一个集合是否为另一集合子集的两种高效方法及其应用场景分析

Python中判断一个集合是否为另一集合子集的两种高效方法及其应用场景分析 ... [详细]

蜡笔小新 2024-11-11 19:27:53
main
理解和优化进程与线程状态转换机制

在Cisco IOS XR系统中，存在提供服务的服务器和使用这些服务的客户端。本文深入探讨了进程与线程状态转换机制，分析了其在系统性能优化中的关键作用，并提出了改进措施，以提高系统的响应速度和资源利用率。通过详细研究状态转换的各个环节，本文为开发人员和系统管理员提供了实用的指导，旨在提升整体系统效率和稳定性。 ... [详细]

蜡笔小新 2024-11-09 18:33:35
string
包含phppdoerrorcode的词条

包含phppdoerrorcode的词条 ... [详细]

蜡笔小新 2024-11-14 12:06:14
char
MySQL初级篇——字符串、日期时间、流程控制函数的相关应用

文章目录：1.字符串函数2.日期时间函数2.1获取日期时间2.2日期与时间戳的转换2.3获取年月日、时分秒、星期数、天数等函数2.4时间和秒钟的转换2. ... [详细]

蜡笔小新 2024-11-14 10:57:02
config
CentOS 7 中配置开机自动挂载 NFS 的解决方案

本文详细介绍了在 CentOS 7 系统中配置 fstab 文件以实现开机自动挂载 NFS 共享目录的方法，并解决了常见的配置失败问题。 ... [详细]

蜡笔小新 2024-11-13 12:05:24
tree
com.sun.javadoc.PackageDoc.exceptions()方法的使用及代码示例

com.sun.javadoc.PackageDoc.exceptions()方法的使用及代码示例 ... [详细]

蜡笔小新 2024-11-13 10:47:33
char
单片微机原理P3：80C51外部拓展系统

　　外部拓展其实是个相对来说很好玩的章节，可以真正开始用单片机写程序了，比较重要的是外部存储器拓展，81C55拓展，矩阵键盘，动态显示，DAC和ADC。0.IO接口电路概念与存 ... [详细]

蜡笔小新 2024-11-12 19:51:29
char
javascript分页类支持页码格式

前端时间因为项目需要，要对一个产品下所有的附属图片进行分页显示，没考虑ajax一张张请求，所以干脆一次性全部把图片out，然 ... [详细]

蜡笔小新 2024-11-12 14:58:57

记者刘方斌_304

这个家伙很懒，什么也没留下！

Tags | 热门标签

RankList | 热门文章