Eq () and hash () in Python: Difference between revisions
(12 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
=External= | |||
* https://eng.lyft.com/hashing-and-equality-in-python-2ea8c738fb9d | |||
=Internal= | =Internal= | ||
* [[Python_Language_OOP# | * [[Python_Language_OOP#Identity_of_an_Object_Instance|Python Language OOP | Identity of an Object Instance]] | ||
* [[Hash_Functions#Overview|Hash Functions]] | * [[Hash_Functions#Overview|Hash Functions]] | ||
* [[Python Language Dictionary#Overview|Dictionaries]] | * [[Python Language Dictionary#Overview|Dictionaries]] | ||
Line 8: | Line 10: | ||
=Overview= | =Overview= | ||
Also see: {{Internal|Python_Language_OOP#Identity_of_an_Object_Instance|Python Language OOP | Identity of an Object Instance}} | |||
==<tt>__hash__()</tt> | =<span id='eq'></span><tt>__eq__()</tt>= | ||
The <code>__eq__()</code> function indicates logical equality between object instances. | |||
If not explicitly implemented by the class, the default implementation uses [[Python_Language_OOP#Identity_of_an_Object_Instance|the identity]] of the object. | |||
A typical <code>__eq__</code> implementation based on attributes of an object is: | |||
<syntaxhighlight lang='py'> | |||
class SomeObject: | |||
def __init__(self, a, b): | |||
self._a = a | |||
self._b = b | |||
def __str__(self): | |||
return f'a={self._a}, b={self._b}' | |||
def __eq__(self, other): | |||
if not isinstance(other, SomeObject): | |||
return False | |||
return self._a == other._a and self._b == other._b | |||
# | |||
# if __eq__() is overridden, __hash()__ must be also overridden so two "equal" objects | |||
# have the same hash. Also, the hash value must not change for the duration of the | |||
# object life | |||
# | |||
def __hash__(self): | |||
return 17 * hash(self._a) + hash(self._b) | |||
o = SomeObject(1, 2) | |||
o2 = SomeObject(1, 2) | |||
print(o) | |||
print(o2) | |||
hash_o = hash(o) | |||
hash_o2 = hash(o2) | |||
print(hash_o) | |||
print(hash_o2) | |||
assert o == o2 | |||
assert hash_o == hash_o2 | |||
</syntaxhighlight> | |||
=<tt>__hash__()</tt>= | |||
The <code>__hash__()</code> function computes the hash of the key. The [[Python_Language_Functions#hash|<code>hash()</code>]] built-in ends up calling <code>__hash__()</code>. | The <code>__hash__()</code> function computes the hash of the key. The [[Python_Language_Functions#hash|<code>hash()</code>]] built-in ends up calling <code>__hash__()</code>. | ||
The default <code>__hash__()</code> implementation, if not overridden, is based on [[Python_Language_OOP#Identity_of_an_Object_Instance|the identity]] of the object instance. | |||
Because the <code>hash()</code> function is used to store objects in dictionary and keys, as explained [[#bee56|below]], if two instances are equal as indicated by the [[#eq|<code>__eq()__</code>]], they '''must''' have equal hashes. | Because the <code>hash()</code> function is used to store objects in dictionary and keys, as explained [[#bee56|below]], if two instances are equal as indicated by the [[#eq|<code>__eq()__</code>]], they '''must''' have equal hashes. | ||
Also, because once stored in a dictionary, the hash of the object is stored alongside it and it is used when the dictionary is resized, instead of being re-calculated, the hash of an object must never change for the life of the object. | Also, because once stored in a dictionary, the hash of the object is stored alongside it and it is used when the dictionary is resized, [[#hw3s |instead of being re-calculated]], '''the hash of an object must never change for the life of the object'''. | ||
==<span id='bee56'></span>Dictionaries and the <tt>__hash__()</tt> Function== | |||
To store a key into a [[Python_Language_Dictionary#Dictionaries_and_the_hash_.28.29_Function|dictionary]], Python performs the following sequence: | To store a key into a [[Python_Language_Dictionary#Dictionaries_and_the_hash_.28.29_Function|dictionary]], Python performs the following sequence: | ||
Line 35: | Line 86: | ||
3. Iterate the bucket for an entry matching the hash value. <font color=darkkhaki>Why? Don't all keys stored in the bucket have the same hash value already?</font>. If an entry exists, <font color=darkkhaki>check for equality, first by identity then by calling <code>__eq__()</code>.</font> | 3. Iterate the bucket for an entry matching the hash value. <font color=darkkhaki>Why? Don't all keys stored in the bucket have the same hash value already?</font>. If an entry exists, <font color=darkkhaki>check for equality, first by identity then by calling <code>__eq__()</code>.</font> | ||
==Hashability== | |||
You can check whether an object is hashable - can be used as a [[Python_Language_Dictionary#Key_Discussion|key in a dictionary]] - with the <code>hash()</code> function: | |||
<syntaxhighlight lang='py'> | |||
hash ([2, 3]) | |||
</syntaxhighlight> | |||
<font size=-1> | |||
TypeError: unhashable type: 'list' | |||
</font> |
Latest revision as of 23:39, 16 May 2024
External
Internal
- Python Language OOP | Identity of an Object Instance
- Hash Functions
- Dictionaries
- Sets
- The Python
hash()
built-in function
Overview
Also see:
__eq__()
The __eq__()
function indicates logical equality between object instances.
If not explicitly implemented by the class, the default implementation uses the identity of the object.
A typical __eq__
implementation based on attributes of an object is:
class SomeObject:
def __init__(self, a, b):
self._a = a
self._b = b
def __str__(self):
return f'a={self._a}, b={self._b}'
def __eq__(self, other):
if not isinstance(other, SomeObject):
return False
return self._a == other._a and self._b == other._b
#
# if __eq__() is overridden, __hash()__ must be also overridden so two "equal" objects
# have the same hash. Also, the hash value must not change for the duration of the
# object life
#
def __hash__(self):
return 17 * hash(self._a) + hash(self._b)
o = SomeObject(1, 2)
o2 = SomeObject(1, 2)
print(o)
print(o2)
hash_o = hash(o)
hash_o2 = hash(o2)
print(hash_o)
print(hash_o2)
assert o == o2
assert hash_o == hash_o2
__hash__()
The __hash__()
function computes the hash of the key. The hash()
built-in ends up calling __hash__()
.
The default __hash__()
implementation, if not overridden, is based on the identity of the object instance.
Because the hash()
function is used to store objects in dictionary and keys, as explained below, if two instances are equal as indicated by the __eq()__
, they must have equal hashes.
Also, because once stored in a dictionary, the hash of the object is stored alongside it and it is used when the dictionary is resized, instead of being re-calculated, the hash of an object must never change for the life of the object.
Dictionaries and the __hash__() Function
To store a key into a dictionary, Python performs the following sequence:
1. Call __hash__()
on the key and compute the hash of the key. If the key is not hashable, raise a TypeError
.
2. Store (hash_value, key, value) in the bucket at the location hash_value % len(buckets)
3. If the bucket array needs resizing, re-use the previously computed hash value to re-insert all previously stored values. This is why is important that the key is immutable: if the key is mutable and it changes while the key/value pair is stored in the dictionary, lookup and resizing will not work.
To retrieve a key from the dictionary:
1. Call __hash__()
on the key and compute the hash of the key. If the key is not hashable, raise a TypeError
.
2. Locate the hash_value % len(buckets) bucket
3. Iterate the bucket for an entry matching the hash value. Why? Don't all keys stored in the bucket have the same hash value already?. If an entry exists, check for equality, first by identity then by calling __eq__()
.
Hashability
You can check whether an object is hashable - can be used as a key in a dictionary - with the hash()
function:
hash ([2, 3])
TypeError: unhashable type: 'list'