Hash Functions: Difference between revisions
Line 5: | Line 5: | ||
* [[Hash_Table#Hash_Functions|Hash Tables]] | * [[Hash_Table#Hash_Functions|Hash Tables]] | ||
=Overview= | =Overview= | ||
The hash function should be easy to compute, in constant time. It should "spread out" most of the data sets, at least the non-pathological data sets. Also, the has function should be easy to store. | |||
In case the hash table is implemented using [[Hash_Table#Chaining|chaining]], the running times of all operations will be given by the length of the chaining lists, and the average length of those lists depends on the quality of the hash function: better the hash function spreads the keys over buckets, more balanced and short the chaining lists are. A similar situation is encountered in case the hash table is implemented with [[Hash_Table#Open_Addressing|open addressing]]. The running time is governed by the length of the probe sequence: how many time you have to look in the hash table until a suitable empty bucket is found. The probe sequence is going to depend on the hash function. For good hash functions, that spread data evenly, the probe sequences will be relatively short. For a constant hash function, each new addition is a collision with increasingly long probe sequence. | In case the hash table is implemented using [[Hash_Table#Chaining|chaining]], the running times of all operations will be given by the length of the chaining lists, and the average length of those lists depends on the quality of the hash function: better the hash function spreads the keys over buckets, more balanced and short the chaining lists are. A similar situation is encountered in case the hash table is implemented with [[Hash_Table#Open_Addressing|open addressing]]. The running time is governed by the length of the probe sequence: how many time you have to look in the hash table until a suitable empty bucket is found. The probe sequence is going to depend on the hash function. For good hash functions, that spread data evenly, the probe sequences will be relatively short. For a constant hash function, each new addition is a collision with increasingly long probe sequence. | ||
Revision as of 20:28, 17 October 2021
External
- https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/Ob0K7/hash-tables-implementation-details-part-i
- https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/1Y8t1/hash-tables-implementation-details-part-ii
Internal
Overview
The hash function should be easy to compute, in constant time. It should "spread out" most of the data sets, at least the non-pathological data sets. Also, the has function should be easy to store.
In case the hash table is implemented using chaining, the running times of all operations will be given by the length of the chaining lists, and the average length of those lists depends on the quality of the hash function: better the hash function spreads the keys over buckets, more balanced and short the chaining lists are. A similar situation is encountered in case the hash table is implemented with open addressing. The running time is governed by the length of the probe sequence: how many time you have to look in the hash table until a suitable empty bucket is found. The probe sequence is going to depend on the hash function. For good hash functions, that spread data evenly, the probe sequences will be relatively short. For a constant hash function, each new addition is a collision with increasingly long probe sequence.