The Deduplication Problem
Latest revision as of 20:25, 16 October 2021
Internal
* Hash Tables
Overview
The deduplication problem applies to streams of data from which we need to eliminate duplicates. By "stream of data" we mean either a large static data set or data that becomes available over time.
The goal is to ignore duplicates and only remember the distinct objects in the stream.
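The solution is to use a hash table. For each object that arrives, look it up in the table: if it is absent, insert it and treat the object as new; if it is already present, ignore it as a duplicate.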
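The hash table approach can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `deduplicate` is hypothetical, Python's built-in `set` (which is hash-based) stands in for the hash table, and the objects in the stream are assumed to be hashable.

```python
from typing import Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T", bound=Hashable)


def deduplicate(stream: Iterable[T]) -> Iterator[T]:
    """Yield each distinct object once, in order of first appearance.

    `deduplicate` is a hypothetical name used for illustration only.
    """
    seen = set()  # hash-based set playing the role of the hash table
    for item in stream:
        if item not in seen:   # lookup: expected constant time
            seen.add(item)     # remember the object
            yield item         # pass the first occurrence through
        # otherwise the object is a duplicate and is ignored


print(list(deduplicate([3, 1, 3, 2, 1])))  # prints [3, 1, 2]
```

Because the function is a generator, it handles both cases from the overview: it can run over a large static set, or consume objects one at a time as they become available. Memory use grows with the number of distinct objects seen, not with the length of the stream.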