The Deduplication Problem
Internal
Overview
The deduplication problem arises when we process a stream of data and need to eliminate duplicates. By stream of data we mean either a large static collection or data that becomes available over time.
The goal is to ignore duplicates and only remember the distinct objects in the stream.
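The solution is to use a hash table: for each incoming object, look it up in the table and insert it only if it is not already present. Because a hash table supports insertion and lookup in expected constant time, the whole stream can be processed in a single pass.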
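A minimal sketch of the hash table approach in Python, assuming the objects in the stream are hashable. The function name deduplicate is hypothetical, chosen for illustration only, and Python's built-in set (which is implemented as a hash table) stands in for the table.

```python
from typing import Hashable, Iterable, Iterator


def deduplicate(stream: Iterable[Hashable]) -> Iterator[Hashable]:
    """Yield each distinct object of the stream once, in first-seen order.

    Hypothetical helper for illustration; assumes every object is hashable.
    """
    seen = set()  # hash table of the objects encountered so far
    for item in stream:
        if item not in seen:  # expected constant-time lookup
            seen.add(item)    # remember the object
            yield item        # first occurrence: pass it through
        # otherwise the object is a duplicate and is ignored


if __name__ == "__main__":
    for obj in deduplicate(["a", "b", "a", "c", "b"]):
        print(obj)
```

Because the function is a generator, it works both for a static collection and for data that arrives over time. Memory use grows with the number of distinct objects, not with the length of the stream.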