The Deduplication Problem

From NovaOrdis Knowledge Base

Latest revision as of 20:25, 16 October 2021

Internal

* Hash Tables

Overview

The deduplication problem applies to streams of data from which we need to eliminate duplicates. By "stream of data" we mean either a large static data set or data that becomes available over time.

The goal is to ignore duplicates and only remember the distinct objects in the stream.

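The solution is to use a hash table: look up each incoming object, skip it if it is already present, and insert it otherwise. Lookups and inserts take constant time on average, so the stream is processed in a single pass.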
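
As an illustration, the hash-table approach might be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name deduplicate is hypothetical, Python's built-in set stands in for the hash table, and the objects in the stream are assumed to be hashable.

```python
from typing import Hashable, Iterable, Iterator, Set


def deduplicate(stream: Iterable[Hashable]) -> Iterator[Hashable]:
    """Yield each distinct object of the stream once, in first-seen order.

    Hypothetical helper: a built-in set (a hash table) remembers what has
    already been seen, so each membership test is O(1) on average.
    """
    seen: Set[Hashable] = set()
    for item in stream:
        if item not in seen:
            # First occurrence: remember it and pass it through.
            seen.add(item)
            yield item
        # Otherwise the item is a duplicate and is ignored.


if __name__ == "__main__":
    # Works on a static collection or on data produced lazily over time.
    for value in deduplicate(iter(["a", "b", "a", "c", "b"])):
        print(value)
```

Because the function is a generator, it handles both cases named in the overview: a large static set and data that arrives over time. Memory use grows with the number of distinct objects, not with the length of the stream.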