The Deduplication Problem

From NovaOrdis Knowledge Base
Revision as of 20:25, 16 October 2021 by Ovidiu (talk | contribs) (→‎Overview)

Internal

Overview

The deduplication problem arises when processing a stream of data from which duplicates must be eliminated. Here, a stream of data means either a large static data set or data that becomes available incrementally over time.

The goal is to ignore duplicates and keep track of only the distinct objects seen in the stream.

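A standard solution is to use a hash table. For each object in the stream, look it up in the table: if it is absent, insert it and treat the object as new; if it is already present, skip it as a duplicate. Lookup and insertion in a hash table take constant time on average, so the stream is processed in expected linear time, at the cost of memory proportional to the number of distinct objects.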
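
A minimal Python sketch of the hash table approach, for illustration only. The function name deduplicate is hypothetical, and the sketch assumes that the objects in the stream are hashable and that two objects count as duplicates when they compare equal. Python's built-in set, which is backed by a hash table, stands in for the hash table here.

```python
from typing import Hashable, Iterable, Iterator


def deduplicate(stream: Iterable[Hashable]) -> Iterator[Hashable]:
    """Yield each distinct object of the stream once, in first-seen order.

    Hypothetical helper: assumes the objects are hashable and that
    equality defines what counts as a duplicate.
    """
    seen = set()  # hash-table-backed set of the objects encountered so far
    for item in stream:
        if item not in seen:  # average constant-time lookup
            seen.add(item)    # remember the object
            yield item        # first occurrence: pass it through
        # otherwise the object is a duplicate and is silently skipped


if __name__ == "__main__":
    for obj in deduplicate(["a", "b", "a", "c", "b"]):
        print(obj)
```

Because the function is a generator, it works both on a static collection and on data that arrives over time: objects are consumed one at a time and only the distinct ones are kept in memory.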