The Deduplication Problem

From NovaOrdis Knowledge Base

Internal

Overview

The deduplication problem applies to streams of data from which duplicates must be eliminated. By "stream of data" we mean either a large static data set or data that becomes available over time.

The goal is to ignore duplicates and remember only the distinct objects in the stream.

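The solution is to use a hash table. As each object arrives, look it up in the table. If it is absent, insert it and keep it. If it is already present, it is a duplicate and is ignored. Because hash table lookups and insertions take expected constant time, the whole stream is processed in expected time linear in its length, at the cost of memory proportional to the number of distinct objects.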
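The following is a minimal sketch of the hash table approach, assuming the objects in the stream are hashable. It uses Python's built-in set as the hash table; the function name deduplicate is hypothetical. Written as a generator, it works the same way for a large static collection and for data that arrives over time, emitting each distinct object the first time it is seen.

```python
from typing import Hashable, Iterable, Iterator, Set, TypeVar

T = TypeVar("T", bound=Hashable)


def deduplicate(stream: Iterable[T]) -> Iterator[T]:
    """Yield each distinct object the first time it appears in the stream.

    Hypothetical helper for illustration; assumes every object is hashable.
    """
    seen: Set[T] = set()  # the hash table of objects seen so far
    for item in stream:
        if item not in seen:  # expected O(1) lookup
            seen.add(item)    # expected O(1) insertion
            yield item        # first occurrence: remember and emit it
        # otherwise the item is a duplicate and is ignored


print(list(deduplicate([3, 1, 3, 2, 1])))  # prints [3, 1, 2]
```

Note that the set grows with the number of distinct objects, not with the length of the stream, so memory use is bounded by how many unique objects are encountered.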