Bloom Filters

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

External

Internal

Overview

A Bloom filter is a data structure that can be used to remember if a specific element has been "seen" already, in an extremely space efficient manner. The idea is that we apply a hash function to the object (key) and project the result on a set of bits. The Bloom filter starts with all bits set to 0 and over time, some of the bits are turned to 1, depending on the object that have been "seen" by the filter.

A fundamental characteristic is that the filter can produce false positives - it can report that an object has been "seen" when it fact the object has not been seen. This is a consequence of the fact that a large number of objects are "projected" onto a limited space, and collisions or partial collisions may occur. The Bloom filter will never produce a false negative: if an object has been actually seen by the filter, the filter will never report that the object has not been seen.

It can be proved using discrete probabilities that the Bloom filters are useful data structures and the advantages of using space efficiently outweigh the fact that the filter produces false positives.