@hackage bloomfilter-blocked0.1.0.1

Fast, compact Bloom filters

This library provides Bloom filters. A Bloom filter is a compact but probabilistic set-like data structure, supporting fast insert and membership operations.

Bloom filters are probabilistic in the sense that when querying for set membership, they can (with some probability) falsely report that an element is present when it is not present. On the other hand it will never falsely report that an element is not present when in fact it is present. That is, it can have false positives but not false negatives. The false positive rate (FPR) can be adjusted.

Bloom filters are compact, needing only a few bits per element. For example 10 bits per element is enough for a false positive rate (FPR) of 1%, and 15 bits for 0.1%.

The library includes two implementations of Bloom filters:

classic
in Data.BloomFilter.Classic: a default implementation that is faithful to the classical description of a Bloom filter data structure.
block-structured
in Data.BloomFilter.Blocked: a cache optimised representation that is faster, at the cost of needing a few more bits per element to achieve a target FPR.

Features of the library:

  • Fast. See the benchmark results.

  • Compact. It uses optimal sized bit arrays: no bigger than necessary.

  • Faster still: the block-structured Bloom filters are even faster, at the expense of needing more bits per entry.

  • Supports very large Bloom filters, bigger than 2^32 bits.

  • Simple API for specifying the size of Bloom filters.

  • Support for hash salting, for using Bloom filters with untrusted inputs.

  • Serialisation support with format version tracking.