Bloom filter

A bloom filter remembers without knowing. Each word leaves three traces in a field of 4,000 bits. Traces overlap. False memories form. Watch a thousand words dissolve into probability.

That was poetry, but bloom filters are actually real — and surprisingly useful. A bloom filter is a space-efficient probabilistic data structure for membership testing. It answers one question: "have I seen this before?" It can tell you definitely no or probably yes, but never the other way around. No false negatives, only false positives.

It works by hashing each item through several independent hash functions, each pointing to a position in a bit array. To add an item, set those bits to 1. To check membership, look at those same positions — if any bit is 0, the item was never added. If all bits are 1, it probably was, but another combination of items might have set those same bits by coincidence — a hash collision across multiple functions.
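
Here's a minimal sketch of that add/check loop in Python. It's an illustration, not the code behind this page: the class and method names are made up, and salted BLAKE2b from the standard library stands in for MurmurHash3, which Python doesn't ship.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k salted hashes."""

    def __init__(self, num_bits=4000, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Assumption: salting BLAKE2b with the index i gives k hash functions
        # that behave independently enough for a demo.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=i.to_bytes(8, "little"),
            ).digest()
            positions.append(int.from_bytes(digest, "little") % self.num_bits)
        return positions

    def add(self, item):
        # Adding an item sets each of its k positions to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Any 0 bit proves the item was never added; all 1s means "probably".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print(bf.might_contain("apple"))  # True: added items are never missed
print(bf.might_contain("durian"))  # usually False, but could be a false positive
```

Storing one byte per bit wastes space but keeps the sketch easy to read. A real filter would pack eight bits into each byte.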

The payoff is space: a bloom filter that tracks a million items with a 1% false positive rate needs only about 1.2 MB (roughly 9.6 bits per item), regardless of how large the items themselves are. No actual data is stored, just the shadow of its presence. You can't list what's inside, you can't delete entries, and you can't extract the original values. It's a one-way, append-only memory that trades certainty for extreme compactness.
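
The 1.2 MB figure comes from the standard sizing formulas for an optimally tuned filter: m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions, for n items and a target false positive rate p. A quick sketch to check the arithmetic (the function name `bloom_size` is made up for this example):

```python
import math


def bloom_size(num_items, fp_rate):
    """Return (bits, hash_count) from the standard optimal sizing formulas."""
    bits = math.ceil(-num_items * math.log(fp_rate) / math.log(2) ** 2)
    hashes = round(bits / num_items * math.log(2))
    return bits, hashes


bits, hashes = bloom_size(1_000_000, 0.01)
print(f"{bits / 8 / 1e6:.2f} MB with {hashes} hash functions")
# prints "1.20 MB with 7 hash functions"
```

Neither formula mentions the size of the items, which is why a million URLs and a million 4 KB documents cost the same 1.2 MB.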

The interactive visualization below is a real 4,000-bit bloom filter (80×50 grid) with 3 MurmurHash3 functions and 1,000 English words. Blue cells have been set once; amber and darker tones mark bits set more than once, which is where false positives are born. Hover any cell to see which words set it. Search any word to test membership.
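
For those parameters you can estimate how crowded the grid should get once all the words are in. This sketch uses the textbook approximation and assumes ideal, uniformly distributed hashes, so the real filter will land near these numbers rather than exactly on them. The function names are made up for this example.

```python
def expected_fill(num_bits, num_hashes, num_items):
    """Expected fraction of bits set after inserting num_items items."""
    return 1 - (1 - 1 / num_bits) ** (num_hashes * num_items)


def expected_fp_rate(num_bits, num_hashes, num_items):
    """Chance that all k probes of an unseen item land on bits already set."""
    return expected_fill(num_bits, num_hashes, num_items) ** num_hashes


fill = expected_fill(4000, 3, 1000)
fp = expected_fp_rate(4000, 3, 1000)
print(f"{fill:.1%} fill, {fp:.1%} false positives")
# prints "52.8% fill, 14.7% false positives"
```

So with 1,000 words in 4,000 bits, about half the grid lights up and roughly one unseen word in seven gets a false "probably yes".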

Bloom filters are everywhere in production systems — Cassandra and RocksDB use them to skip disk reads, Chrome used one to screen URLs against malware lists, and Ethereum encodes them into block headers for fast log searching. Anywhere you need a fast "definitely not here" answer without storing actual data, a bloom filter fits.

I used a similar approach on 88x31.lol to count unique visitors without storing any IP addresses. Each visitor's IP is hashed through the filter: if all of its bits are already set, the visitor has probably been counted before. If any bit is still 0, it's a new unique. No logs, no identifiable data retained, just a bit array that grows fuzzier over time.
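
A rough sketch of how such a counter could look. This is not the actual 88x31.lol code: the class name, the filter size, the hash count, and the salted BLAKE2b hashing are all assumptions made for illustration.

```python
import hashlib


class UniqueVisitorCounter:
    """Counts probably-unique visitors without retaining their IP addresses."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)  # packed, 8 bits per byte
        self.unique = 0

    def _positions(self, ip):
        # Assumption: salted BLAKE2b stands in for whatever hash the site uses.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                ip.encode("utf-8"),
                digest_size=8,
                salt=i.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def record(self, ip):
        """Mark the visitor's bits; return True if any bit was newly set."""
        is_new = False
        for pos in self._positions(ip):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        if is_new:
            self.unique += 1
        return is_new


counter = UniqueVisitorCounter()
print(counter.record("203.0.113.7"))  # True: the filter was empty
print(counter.record("203.0.113.7"))  # False: same bits, already counted
print(counter.unique)  # 1
```

Only the bit array and the running total are kept. Because a false positive makes a new visitor look like a returning one, the count can only drift low, and it drifts further as the array fills up.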