Ribbon filter
A ribbon filter is a static set-membership structure built by solving a linear system over GF(2). Each item becomes one row of a sparse banded matrix; the bit array stored on disk is the solution that satisfies every row at once.
The interactive below builds a small one in front of you — 48 items, 64 columns, ribbon width 12, single-bit fingerprints. Press play and watch the algorithm run; the structure underneath is the same one production filters use at millions-of-rows scale.
How it works
Each item hashes to a triple: a start column s, a 12-bit coefficient pattern c with bit 0 forced to 1, and a 1-bit fingerprint b. The equation for item x is A[s..s+12] · cx = bx, where the dot product is taken over GF(2) — XOR. The filter’s job is to find a bit array A that makes every equation true at once.
1. Sort
Rows get sorted by their start column. The nonzero entries now line up as a diagonal band across the matrix — that’s the ribbon, and it’s the entire reason the algorithm is practical.
2. Eliminate
Banded Gaussian elimination, top-to-bottom. For each row, find the leftmost set bit — that column becomes the row’s pivot. For every later row whose band covers that pivot column and has a 1 there, XOR the pivot row in to clear it. The band structure survives every XOR, so each elimination only touches r cells. Total construction cost is O(n · r) instead of O(n3).
3. Back-substitute
After elimination, each pivot row determines one bit of A. Process pivot columns right-to-left: A[pivot column] = b ⊕ (bits at later columns the row’s coefficients select). By the time you reach column zero, every bit is filled and every original equation holds.
Querying is the same dot product: hash the item, XOR the selected bits, compare to the fingerprint. False-positive rate is 2-f where f is the fingerprint width. The visualization uses f = 1 for legibility (50% FPR) so every cell can be one bit on screen. Real filters use 8–16-bit fingerprints.
Where they show up
Dillinger and Walzer published ribbon filters in 2021. RocksDB switched to them shortly after — the build-once-query-many shape fits LSM-tree compactions, and a ribbon filter at the same false-positive rate is measurably smaller than the alternatives.
My haveibeenfiltered uses one: 2 billion SHA-1 password hashes from Have I Been Pwned, packed into a 1.79 GB filter that runs entirely client-side with no per-check network call. Ribbon filters get within a few percent of the entropy bound — n · log2(1/FPR) bits — which is the floor no probabilistic filter can beat.
For the membership-filter cousin that takes the opposite approach — set bits as you go, no system to solve, no construction phase — see the Bloom filter.