haveibeenfiltered

Checking a password against a breach corpus is a solved problem. Have I Been Pwned has two billion hashes and an API to ask. The snag is that asking means telling a server something about the password. HIBP softens this with k-anonymity — you send the first five hex characters of the SHA-1 and it hands back every hash that starts with them, so the full hash stays home — but it’s still a request on the wire, still rate-limited, still about 200 ms, and still nothing without a network.
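For comparison, the k-anonymity round trip described above can be sketched in a few lines. This is a minimal illustration, not part of haveibeenfiltered: the helper names (splitSha1, countInRangeBody, pwnedCount) are made up for this example, it assumes the public Pwned Passwords range endpoint and its "SUFFIX:COUNT" response lines, and it needs Node 18+ for the global fetch.

```javascript
const crypto = require('node:crypto')

// Split the uppercase SHA-1 hex of a password into the 5-character prefix
// that goes over the wire and the 35-character suffix that stays local.
function splitSha1(password) {
  const hex = crypto.createHash('sha1').update(password, 'utf8').digest('hex').toUpperCase()
  return { prefix: hex.slice(0, 5), suffix: hex.slice(5) }
}

// The range endpoint answers with one "SUFFIX:COUNT" pair per line.
// Return the breach count for our suffix, or 0 if it is not listed.
function countInRangeBody(body, suffix) {
  for (const line of body.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(':')
    if (candidate === suffix) return Number(count)
  }
  return 0
}

// Network round trip: only the prefix is sent to the server.
async function pwnedCount(password) {
  const { prefix, suffix } = splitSha1(password)
  const res = await fetch(`https://api.pwnedpasswords.com/range/${prefix}`)
  if (!res.ok) throw new Error(`HIBP range query failed: ${res.status}`)
  return countInRangeBody(await res.text(), suffix)
}
```

Every call to pwnedCount is one HTTPS request, which is where the latency, the rate limit, and the network requirement come from.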

haveibeenfiltered inverts it: download the whole corpus once, then answer everything locally. Two billion SHA-1 hashes from HIBP, squeezed into a 1.79 GB file, around 14 microseconds a check, zero dependencies — and after that one download, nothing leaves the machine again.


Using it

An npm package with zero dependencies: it uses Node builtins only.

npm install haveibeenfiltered
npx haveibeenfiltered download        # pull the filter once, SHA-256 verified

const hbf = require('haveibeenfiltered')

async function main() {
  // load() is awaited, and await needs an async function in a CommonJS script
  const filter = await hbf.load()

  filter.check('password123')   // true  (breached)
  filter.check('Tr0ub4dor&3')   // false (not found)
  filter.checkHash('5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8')  // true

  filter.close()                // hand the ~1.8 GB back when you're done
}

main()

Or from the shell — one password, a precomputed hash, or a whole file piped in:

npx haveibeenfiltered check hunter2
npx haveibeenfiltered check --hash 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
cat passwords.txt | npx haveibeenfiltered check --stdin
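As one way to wire the library calls above into an application, here is a hedged sketch of a signup guard. The function validateNewPassword and its 8-character minimum are hypothetical, invented for this example. The only library method it relies on is checkHash, called with an uppercase SHA-1 hex string as in the snippet above.

```javascript
const crypto = require('node:crypto')

// Uppercase SHA-1 hex, the form checkHash() is shown taking above.
function sha1Hex(password) {
  return crypto.createHash('sha1').update(password, 'utf8').digest('hex').toUpperCase()
}

// Hypothetical signup guard. `filter` is whatever hbf.load() resolved to;
// only its checkHash() method is used. The length rule is an example policy.
function validateNewPassword(filter, password) {
  if (password.length < 8) return { ok: false, reason: 'too short' }
  if (filter.checkHash(sha1Hex(password))) return { ok: false, reason: 'found in breach corpus' }
  return { ok: true }
}

// Usage, inside an async function:
//   const filter = await require('haveibeenfiltered').load()
//   validateNewPassword(filter, candidatePassword)
```

Taking the filter as a parameter keeps the guard easy to test with a stub object, and lets one loaded filter serve every request.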

How it works

It’s a ribbon filter — a static set-membership structure, roughly 20% more space-efficient than a Bloom filter. Instead of setting bits as it goes, it stores compact fingerprints as the solution to a system of linear equations over GF(2), solved once by Gaussian elimination at build time. A query just evaluates the system for one key and compares the result to the expected fingerprint. There are no passwords or hashes in the file to read back out — only enough bits to answer yes or no. That’s how 2,048,908,128 SHA-1 hashes fold down to 1.79 GB, about seven bits per password.
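The "about seven bits per password" figure can be checked with a little arithmetic. The sketch below assumes "1.79 GB" means decimal gigabytes. The Bloom number is the textbook optimum for the same false-positive rate, so it is an idealized comparison: real files on either side carry some overhead.

```javascript
const KEYS = 2_048_908_128     // hashes in the full set, from the text
const FILE_BYTES = 1.79e9      // "1.79 GB", read as decimal gigabytes (assumption)
const FINGERPRINT_BITS = 7

// Average storage cost of one key in the file.
const bitsPerKey = (FILE_BYTES * 8) / KEYS

// Textbook optimum for a Bloom filter with false-positive rate eps:
// log2(1/eps) / ln 2 bits per key.
const eps = 2 ** -FINGERPRINT_BITS
const bloomBitsPerKey = Math.log2(1 / eps) / Math.LN2

console.log(bitsPerKey.toFixed(2))       // 6.99
console.log(bloomBitsPerKey.toFixed(2))  // 10.10
```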

A single check touches exactly one of 256 shards, picked by the first byte of the password’s SHA-1. Inside the shard, MurmurHash3 (x64, 128-bit, seeded from the file header) turns the hex hash into three numbers:

  • start — an index into the shard’s solution array.
  • coefficient — a 64-bit mask (the ribbon is 64 wide) selecting which solution rows to XOR.
  • fingerprint — a 7-bit value to compare the result against.

XOR the selected rows from the start index, compare to the fingerprint, done — one arithmetic pass over a few bytes, no decompression and no scan. The handful of keys that wouldn’t fit the linear system at construction time (“bumped” keys) sit in a small per-shard overflow table and get checked there instead. If the matrix-solving underneath all this sounds like witchcraft, the ribbon filter visualization runs the whole construction in front of you.

[Figure: password → SHA-1 → shard (first byte, 256 total) → MurmurHash3 (x64, 128-bit, seeded) → start, 64-bit coefficient, 7-bit fingerprint → XOR of the selected solution rows → fingerprint match?]
The query flow, one shard deep: SHA-1 to shard, MurmurHash3 to the start/coefficient/fingerprint triple, XOR against the solution, fingerprint match.
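To make the build and query steps concrete, here is a toy single-shard ribbon filter. It is a sketch under stated assumptions, not the package's implementation: the ribbon is 32 wide rather than 64 so plain JavaScript integers suffice, a SHA-256 digest stands in for the seeded MurmurHash3 step, the 25% slot slack is an arbitrary choice, and the names triple, build, and check are invented for this example.

```javascript
const crypto = require('node:crypto')

const W = 32            // ribbon width in this toy (the text describes 64)
const FP_MASK = 0x7f    // 7-bit fingerprints

// Stand-in for the seeded MurmurHash3 step: derive start, coefficient and
// fingerprint from a SHA-256 digest of the key. Illustrative only.
function triple(key, slots) {
  const d = crypto.createHash('sha256').update(key).digest()
  return {
    start: d.readUInt32LE(0) % (slots - W + 1),
    coeff: (d.readUInt32LE(4) | 1) >>> 0,   // force bit 0 so the row begins at `start`
    fp: d[8] & FP_MASK,
  }
}

function build(keys) {
  const slots = Math.ceil(keys.length * 1.25) + W
  const rows = new Uint32Array(slots)   // coefficient rows, bit 0 = the row's own column
  const rhs = new Uint8Array(slots)     // right-hand sides (fingerprints)
  const overflow = new Set()            // "bumped" keys that would not fit

  // Gaussian elimination over GF(2), one key at a time.
  for (const key of keys) {
    let { start: i, coeff: c, fp: b } = triple(key, slots)
    for (;;) {
      if (rows[i] === 0) { rows[i] = c; rhs[i] = b; break }
      c = (c ^ rows[i]) >>> 0            // eliminate the leading column
      b ^= rhs[i]
      if (c === 0) { if (b !== 0) overflow.add(key); break }
      const tz = 31 - Math.clz32(c & -c) // distance to the next set column
      c >>>= tz
      i += tz
    }
  }

  // Back-substitution, last row first: each solution entry is the row's
  // right-hand side XORed with the already-solved entries it references.
  const solution = new Uint8Array(slots)
  for (let i = slots - 1; i >= 0; i--) {
    if (rows[i] === 0) continue          // free variable, left as 0
    let acc = rhs[i]
    for (let j = 1; j < W && i + j < slots; j++) {
      if ((rows[i] >>> j) & 1) acc ^= solution[i + j]
    }
    solution[i] = acc
  }
  return { slots, solution, overflow }
}

// Query: XOR the solution entries selected by the coefficient mask and
// compare the result with the key's fingerprint.
function check(filter, key) {
  if (filter.overflow.has(key)) return true
  const { start, coeff, fp } = triple(key, filter.slots)
  let acc = 0
  for (let j = 0; j < W; j++) {
    if ((coeff >>> j) & 1) acc ^= filter.solution[start + j]
  }
  return acc === fp
}

const toy = build(['alpha', 'bravo', 'charlie'])
console.log(check(toy, 'alpha'))   // true: members always match
```

Only the solution array and the overflow set are needed at query time; the rows and right-hand sides exist during construction and are then discarded, which is why no hashes can be read back out.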

False positives, false negatives

A probabilistic filter can be wrong two ways, and only one of them can happen here:

  • False positive — a safe password flagged as breached. Possible, because the filter keeps 7-bit fingerprints rather than full hashes, so roughly 1 in 128 (~0.78%) clean passwords collide with one.
  • False negative — a breached password reported as safe. Cannot happen. If a hash is in the set, the equations resolve to its fingerprint every time.

So a false is a guarantee, and a clean password is wrongly flagged only about 0.78% of the time. That is the right way round for this job, where missing a breached password is expensive and a spurious “pick another one” costs nothing.
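The false-positive figure follows directly from the fingerprint width. A small worked example, with function names chosen for illustration:

```javascript
// Chance that an unrelated key matches an r-bit fingerprint by accident.
function falsePositiveRate(fingerprintBits) {
  return 2 ** -fingerprintBits
}

// Expected number of clean passwords wrongly flagged out of `cleanChecks`.
function expectedFalseAlarms(cleanChecks, fingerprintBits) {
  return cleanChecks * falsePositiveRate(fingerprintBits)
}

console.log(falsePositiveRate(7))            // 0.0078125
console.log(expectedFalseAlarms(10000, 7))   // 78.125
```

In other words, out of ten thousand signups with passwords that were never breached, roughly 78 users would be asked to choose again.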

Benchmarks

| Metric | Value |
| --- | --- |
| HIBP passwords | 2,048,908,128 |
| Filter size | 1.79 GiB |
| Fingerprint | 7 bits |
| False-positive rate | 1/128 (~0.78%) |
| False negatives | 0 |
| check(password) | ~14 µs |
| checkHash(sha1hex) | ~8 µs |
| Throughput (single core) | ~72k–121k / sec |
| npm dependencies | 0 |
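Numbers like these depend on the machine, so it can be worth measuring your own. The harness below is a rough sketch: benchmark is a name invented here, it relies only on the check method shown earlier, and it makes no attempt at warm-up or statistical rigor.

```javascript
// Time `iterations` calls of filter.check() and report the mean latency
// in microseconds plus the implied single-core throughput.
function benchmark(filter, passwords, iterations = 100000) {
  const t0 = process.hrtime.bigint()
  let hits = 0
  for (let i = 0; i < iterations; i++) {
    if (filter.check(passwords[i % passwords.length])) hits++
  }
  const elapsedNs = Number(process.hrtime.bigint() - t0)
  const microsPerCheck = elapsedNs / iterations / 1000
  return { hits, microsPerCheck, checksPerSecond: 1e6 / microsPerCheck }
}

// Usage, inside an async function:
//   const filter = await require('haveibeenfiltered').load()
//   console.log(benchmark(filter, ['password123', 'hunter2']))
```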

Versus the HIBP API

Both answer the same question. They make opposite trades.

|  | haveibeenfiltered | HIBP API |
| --- | --- | --- |
| Privacy | nothing leaves the machine | k-anonymity (5-char SHA-1 prefix sent) |
| Speed | ~14 µs / check | ~200 ms / request |
| Offline | yes | no |
| Setup | 1.79 GB download | none |
| RAM | ~1.8 GB | none |
| False positives | ~0.78% | 0% (exact) |
| Rate limits | none | yes |
| Data freshness | static snapshot | continuously updated |

Filter files

The full HIBP set is 1.79 GB, but it isn’t the only cut. Trim by breach frequency when you don’t need the long tail, or drop to a common-passwords list when you just want a sanity check in a signup form. The binaries live on Cloudflare R2 at files.haveibeenfiltered.com/v0.1/, are SHA-256 verified after download, and the CLI pulls them for you.

| File | Size | What |
| --- | --- | --- |
| ribbon-hibp-v1.bin | 1.79 GiB | the whole HIBP set |
| ribbon-hibp-v1-min5.bin | 726 MB | hashes seen 5+ times |
| ribbon-hibp-v1-min10.bin | 435 MB | seen 10+ times |
| ribbon-hibp-v1-min20.bin | 259 MB | seen 20+ times |
| ribbon-top10m-v1.bin | 9.0 MB | most common ten million |
| ribbon-rockyou-v1.bin | 12.8 MB | the 2009 RockYou breach |
| ribbon-top1m-v1.bin | 0.9 MB | most common million |
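The CLI verifies the download for you. If you fetch a file some other way, the same kind of SHA-256 check can be reproduced by hand. This is a minimal sketch using Node builtins only: sha256File and verifyFilterFile are illustrative names, and the expected checksum is a placeholder you would have to obtain from a source you trust.

```javascript
const crypto = require('node:crypto')
const fs = require('node:fs')

// Hash a file in 1 MiB chunks so a multi-gigabyte filter never has to sit
// in a single buffer.
function sha256File(path) {
  const hash = crypto.createHash('sha256')
  const fd = fs.openSync(path, 'r')
  const chunk = Buffer.allocUnsafe(1 << 20)
  try {
    let n
    while ((n = fs.readSync(fd, chunk, 0, chunk.length, null)) > 0) {
      hash.update(chunk.subarray(0, n))
    }
  } finally {
    fs.closeSync(fd)
  }
  return hash.digest('hex')
}

// `expectedHex` is a placeholder for a checksum from a trusted source.
function verifyFilterFile(path, expectedHex) {
  return sha256File(path) === expectedHex.toLowerCase()
}
```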

A few questions, answered

Is it production-ready?

Yes. Zero npm dependencies — just Node builtins (crypto, fs, path, https). The filter loads into memory once and every check after that is in-memory array math, with no network I/O.

Does it phone home?

No. The only network request the package ever makes is downloading the filter file, and only when you ask — npx haveibeenfiltered download, or autoDownload: true if you opt in. HTTPS only, redirects refused. The checks themselves never touch the network.

How much RAM?

The whole filter lives in memory: ~1.8 GB for the full HIBP set, ~13 MB for RockYou, ~1 MB for top1m. filter.close() hands it back.

How fresh is the data?

The files are static snapshots; the full filter is built from the 2,048,908,128-password HIBP list. New versions ship when the source corpus is updated.

Built on the Have I Been Pwned password corpus by Troy Hunt. Source on GitHub, MIT-licensed.