haveibeenfiltered
Checking a password against a breach corpus is a solved problem. Have I Been Pwned has two billion hashes and an API to ask. The snag is that asking means telling a server something about the password. HIBP softens this with k-anonymity — you send the first five hex characters of the SHA-1 and it hands back every hash that starts with them, so the full hash stays home — but it’s still a request on the wire, still rate-limited, still about 200 ms, and still nothing without a network.
haveibeenfiltered inverts it: download the whole corpus once, then answer everything locally. Two billion SHA-1 hashes from HIBP, squeezed into a 1.79 GB file, around 14 microseconds a check, zero dependencies — and after that one download, nothing leaves the machine again.
Using it
An npm package with zero dependencies: it uses Node builtins only.
npm install haveibeenfiltered
npx haveibeenfiltered download # pull the filter once, SHA-256 verified
const hbf = require('haveibeenfiltered')

async function main() {
  const filter = await hbf.load() // awaited, so it must run inside an async function in CommonJS
  filter.check('password123') // true: breached
  filter.check('Tr0ub4dor&3') // false: not found
  filter.checkHash('5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8') // true
  filter.close() // hand the ~1.8 GB back when you're done
}

main()
Or from the shell — one password, a precomputed hash, or a whole file piped in:
npx haveibeenfiltered check hunter2
npx haveibeenfiltered check --hash 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
cat passwords.txt | npx haveibeenfiltered check --stdin
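If you would rather hash passwords yourself and pass only the digest around, a small helper is enough. The sketch below assumes that checkHash and --hash expect the uppercase hex SHA-1 of the UTF-8 password, which is the format of the hash shown above; `sha1Hex` and `isBreached` are hypothetical names, not part of the package.

```javascript
const crypto = require('node:crypto')

// Uppercase hex SHA-1 of the UTF-8 password, matching the format of the
// hash in the examples above (assumed to be what checkHash expects).
function sha1Hex(password) {
  return crypto.createHash('sha1').update(password, 'utf8').digest('hex').toUpperCase()
}

// `filter` is anything with a checkHash(hex) method, for example the
// object returned by hbf.load(). Hypothetical helper for illustration.
function isBreached(filter, password) {
  return filter.checkHash(sha1Hex(password))
}

console.log(sha1Hex('password'))
// 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
```

Taking the filter as a parameter keeps the helper easy to test with a stub, without loading the 1.79 GB file.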
How it works
It’s a ribbon filter — a static set-membership structure, roughly 20% more space-efficient than a Bloom filter. Instead of setting bits as it goes, it stores compact fingerprints as the solution to a system of linear equations over GF(2), solved once by Gaussian elimination at build time. A query just evaluates the system for one key and compares the result to the expected fingerprint. There are no passwords or hashes in the file to read back out — only enough bits to answer yes or no. That’s how 2,048,908,128 SHA-1 hashes fold down to 1.79 GB, about seven bits per password.
A single check touches exactly one of 256 shards, picked by the first byte of the password’s SHA-1. Inside the shard, MurmurHash3 (x64, 128-bit, seeded from the file header) turns the hex hash into three numbers:
- start — an index into the shard’s solution array.
- coefficient — a 64-bit mask (the ribbon is 64 wide) selecting which solution rows to XOR.
- fingerprint — a 7-bit value to compare the result against.
XOR the selected rows from the start index, compare to the fingerprint, done — one arithmetic pass over a few bytes, no decompression and no scan. The handful of keys that wouldn’t fit the linear system at construction time (“bumped” keys) sit in a small per-shard overflow table and get checked there instead. If the matrix-solving underneath all this sounds like witchcraft, the ribbon filter visualization runs the whole construction in front of you.
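To make the build and query steps concrete, here is a toy ribbon filter in plain Node. It is a sketch under loud assumptions, not the package's implementation: the ribbon is 16 wide rather than 64, SHA-256 stands in for the seeded MurmurHash3, there is a single shard, the 15% slot overhead is arbitrary, and bumped keys go into an in-memory Set rather than the overflow table of the real file format.

```javascript
const crypto = require('node:crypto')

const W = 16       // ribbon width for this toy; the text describes a 64-wide ribbon
const FP_BITS = 7  // fingerprint width, as in the text

// Derive (start, coeff, fp) for a key. SHA-256 stands in for the seeded
// MurmurHash3 here purely because it ships with Node.
function hashKey(key, slots) {
  const d = crypto.createHash('sha256').update(key).digest()
  return {
    start: d.readUInt32BE(0) % (slots - W + 1),
    coeff: d.readUInt16BE(4) | 1,          // low bit set so the row begins at `start`
    fp: d[6] & ((1 << FP_BITS) - 1),
  }
}

function build(keys) {
  const slots = Math.ceil(keys.length * 1.15) + W
  const coeffs = new Uint16Array(slots)   // one banded GF(2) row per slot
  const results = new Uint8Array(slots)   // right-hand side of each row
  const bumped = new Set()                // keys whose equation contradicts the system

  for (const key of keys) {
    let { start: s, coeff: c, fp: r } = hashKey(key, slots)
    for (;;) {
      if (coeffs[s] === 0) { coeffs[s] = c; results[s] = r; break }
      c ^= coeffs[s]; r ^= results[s]                 // eliminate the leading bit
      if (c === 0) { if (r !== 0) bumped.add(key); break }
      const tz = 31 - Math.clz32(c & -c)              // trailing zero count
      s += tz; c >>>= tz
    }
  }

  // Back-substitution, last slot first: pick solution[i] so that row i holds.
  const solution = new Uint8Array(slots)
  for (let i = slots - 1; i >= 0; i--) {
    if (coeffs[i] === 0) continue                     // free slot, leave as 0
    let z = results[i]
    for (let j = 1; j < W; j++) if ((coeffs[i] >> j) & 1) z ^= solution[i + j]
    solution[i] = z
  }
  return { slots, solution, bumped }
}

// Query: XOR the solution rows selected by the coefficient mask, compare to the fingerprint.
function check(filter, key) {
  if (filter.bumped.has(key)) return true
  const { start, coeff, fp } = hashKey(key, filter.slots)
  let acc = 0
  for (let j = 0; j < W; j++) if ((coeff >> j) & 1) acc ^= filter.solution[start + j]
  return acc === fp
}

const keys = Array.from({ length: 2000 }, (_, i) => `breached-${i}`)
const filter = build(keys)
console.log(keys.every((k) => check(filter, k))) // true: members always match
```

Every inserted key either lands in the solved system or in the bumped set, so members always match. A key outside the set matches only when its 7-bit fingerprint happens to agree with the XOR result.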
False positives, false negatives
A probabilistic filter can be wrong two ways, and only one of them can happen here:
- False positive — a safe password flagged as breached. Possible, because the filter keeps 7-bit fingerprints rather than full hashes, so roughly 1 in 128 (~0.78%) clean passwords collide with one.
- False negative — a breached password reported as safe. Cannot happen. If a hash is in the set, the equations resolve to its fingerprint every time.
So a false is a guarantee, and a clean password has only about a 0.78% chance of being wrongly flagged. That is the right way round for this job, where missing a breached password is expensive and a spurious “pick another one” costs nothing.
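The 1-in-128 figure follows directly from the fingerprint width. The arithmetic can be sketched as below; the function names are hypothetical and the 10,000-lookup workload is only an example.

```javascript
// Chance that a password outside the set matches an f-bit fingerprint by accident.
function falsePositiveRate(fingerprintBits) {
  return 1 / 2 ** fingerprintBits
}

// Expected number of clean passwords wrongly flagged among `cleanChecks` lookups.
function expectedFalseAlarms(cleanChecks, fingerprintBits) {
  return cleanChecks * falsePositiveRate(fingerprintBits)
}

console.log(falsePositiveRate(7))           // 0.0078125
console.log(expectedFalseAlarms(10000, 7))  // 78.125
```

Each extra fingerprint bit would halve the false-positive rate at the cost of roughly one more bit per stored password.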
Benchmarks
| Metric | Value |
|---|---|
| HIBP passwords | 2,048,908,128 |
| Filter size | 1.79 GiB |
| Fingerprint | 7 bits |
| False-positive rate | 1/128 (~0.78%) |
| False negatives | 0 |
| check(password) | ~14 µs |
| checkHash(sha1hex) | ~8 µs |
| Throughput (single core) | ~72k–121k / sec |
| npm dependencies | 0 |
Versus the HIBP API
Both answer the same question. They make opposite trades.
| | haveibeenfiltered | HIBP API |
|---|---|---|
| Privacy | nothing leaves the machine | k-anonymity (5-char SHA-1 prefix sent) |
| Speed | ~14 µs / check | ~200 ms / request |
| Offline | yes | no |
| Setup | 1.79 GB download | none |
| RAM | ~1.8 GB | none |
| False positives | ~0.78% | 0% (exact) |
| Rate limits | none | yes |
| Data freshness | static snapshot | continuously updated |
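For comparison, the client side of the k-anonymity exchange in the Privacy row can be sketched as follows. It assumes the Pwned Passwords range endpoint (`https://api.pwnedpasswords.com/range/<prefix>`) and its `SUFFIX:COUNT` line format. To stay runnable offline it makes no request: the response body and its counts are made up for illustration, and `splitHash` and `countInRange` are hypothetical helper names.

```javascript
const crypto = require('node:crypto')

// Split a password's SHA-1 into the 5-character prefix that is sent
// and the 35-character suffix that never leaves the machine.
function splitHash(password) {
  const hex = crypto.createHash('sha1').update(password, 'utf8').digest('hex').toUpperCase()
  return { prefix: hex.slice(0, 5), suffix: hex.slice(5) }
}

// Scan a range response body ("SUFFIX:COUNT" per line) for our suffix.
// Returns the reported count, or 0 if the suffix is absent.
function countInRange(body, suffix) {
  for (const line of body.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(':')
    if (candidate === suffix) return Number(count)
  }
  return 0
}

// Illustrative response body; the counts are invented for the example.
const sampleBody = [
  '0018A45C4D1DEF81644B54AB7F969B88D65:1',
  '1E4C9B93F3F0682250B6CF8331B7EE68FD8:42',
].join('\r\n')

const { prefix, suffix } = splitHash('password')
console.log(prefix, countInRange(sampleBody, suffix)) // 5BAA6 42
```

The server only ever sees the prefix; the suffix comparison happens locally, which is the same privacy boundary haveibeenfiltered moves all the way back to the download step.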
Filter files
The full HIBP set is 1.79 GB, but it isn’t the only cut. Trim by breach frequency when you don’t need the long tail, or drop to a common-passwords list when you just want a sanity check in a signup form. The binaries live on Cloudflare R2 at files.haveibeenfiltered.com/v0.1/, are SHA-256 verified after download, and the CLI pulls them for you.
| File | Size | What |
|---|---|---|
| ribbon-hibp-v1.bin | 1.79 GiB | the whole HIBP set |
| ribbon-hibp-v1-min5.bin | 726 MB | hashes seen 5+ times |
| ribbon-hibp-v1-min10.bin | 435 MB | seen 10+ times |
| ribbon-hibp-v1-min20.bin | 259 MB | seen 20+ times |
| ribbon-top10m-v1.bin | 9.0 MB | most common ten million |
| ribbon-rockyou-v1.bin | 12.8 MB | the 2009 RockYou breach |
| ribbon-top1m-v1.bin | 0.9 MB | most common million |
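If you fetch a filter file by some other route and want to repeat the SHA-256 check yourself, a chunked synchronous hash keeps memory flat even for the 1.79 GB file. This is a sketch, not the CLI's own code: `sha256FileSync` and `verifyFile` are hypothetical names, and the expected digest has to come from wherever the project publishes it, since none is listed here.

```javascript
const crypto = require('node:crypto')
const fs = require('node:fs')

// Hash a file in 1 MiB chunks so the whole filter never sits in memory at once.
function sha256FileSync(filePath) {
  const hash = crypto.createHash('sha256')
  const buf = Buffer.allocUnsafe(1 << 20)
  const fd = fs.openSync(filePath, 'r')
  try {
    let n
    // position = null reads from the current file offset and advances it
    while ((n = fs.readSync(fd, buf, 0, buf.length, null)) > 0) {
      hash.update(buf.subarray(0, n))
    }
  } finally {
    fs.closeSync(fd)
  }
  return hash.digest('hex')
}

// Compare against an expected hex digest (case-insensitive).
function verifyFile(filePath, expectedHex) {
  return sha256FileSync(filePath) === expectedHex.toLowerCase()
}
```

A call would look like `verifyFile('ribbon-top1m-v1.bin', expectedDigest)`, where `expectedDigest` is the published SHA-256 for that file.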
A few questions, answered
Is it production-ready?
Yes. Zero npm dependencies — just Node builtins (crypto, fs, path, https). The filter loads into memory once and every check after that is in-memory array math, with no network I/O.
Does it phone home?
No. The only network request the package ever makes is downloading the filter file, and only when you ask — npx haveibeenfiltered download, or autoDownload: true if you opt in. HTTPS only, redirects refused. The checks themselves never touch the network.
How much RAM?
The whole filter lives in memory: ~1.8 GB for the full HIBP set, ~13 MB for RockYou, ~1 MB for top1m. filter.close() hands it back.
How fresh is the data?
The files are static snapshots; the current full filter is built from the 2,048,908,128-password HIBP list. New versions ship when the source does.
Built on the Have I Been Pwned password corpus by Troy Hunt. Source on GitHub, MIT-licensed.