A benchmark for password wordlists, and a wordlist that scores well on it.
Companion to the DEF CON 34 talk submission RockYou Is Dead. Blog post: https://patota.io/rockyou-is-dead
Two things, both in this repo:
- **A benchmark.** A reproducible methodology for scoring any password wordlist against Have I Been Pwned. The score is a single number (log-AUC) that captures how much of HIBP's frequency-ranked passwords the wordlist contains, weighted across orders of magnitude.
- **Rockme.** A 559M-entry wordlist generated through a bloom-filter discovery pipeline. Built on commodity hardware (~$2,200 in mini PCs). Scores 0.91 on the benchmark. A curated subset will be released alongside the talk.
Coverage of HIBP's top-15M most-frequent passwords (April 23, 2026 snapshot, ~2.05B hashes):
| Wordlist | Entries | Top-15M coverage | Total HIBP coverage | log-AUC |
|---|---|---|---|---|
| xato-net 10M | 10M | 6.97% | 0.22% | 0.5901 |
| RockYou | 14M | 12.58% | 0.70% | 0.6192 |
| Pwdb top-10M | 10M | 22.78% | 0.49% | 0.7096 |
| Rockme | 559M | 70.33% | 27.30% | 0.9051 |
Three things worth noticing:
- The de facto standard isn't the best public option. Pwdb top-10M scores 0.71 vs RockYou's 0.62. Pentesters using RockYou as their default are using a 2009 dataset when something measurably better has been publicly available for years.
- Reputation doesn't predict performance. xato-net is more famous than Pwdb but scores worse than RockYou. Same 10M scale, different curation, and different result.
- Public wordlists plateau around AUC 0.7. They're all sampled from breach databases at different times and with different curation strategies. Pwdb's broader source data helps and xato's older source data hurts, but the underlying approach is the same: collect what humans actually picked, rank it, and ship it. That approach has a ceiling. The long tail of common-but-not-famous passwords requires generating candidates rather than collecting them.
The Rockme number, 0.91, comes from generating candidates against HIBP as an oracle rather than sampling from any one breach corpus.
Full coverage curves are in data/.
A bloom filter holding HIBP's 2.05B SHA-1 hashes (~3.7 GB in RAM, FPR 1e-5) serves as a membership oracle. Candidate passwords are generated through a family of strategies including stem × suffix expansion, with iterative re-feeding of confirmed discoveries. Bloom hits are SHA-1-confirmed against the actual HIBP JSON; false positives drop at confirmation time. The discovered plaintexts are merged into an enriched HIBP JSON, and coverage is measured by counting attached plaintexts at each rank breakpoint.
The pipeline went through three architectures:
- Redis-sharded — six mini PCs running a Redis cluster, workers querying SISMEMBER over the network. Hit network bottlenecks at 1.02M ops/sec aggregate. Ran ~30 days, grew the list from 15M to 439M, then was abandoned.
- Single Dell — one Dell Micro PC with 128GB RAM, bloom filter in RAM, no network. Reached 1.8M ops/sec on a single Ryzen core. Dell's firmware capped fan speed at 50% regardless of CPU temperature, which throttled the chip so adding more workers gained nothing. Took the list from 439M to 559M in about 2 weeks.
- Cluster, current architecture — six Kron Mini K1s (AMD Ryzen 7 7735HS, 32GB RAM), each holding its own copy of the bloom filter in RAM. Three worker processes per node against a node-local 1/6 slice of the candidate file (18 workers total). Orchestrated by a Raspberry Pi 5. ~16M ops/sec aggregate. Total cluster cost: ~$2,200. Zero network I/O during runtime. Ansible playbooks forthcoming.
The lesson from those three iterations: when the computation is small (a SHA-1 plus a bloom check) and the data is big (2B hashes), the right architecture is "replicate the data, partition the work," not "centralize the data, partition the queries."
- `hibp_to_json.cpp` — Converts the HIBP raw `hash:count` text dump into the sorted JSON format the rest of the pipeline expects.
- `inmemory_worker.cpp` — Bloom filter membership oracle. Loads HIBP into RAM, then reads candidate passwords from stdin and writes hits to stdout. ~900K-1M ops/sec per process on a Ryzen 7 7735HS.
- `expand_wordlist_suffix_efficient.cpp` — Candidate generator. Reads stems and suffixes; writes the Cartesian product (stem+suffix) to stdout. Has an `--adaptive` mode for Zipfian budget allocation and exposes Prometheus metrics for monitoring.
- `merge_discoveries_efficient.cpp` — Takes a list of candidate plaintexts, SHA-1s them, looks them up against the HIBP JSON, and writes an enriched JSON with confirmed plaintexts attached.
- `count_hibp_matches.cpp` — Walks the enriched JSON, computes coverage at each rank breakpoint, integrates the curve in log-rank space, and emits the log-AUC score and a coverage CSV.
The published Rockme dataset was produced using a broader family of generators (international character variants, leet substitutions, two-word combinations) beyond the suffix expander above. Those will be added in subsequent commits.
- `xato_coverage.csv` — xato-net 10M, scored against April 2026 HIBP
- `rockyou_coverage.csv` — RockYou, same denominator
- `pwdb_coverage.csv` — Pwdb top-10M, same denominator
- `rockme_coverage.csv` — Rockme, same denominator
Each file has the same 23 breakpoints from rank 100 to rank 15M. The log-AUC score for each list is included as a comment line at the bottom of the CSV.
Requires GCC with C++17 support and OpenSSL.
```shell
make
```
Builds all tools into `bin/`. Tested on Ubuntu 24.04.
```shell
dotnet tool install --global haveibeenpwned-downloader
haveibeenpwned-downloader pwnedpasswords
```
The downloader produces a file sorted by hash. The benchmark needs it sorted by count descending. This sort needs ~50 GB of free disk for temp files; adjust `-T` to point at a partition with headroom.
```shell
mkdir -p sort_tmp
LC_ALL=C sort -t: -k2 -n -r \
    --parallel=$(nproc) \
    -S 50% \
    -T ./sort_tmp \
    pwnedpasswords.txt > pwnedpasswords_sorted.txt
rm -rf sort_tmp
```
For a wordlist `mywords.txt` you want to score against HIBP:
```shell
# 1. One-time HIBP preprocessing (~25 minutes on commodity hardware)
./bin/hibp_to_json /path/to/pwnedpasswords-sha1.txt hibp.json

# 2. Merge your wordlist's confirmed plaintexts into HIBP
./bin/merge_discoveries_efficient mywords.txt hibp.json hibp_with_mywords.json

# 3. Measure coverage and compute log-AUC
./bin/count_hibp_matches hibp_with_mywords.json mywords_coverage.csv
```
The third step prints the log-AUC score to stderr and writes the full coverage curve to the CSV. That's a comparable benchmark number against any other wordlist scored the same way.
Released:
- The benchmark methodology and scoring tool
- The full coverage CSVs for xato-net, RockYou, Pwdb, and Rockme
- The C++ research tooling
- A curated Rockme release (size to be determined; forthcoming with the talk)
- Base-word and suffix corpora used in the pipeline (forthcoming)
- Ansible playbooks for the 6-node cluster (forthcoming)
Not released: the full 559M Rockme plaintext list. Releasing it primarily benefits attackers. Defenders already have HIBP's k-anonymity API for the same defensive use case. Researchers can rebuild the full list from the released inputs and HIBP itself.
The Redis-era code is also not released — that architecture didn't pan out, and shipping it would just confuse anyone trying to reproduce the working pipeline.
Research code accompanying ongoing DEF CON 34 work. Cleanup and generalization will continue post-talk. Issues and pull requests welcome once the talk is presented.
MIT — see LICENSE.
If you use this work, please cite the talk (DEF CON 34, August 2026) once it's published. Until then:
John Patota, "RockYou Is Dead: A Benchmark for Password Wordlists",
DEF CON 34 submission, April 2026.
https://github.com/jpatota/rockme-wordlist
https://patota.io/rockyou-is-dead