Brief summary of the A/B test harness under tests/ab/. Detail
intentionally not duplicated from individual port test READMEs — read
tests/ab/run-all.sh and the per-port run.sh files for the source
of truth.
The harness can run in three modes:
- A/B mode (tests/ab/run-all.sh): runs each port's legacy PHP/Perl/Bash script and the new hzmetrics.py equivalent side by side, then diffs every output table. Requires tests/legacy/ to be present.
- Golden mode (tests/ab/run-all-golden.sh): runs only the new code and diffs it against a frozen snapshot of legacy output captured at parity time (tests/ab/port_*/golden/*.tsv). Does not require tests/legacy/. Simulates the world where the legacy reference has been removed.
- Defensive mode (tests/ab/run-defensive.sh): runs the new-code-only tests that need neither legacy nor golden snapshots: fuzz, idempotency, dry-run safety, empty input, determinism, cross-table invariants, and CLI error contracts.
The A/B and golden modes produce the same pass/fail outcome on a current codebase. CI runs golden plus defensive mode.
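The golden-mode comparison can be pictured as a row-by-row diff between a fresh table dump and the frozen TSV. The following is a minimal sketch under that assumption; diff_tsv is a hypothetical helper written for illustration, not a function from the harness, and the real comparison lives in the run_golden.sh scripts.

```python
import csv
import io


def diff_tsv(actual_text: str, golden_text: str) -> list[str]:
    """Return one message per row that differs between two TSV dumps."""
    def rows(text: str) -> list[list[str]]:
        return list(csv.reader(io.StringIO(text), delimiter="\t"))

    actual, golden = rows(actual_text), rows(golden_text)
    problems = []
    # Walk both dumps in parallel; a missing row on either side is a mismatch.
    for i in range(max(len(actual), len(golden))):
        a = actual[i] if i < len(actual) else None
        g = golden[i] if i < len(golden) else None
        if a != g:
            problems.append(f"row {i + 1}: expected {g!r}, got {a!r}")
    return problems
```

An empty result means the new output matches the snapshot; any entry would count as a failure for that table.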
45 test directories under tests/ab/port_*:
Per-port A/B (16): port_andmore_usage, port_clean_bots,
port_fill_domain, port_fill_ipcountry, port_fill_user_info,
port_gen_tool_stats, port_gen_tool_toplists,
port_gen_tool_tops, port_identify_bots, port_import_apache,
port_import_auth, port_import_hub_data, port_import_webhits,
port_logfix_session, port_middleware, port_whoisonline.
Integration (2): port_pipeline (full analyze + summarize chain
on synthetic data), port_realdata (same chain on a captured
production-data slice — gated by snapshot presence).
Coverage tests (3): port_summarize_month (the most
metric-dense single port), port_period_sweep (24 anchor-port
combinations exercising period boundary arithmetic),
port_invariants (cross-table rules like
summary_user_vals[rowid=1] = SUM([6,7,8])).
Defensive tests (6): port_fuzz (4 fuzz harnesses with 2000+
randomized cases each), port_idempotency (re-runs analyze+summarize
on the same DB), port_dryrun (every --dry-run writes zero rows),
port_empty_input (each port no-ops cleanly on empty input),
port_determinism (two fresh-DB runs are byte-identical),
port_cli_contracts (invalid CLI/config paths exit non-zero).
Orchestration (5): port_discovery (source-log enumeration
across daily/, daily/YYYY/, daily.holding/),
port_state (DB-backed pipeline_state read/write + file→DB
bootstrap), port_decisions (the three Phase-C decision helpers +
every row of the catchup decision matrix), port_cmd_run
(three-mode state machine: mode dispatch + transitions + per-month
routing via monkey-patched DB), port_rebuild_summaries
(the manual-range CLI + extended status output).
Catchup correctness (2): port_periods_filter (do_summarize(periods=(1,))
writes exactly the period-1 grid and zero rows in any other period;
inverse pass with periods=None populates all six),
port_rebuild_correctness (loads month M2 fully summarized; adds
month M1 rows; resummarizes M2; asserts period-14 refreshed to
include M1 while period-1 stays unchanged — the core promise of
rebuild mode).
Install + crash recovery (4): port_bootstrap (_self_bootstrap
identity gate + site-name guard + _expected_dirs contract + the
init / doctor exit codes), port_import_atomic (per-file
import is transactional and the imported_sources marker survives
post-COMMIT crashes — forget-import reverses both halves cleanly),
port_lock (PID-file format and init_start_epoch stale-PID
detection across reboot / container restart), port_month_complete
(data-driven month-closed check that gates logfix-session to month
boundary).
Filter regression guards (6): port_dnload_classify (Python
_is_download_url covers every download-extension and download-path
shape), port_dnload_backfill_regex (SQL-side backfill-dnload regex
correctly handles literal-dot vs any-char — pins the silent fix in
db5d8ba), port_referer_spam (login/?return=, resources/browse?,
citations/browse, and /register empty-Referer crawler-spam regexes),
port_msie_filter (date-bound MSIE-Trident UA regex + watermark + the
import-apache wiring source-grep), port_crawl_filters_2026
(/register Referer-gating + date-bound /events// filter that
measures from the log line's datestamp, not date.today()),
port_session_split (1800-second session boundary).
Window-boundary semantics (1): port_window_boundaries (27
assertions: period range arithmetic across month / quarter / year /
fiscal-year boundaries, leap years, DST edges).
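The boundary arithmetic that port_window_boundaries pins down can be illustrated with the standard library. This is only a sketch: the period numbering, fiscal-year start, and DST handling used by hzmetrics.py are not described here, so month_bounds and quarter_bounds are hypothetical helpers that cover calendar months and calendar quarters only.

```python
import calendar
from datetime import date


def month_bounds(year: int, month: int) -> tuple[date, date]:
    """First and last calendar day of a month; leap years come from calendar."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def quarter_bounds(year: int, quarter: int) -> tuple[date, date]:
    """First and last day of calendar quarter 1..4 (assumes Q1 starts in January)."""
    first_month = 3 * (quarter - 1) + 1
    start, _ = month_bounds(year, first_month)
    _, end = month_bounds(year, first_month + 2)
    return start, end
```

February is the usual trap: month_bounds(2024, 2) ends on the 29th, while month_bounds(2025, 2) ends on the 28th.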
- MariaDB running locally; an account with CREATE DATABASE / GRANT privileges for the bootstrap step (typically via sudo mysql using the system socket auth).
- PHP CLI on PATH: the legacy reference under tests/legacy/ shells out to php, perl, and bash.
- The BIND host(1) utility: the legacy DNS step (xlogfix_dns_v2.sh / xlogfix_dns_worker.php) shells out to /usr/bin/host. On Debian/Ubuntu: sudo apt install bind9-host. Without it, 3 DNS-dependent tests (port_pipeline, port_determinism, port_whoisonline) fail with fake mismatches where legacy reports ? / (unknown) while the new Python's aiodns resolves cleanly.
- Python runtime deps from pyproject.toml (pymysql, aiodns).
- tests/ab/fixtures/test_access.cfg must name a real local DB user. The committed sample leaves $db_user = ''; either patch a temporary cfg and point HZMETRICS_ACCESS_CFG at it, or patch the fixture in a disposable checkout the way CI does.
# Bootstrap once per host (creates test DBs, loads reference data)
tests/ab/setup_test_dbs.sh --bootstrap
# Run the full A/B suite
tests/ab/run-all.sh
# Or the golden-mode round (no legacy needed)
tests/ab/run-all-golden.sh
# New-code-only defensive checks (also no legacy needed)
tests/ab/run-defensive.sh
# Run a single port
tests/ab/port_fill_domain/run.sh
tests/ab/port_fill_domain/run_golden.sh

setup_test_dbs.sh --reset truncates everything and reloads
reference data — used between tests. Top-level drivers report
pass/fail/skip; a per-port runner can exit 77 to mark a real skip
(currently used when the optional production snapshot is absent).
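The pass/fail/skip reporting can be sketched as a mapping from per-port exit codes to outcomes. The drivers themselves are shell scripts; this Python version is only an illustration, and it assumes that exit 0 means pass and that any code other than 0 or 77 means fail. classify and tally are hypothetical names.

```python
SKIP_EXIT_CODE = 77  # per-port runners exit 77 to mark a real skip


def classify(returncode: int) -> str:
    """Map a runner's exit code to an outcome label."""
    if returncode == 0:
        return "pass"
    if returncode == SKIP_EXIT_CODE:
        return "skip"
    return "fail"


def tally(returncodes: dict[str, int]) -> dict[str, int]:
    """Count outcomes across ports, keyed by outcome label."""
    counts = {"pass": 0, "fail": 0, "skip": 0}
    for code in returncodes.values():
        counts[classify(code)] += 1
    return counts
```

Keeping skip distinct from pass matters for port_realdata: a missing snapshot should be visible in the summary rather than counted as a green run.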
Both setup_test_dbs.sh and conftest.sh honor
HZMETRICS_ACCESS_CFG=<path> (env override). The bootstrap reads
hub_db, metrics_db, db_host, db_user, and db_pass from that
cfg, creates the named test DBs, creates the DB user if needed, and
grants it access. TEST_USER is accepted only as a consistency
override; it must match the cfg's $db_user so mysql and
hzmetrics.py connect as the same account.
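The override and the TEST_USER consistency rule can be sketched as follows. Several things here are assumptions: that the cfg uses PHP-style assignments of the form $key = 'value'; (only $db_user = '' is shown above), that the fixture path is the fallback when the variable is unset, and the helper names cfg_path, parse_cfg, and check_test_user, which are hypothetical.

```python
import os
import re
from typing import Mapping, Optional

# Assumed fallback; the scripts' actual default is not stated in this document.
DEFAULT_CFG = "tests/ab/fixtures/test_access.cfg"

# Matches lines such as: $db_user = 'metrics';
_ASSIGN = re.compile(r"""^\s*\$(\w+)\s*=\s*(['"])(.*?)\2\s*;""")


def cfg_path(environ: Mapping[str, str] = os.environ) -> str:
    """Honor HZMETRICS_ACCESS_CFG when set, else fall back to the fixture."""
    return environ.get("HZMETRICS_ACCESS_CFG") or DEFAULT_CFG


def parse_cfg(text: str) -> dict[str, str]:
    """Collect $key = 'value'; assignments; other lines are ignored."""
    values = {}
    for line in text.splitlines():
        match = _ASSIGN.match(line)
        if match:
            values[match.group(1)] = match.group(3)
    return values


def check_test_user(cfg: dict[str, str], test_user: Optional[str]) -> str:
    """Return the DB user, rejecting an empty $db_user or a mismatched TEST_USER."""
    db_user = cfg.get("db_user", "")
    if not db_user:
        raise ValueError("cfg must name a real local DB user in $db_user")
    if test_user is not None and test_user != db_user:
        raise ValueError(f"TEST_USER={test_user!r} does not match $db_user={db_user!r}")
    return db_user
```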
Real bugs surfaced during the port, with their commit messages preserved in the log for reference:
- fill-domain day-before-month-start: legacy findWeeks() starts a week-chunk on the day BEFORE the month begins (so 2025-06-30 23:59:00 belongs to July 2025's first chunk). Caught by port_fill_domain; commit "A/B test: fill-domain — caught & fixed day-before-month-start divergence".
- xlogfix_middleware_cpu.pl: four real divergences in one commit ("A/B test: middleware-{wall,cpu} — caught three real divergences"):
  - MariaDB ROUND() is banker's rounding, Perl int($x + 0.5) is round-half-up → fixed to FLOOR(x + 0.5).
  - cpu.pl only UPDATEs existing toolstart rows, never INSERTs (the wall version does both).
  - cpu.pl's UPDATE check is <= 0 (includes cputime=0), not < 0.
  - cpu.pl does not filter joblog.event = '[waiting]'; wall does. Caught when both ports were initially symmetric.
- andmore-usage datetime suffix: legacy stores '-01', summarize uses '-00'; new port was using '-00' for both. Commit "A/B test: andmore-usage — caught datetime suffix divergence".
- logfix-session cross-week state: Perl declares session state vars at script scope, so an in-flight session persists across the 4 week-chunks of a month. The Python port initially reset state per chunk. Commit "A/B test: logfix-session — caught cross-week state divergence".
- summarize-month reg_users col=1 missing-JOIN: legacy queries userlogin_lite directly (no JOIN) for col=1; my port was unconditionally joining jos_xprofiles_metrics for every col, so when xprofiles_metrics is empty it under-counted. Commit "A/B test: summarize-month — caught reg_users col=1 missing-JOIN divergence".
- import-auth bracket-strip: [user[sub]] should produce user, not user[sub]. PHP ltrim($x, '[') + rtrim($x, ']') use charlist semantics (strip ALL leading [ and trailing ]); the port's regex was capturing the inner-bracketed content literally. Commit "A/B: deepen 5 fixtures — caught import-auth bracket-strip bug".
- gen-tool-stats float→int rounding: Python float bound as numeric literal hits MariaDB's banker's rounding; PHP stringifies first and hits half-away-from-zero. 488.5 → 488 vs 488.5 → 489. Fix: stringify floats before binding. Commit "A/B: deepen 5 more fixtures, caught gen-tool-stats float→int rounding bug".
- download_users rowid=4 vs rowid=8 filter mismatch: the two rowids use DIFFERENT WHERE filters in legacy (rowid=4 doesn't exclude login_ips or cap duration < 900), but my port was reusing dl_users_period_tmp built for rowid=8. Caught by deepening port_summarize_month's fixture with a registered-user downloader. Commit "A/B: deepen summarize-month, caught download_users rowid=4 filter mismatch".
- summary_misc_vals rowid=3 NULL handling: SUM(duration) returns NULL on an empty period; legacy db_fetch returns NULL → dbquote(NULL) writes empty string; the port coerced to 0. Caught by port_period_sweep at anchor months with no data. Commit "A/B: period sweep test + fix misc_usage NULL → empty-string parity".
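The tie-breaking difference behind the middleware and gen-tool-stats rounding items can be reproduced in plain Python, whose built-in round() also rounds ties to even. This is only a local illustration of the arithmetic and does not query MariaDB; half_even and half_up are hypothetical names, and half_up assumes non-negative inputs (for negative values Perl's int() truncates toward zero, which floor does not).

```python
import math


def half_even(x: float) -> int:
    """Banker's rounding: ties go to the nearest even integer."""
    return round(x)


def half_up(x: float) -> int:
    """Round-half-up for non-negative x, mirroring FLOOR(x + 0.5)."""
    return math.floor(x + 0.5)
```

For the value quoted above, half_even(488.5) gives 488 while half_up(488.5) gives 489. The two functions agree everywhere except on exact .5 ties, which is why the divergence only shows up with fixtures that hit a tie.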
Plus the "A/B re-baseline" commit (Roll back dnload-at-import and action-filter from hzmetrics.py) — the most important harness
catch. An initial legacy snapshot included two post-aa245f7
behaviors that had been absorbed into the new port: import-apache
setting dnload=1 inline, and import-auth filtering
action IN ('login','simulation') at insert time. Re-baselining
the harness against the true pre-refactor snapshot revealed that
the port had unintentionally inherited those changes; both got
rolled back. This is the divergence the docs talk about under
"bug-for-bug parity is hard to verify when your baseline is wrong."
Documented in commit history under A/B test: <port> — caught …
and A/B: … messages.
port_realdata requires a captured production-data snapshot
(tests/ab/port_realdata/snapshot/*.sql.gz). The snapshot directory
is gitignored because the raw data contains real usernames, emails,
and IPs; the test skips gracefully when the snapshot isn't present.
See tests/ab/port_realdata/capture.sh for how to capture one when
you have read access to a production database.
Some tests touch network resources (fill-ipcountry hits
help.hubzero.org/ipinfo/v1, resolve-dns uses the local resolver
which forwards out). These work fine offline against the cached
results in tests/ab/fixtures/, but require network for fresh data.