Apache logs → MariaDB analytics, in one Python file.
hzmetrics.py is the analytics pipeline for a HUBzero-based science
gateway. It ingests Apache access logs and CMS authentication logs,
enriches them (reverse DNS, domain classification, GeoIP, session
coalescing), and produces monthly summary statistics in a MariaDB
metrics database. Those statistics drive the hub's usage reporting
pages and grant reporting.
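
To make the ingest and session-coalescing steps concrete, here is a minimal sketch, not the code in hzmetrics.py. It assumes Apache's "combined" log format and a 30-minute idle gap between sessions; the names `parse_line` and `coalesce_sessions` and the gap value are illustrative assumptions, and the real pipeline may key sessions on more than the client IP.

```python
import re
from datetime import datetime, timedelta

# Assumption: Apache "combined"/"common" style lines. The real parser may differ.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return (ip, timestamp) for one access-log line, or None if malformed."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts


def coalesce_sessions(hits, gap=timedelta(minutes=30)):
    """Group (ip, ts) hits into per-IP sessions, split on idle gaps > gap.

    Returns a list of (ip, start, end, hit_count) tuples ordered by start.
    The 30-minute default is an illustrative assumption.
    """
    sessions = []
    open_sessions = {}  # ip -> [start, end, hit_count]
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        cur = open_sessions.get(ip)
        if cur is not None and ts - cur[1] <= gap:
            # Still inside the idle window: extend the current session.
            cur[1] = ts
            cur[2] += 1
        else:
            # Gap exceeded (or first hit): close the old session, open a new one.
            if cur is not None:
                sessions.append((ip, cur[0], cur[1], cur[2]))
            open_sessions[ip] = [ts, ts, 1]
    for ip, (start, end, count) in open_sessions.items():
        sessions.append((ip, start, end, count))
    return sorted(sessions, key=lambda s: (s[1], s[0]))
```

Feeding the parsed hits of a log file through `coalesce_sessions` yields one row per visit rather than one row per request, which is the shape a monthly summary would aggregate.
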
One Python file (~8000 lines) replaces the decade-plus accumulation
of PHP, Perl, and Bash scripts that previously lived at
/opt/hubzero/bin/metrics/. The legacy reference implementation is
preserved verbatim under tests/legacy/ and serves as the
bug-for-bug parity target for the A/B test harness.
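
A parity check of this kind can be pictured as a keyed diff between the rows the legacy scripts produce and the rows the rewrite produces. The sketch below is only an illustration of that idea; `diff_rows` and the column names in the usage note are hypothetical and are not taken from the harness under tests/ab/.

```python
def diff_rows(legacy_rows, new_rows, key_fields):
    """Compare two lists of dict rows, matching them on key_fields.

    Returns (missing, extra, changed): keys only in the legacy output,
    keys only in the new output, and keys present in both whose rows differ.
    """
    def index(rows):
        return {tuple(r[k] for k in key_fields): r for r in rows}

    old, new = index(legacy_rows), index(new_rows)
    missing = sorted(old.keys() - new.keys())
    extra = sorted(new.keys() - old.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return missing, extra, changed
```

For example, calling `diff_rows(legacy, new, ["month", "metric"])` on two result sets returns three empty lists when the outputs match exactly, which is the outcome a bug-for-bug target asks for.
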
# 1. Deps + /opt tree + scripts (root; idempotent).
sudo make install
# 2. Drop the unified per-tenant config in place (DB creds + DNS settings).
sudo install -o apache -g apache -m 0600 hzmetrics.conf \
    /opt/hubzero/metrics/conf/hzmetrics.conf
# 3. Create the metrics DB, run baseline DDL, apply migrations.
sudo -u apache python3 /opt/hubzero/metrics/bin/hzmetrics.py init
# 4. Confirm everything is healthy.
sudo -u apache python3 /opt/hubzero/metrics/bin/hzmetrics.py doctor
# 5. Register the cron line.
sudo -u apache crontab /opt/hubzero/metrics/conf/hzmetrics.cron.apache.sample

make install, init, and doctor are idempotent. The same init
machinery also runs automatically on the first cron tick when invoked
as apache / www-data, so if you skip step 3 the next tick will
catch up; see docs/architecture.md → Self-bootstrap.
The cron line is one entry, every five minutes:
*/5 * * * * python3 /opt/hubzero/metrics/bin/hzmetrics.py tick
tick refreshes the whoisonline map on every invocation; at :30 past
each hour it also opportunistically runs the metrics pipeline under a
PID lock. The pipeline is a three-mode state machine (normal,
catchup, rebuild), so a multi-year backlog drains without operator
intervention.
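
The scheduling described above can be sketched roughly as follows. This is a hedged illustration, not the logic in hzmetrics.py: the function names, the use of `flock` for the lock, and in particular the mode-selection rule in `choose_mode` are assumptions made for the example (the actual transitions are documented in docs/architecture.md).

```python
import fcntl
import os


def pipeline_due(now):
    """True for the five-minute cron tick that lands on :30 of an hour."""
    return now.minute == 30


def try_lock(path):
    """Take an exclusive, non-blocking lock and record our PID in the file.

    Returns the open file descriptor on success, or None if another
    process already holds the lock. Closing the descriptor releases it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd


def choose_mode(pending_months, rebuild_requested=False):
    """Pick one of the three modes named in the text.

    Hypothetical rule for illustration only: an explicit rebuild request
    wins, more than one unprocessed month means catchup, else normal.
    """
    if rebuild_requested:
        return "rebuild"
    if pending_months > 1:
        return "catchup"
    return "normal"
```

Under these assumptions, a tick would call `pipeline_due(datetime.now())`, and only if that is true and `try_lock(...)` returns a descriptor would it run the pipeline in the mode `choose_mode` selects. A tick that finds the lock held simply returns, which is what makes the hourly run opportunistic.
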
For everything else, see hzmetrics.py --help and the
full documentation.
.
├── hzmetrics.py    the entire pipeline
├── Makefile        install / uninstall / test / lint
├── conf/           templates: hzmetrics.conf.sample, cron
├── docs/           plain-markdown documentation
├── gh-pages/       static-site templates + builder
└── tests/
    ├── legacy/     pre-rewrite PHP/Perl/Bash baseline
    └── ab/         A/B + golden + defensive harness
                    (44 ports; see docs/testing.md)
Start at docs/README.md (or the
rendered site). Most-touched
operational pages:
- docs/deployment.md: install, cron, logrotate, hzmetrics.conf.
- docs/operations.md: runbook covering catch-up, stuck lock, bot inflation, DNS issues, crash recovery, ANALYZE TABLE, etc.
- docs/architecture.md: pipeline phases, tables, scheduling, the catchup state machine, self-bootstrap.
- docs/testing.md: A/B + golden + defensive test modes.
The HUBzero metrics subsystem was originally written in Perl by Swaroop Shivarajapura and later ported to PHP by Nicholas J. Kisseberth. Long-term stewardship of the codebase has been carried by J.M. Sperhac (SDSC), among others. This Python rewrite builds directly on their work.