https://github.com/hubzero/hzmetrics
HUBzero metrics pipeline — Python rewrite of the legacy PHP/Perl/Bash pipeline
- Host: GitHub
- URL: https://github.com/hubzero/hzmetrics
- Owner: hubzero
- License: mit
- Created: 2026-05-15T23:59:06.000Z (21 days ago)
- Default Branch: main
- Last Pushed: 2026-05-31T21:41:57.000Z (5 days ago)
- Last Synced: 2026-05-31T22:14:11.093Z (5 days ago)
- Language: Python
- Homepage: https://hubzero.github.io/hzmetrics/
- Size: 3.48 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Roadmap: docs/roadmap.md
README
# HUBzero Metrics Pipeline
Apache logs → MariaDB analytics, in one Python file.
---
`hzmetrics.py` is the analytics pipeline for a HUBzero-based science
gateway. It ingests Apache access logs and CMS authentication logs,
enriches them (reverse DNS, domain classification, GeoIP, session
coalescing), and produces monthly summary statistics in a MariaDB
metrics database. Those statistics drive the hub's usage reporting
pages and grant reporting.
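As a rough illustration of the ingest and session-coalescing steps described above, the sketch below parses Apache "combined" log lines and groups hits from the same IP into sessions. It is not taken from `hzmetrics.py`: the names `LOG_RE`, `parse_line`, and `coalesce_sessions` are hypothetical, and the 30-minute idle gap is an assumption chosen only for the example.

```python
import re
from datetime import datetime, timedelta

# Apache "combined" log format; the real pipeline's parser may differ.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

SESSION_GAP = timedelta(minutes=30)  # assumed idle timeout, not from hzmetrics


def parse_line(line):
    """Return (ip, timestamp, path, status), or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("path"), int(m.group("status"))


def coalesce_sessions(hits):
    """Group (ip, timestamp) hits into per-IP sessions split on idle gaps.

    Returns a list of (ip, start, end, hit_count) tuples ordered by start time.
    """
    sessions = []
    open_sessions = {}  # ip -> [start, end, count]
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        cur = open_sessions.get(ip)
        if cur is not None and ts - cur[1] <= SESSION_GAP:
            cur[1] = ts      # extend the current session
            cur[2] += 1
        else:
            if cur is not None:
                sessions.append((ip, cur[0], cur[1], cur[2]))
            open_sessions[ip] = [ts, ts, 1]  # start a new session
    for ip, (start, end, count) in open_sessions.items():
        sessions.append((ip, start, end, count))
    return sorted(sessions, key=lambda s: (s[1], s[0]))
```

Feeding the `(ip, timestamp)` pairs from `parse_line` into `coalesce_sessions` yields one tuple per visit. The enrichment steps (reverse DNS, domain classification, GeoIP) are omitted here because the source does not describe how they are implemented.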
One Python file (~8000 lines) replaces the decade-plus accumulation
of PHP, Perl, and Bash scripts that previously lived at
`/opt/hubzero/bin/metrics/`. The legacy reference implementation is
preserved verbatim under [`tests/legacy/`](tests/legacy/) and is the
bug-for-bug parity target the A/B test harness compares against.
## Quickstart
```sh
# 1. Deps + /opt tree + scripts (root; idempotent).
sudo make install
# 2. Drop the unified per-tenant config in place (DB creds + DNS settings).
sudo install -o apache -g apache -m 0600 hzmetrics.conf \
    /opt/hubzero/metrics/conf/hzmetrics.conf
# 3. Create the metrics DB, run baseline DDL, apply migrations.
sudo -u apache python3 /opt/hubzero/metrics/bin/hzmetrics.py init
# 4. Confirm everything is healthy.
sudo -u apache python3 /opt/hubzero/metrics/bin/hzmetrics.py doctor
# 5. Register the cron line.
sudo -u apache crontab /opt/hubzero/metrics/conf/hzmetrics.cron.apache.sample
```
`make install`, `init`, and `doctor` are idempotent. The same `init`
machinery also runs automatically on the first cron tick when invoked
as `apache` / `www-data`, so if you skip step 3 the next tick will
catch up — see
[`docs/architecture.md → Self-bootstrap`](docs/architecture.md#self-bootstrap).
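One common way to make an `init` step safe to re-run is to record applied migrations in a version table and skip anything already recorded. The sketch below shows that pattern only; it is an assumption about the general technique, not the actual `hzmetrics.py` code. It uses an in-memory SQLite database as a stand-in for MariaDB, and the table names and DDL in `MIGRATIONS` are hypothetical.

```python
import sqlite3

# Hypothetical migrations for illustration; the real DDL lives in hzmetrics.py.
MIGRATIONS = [
    (1, "CREATE TABLE IF NOT EXISTS hits (ip TEXT, ts TEXT)"),
    (2, "ALTER TABLE hits ADD COLUMN domain TEXT"),
]


def init(conn):
    """Apply any migrations not yet recorded; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    applied = []
    for version, ddl in MIGRATIONS:
        if version in done:
            continue  # already applied on an earlier run
        conn.execute(ddl)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        applied.append(version)
    conn.commit()
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MariaDB metrics DB
    print(init(conn))  # first run applies both migrations
    print(init(conn))  # second run finds nothing left to do
```

Because the second call finds every version already recorded, running it from a cron tick after a manual `init` would be harmless under this pattern.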
The cron line is one entry, every five minutes:
```
*/5 * * * * python3 /opt/hubzero/metrics/bin/hzmetrics.py tick
```
`tick` refreshes the whoisonline map on every invocation; at `:30` past
each hour it also opportunistically runs the metrics pipeline under a
PID lock. The pipeline is a three-mode state machine (`normal`,
`catchup`, `rebuild`), so a multi-year backlog drains without
operator intervention.
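The scheduling behaviour described above can be sketched as a small dispatcher: always run the cheap refresh, and run the pipeline only on the `:30` tick if a PID-file lock can be taken. This is a minimal sketch under stated assumptions, not the code in `hzmetrics.py`: the function names, return strings, and lock-file handling are all hypothetical. It assumes a POSIX system, and a production lock would need more care around races between concurrent ticks.

```python
import os
from datetime import datetime


def pipeline_due(now):
    """True only on the tick that lands at minute 30 of the hour."""
    return now.minute == 30


def _holder_alive(path):
    """Check whether the PID recorded in the lock file is a live process."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False  # missing, unreadable, or stale lock
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_pid_lock(path):
    """Try to take a PID-file lock; return True if this process now holds it."""
    for _ in range(2):  # the second pass runs only after clearing a stale lock
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if _holder_alive(path):
                return False
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))
        return True
    return False


def tick(now, lock_path, refresh, run_pipeline):
    """Run the refresh every time; run the pipeline only when due and unlocked."""
    refresh()
    if not pipeline_due(now):
        return "refreshed"
    if not acquire_pid_lock(lock_path):
        return "pipeline-busy"
    try:
        run_pipeline()
    finally:
        os.unlink(lock_path)
    return "pipeline-ran"
```

With a five-minute cron interval, exactly one tick per hour lands on minute 30, so the lock mainly guards against a pipeline run that is still going when the next hour's `:30` tick arrives.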
For everything else, see `hzmetrics.py --help` and the
[full documentation](https://hubzero.github.io/hzmetrics/).
## Source layout
```
.
├── hzmetrics.py    the entire pipeline
├── Makefile        install / uninstall / test / lint
├── conf/           templates: hzmetrics.conf.sample, cron
├── docs/           plain-markdown documentation
├── gh-pages/       static-site templates + builder
└── tests/
    ├── legacy/     pre-rewrite PHP/Perl/Bash baseline
    └── ab/         A/B + golden + defensive harness
                    (44 ports; see docs/testing.md)
```
## Documentation
Start at [`docs/README.md`](docs/README.md) (or the
[rendered site](https://hubzero.github.io/hzmetrics/)). Most-touched
operational pages:
- [`docs/deployment.md`](docs/deployment.md) — install, cron,
logrotate, hzmetrics.conf.
- [`docs/operations.md`](docs/operations.md) — runbook: catch-up,
stuck lock, bot inflation, DNS issues, crash recovery,
ANALYZE TABLE, etc.
- [`docs/architecture.md`](docs/architecture.md) — pipeline phases,
tables, scheduling, the catchup state machine, self-bootstrap.
- [`docs/testing.md`](docs/testing.md) — A/B + golden + defensive
test modes.
## Acknowledgments
The HUBzero metrics subsystem was originally written in Perl by
Swaroop Shivarajapura and later ported to PHP by Nicholas J.
Kisseberth. Long-term stewardship of the codebase has been carried
by J.M. Sperhac (SDSC), among others. This Python rewrite builds
directly on their work.