https://github.com/sjquant/orgpulse
Generate GitHub organization metrics snapshots, repo rollups, and summaries.
https://github.com/sjquant/orgpulse
Last synced: about 2 months ago
JSON representation
Generate GitHub organization metrics snapshots, repo rollups, and summaries.
- Host: GitHub
- URL: https://github.com/sjquant/orgpulse
- Owner: sjquant
- Created: 2026-04-15T13:35:33.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-27T15:08:02.000Z (about 2 months ago)
- Last Synced: 2026-04-27T15:23:23.032Z (about 2 months ago)
- Language: Python
- Size: 436 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# orgpulse
GitHub organization metrics snapshots and rollups.
`orgpulse` collects pull request activity across every repository in a GitHub
organization, builds repo-level and org-level summaries, and writes stable file
outputs for historical tracking.
## What It Does
- Collects PR, review, and merge data across an organization
- Generates repo-level and org-level metrics
- Writes normalized raw snapshots plus summary outputs to files
- Locks closed reporting periods so historical results stay stable
- Supports `full`, `incremental`, and `backfill` run modes
## Prerequisites
- Python `3.11+`
- `uv`
- GitHub credentials that can read the target organization
Install the runtime dependencies from the repo root:
```bash
uv sync
```
Install contributor tooling if you also want linting and tests:
```bash
uv sync --group dev
```
## Authentication
`orgpulse` resolves GitHub credentials in this order:
1. `GH_TOKEN`
2. `gh auth token` from an existing GitHub CLI login
Example with an environment token:
```bash
export GH_TOKEN=ghp_your_token_here
uv run orgpulse run --org acme
```
Example with GitHub CLI auth:
```bash
gh auth login
uv run orgpulse run --org acme
```
Before collection starts, `orgpulse` validates that the resolved credentials can
access the target organization.
## CLI Contract
`orgpulse run` accepts these operator-facing options:
- `--org `: target GitHub organization. Falls back to `ORGPULSE_ORG`.
- `--as-of `: anchor date used to resolve the current open reporting
period. Falls back to `ORGPULSE_AS_OF` or today.
- `--period `: reporting grain. Falls back to `ORGPULSE_PERIOD`.
- `--mode `: run strategy. Falls back to
`ORGPULSE_MODE`.
- `--repo `: include only matching repositories. Repeatable.
- `--exclude-repo `: exclude matching repositories.
Repeatable.
- `--output-dir `: output root. Falls back to `ORGPULSE_OUTPUT_DIR`.
- `--backfill-start ` and `--backfill-end `: required
together for `--mode backfill`.
Notes:
- `--repo` and `--exclude-repo` cannot overlap.
- If a repo filter is owner-qualified, its owner must match `--org`.
- `orgpulse` writes a JSON run summary to stdout and writes period files under
`--output-dir`.
## Run Modes
### Incremental
`incremental` is the default and is the normal operator mode.
- Refreshes only the current open period.
- Reuses locked closed periods from the existing manifest when the run contract
matches the same org, period grain, repo filters, and output root.
- Leaves locked historical raw snapshots and summaries untouched.
- Promotes a previously refreshed open period into locked history after that
period closes on a later run.
Example:
```bash
uv run orgpulse run \
--org acme \
--period month \
--mode incremental \
--as-of 2026-04-18 \
--output-dir output
```
### Full
`full` rebuilds the full discovered history up to `--as-of`.
- Ignores locked-period skipping.
- Rewrites refreshed periods from scratch.
- Prunes stale period directories that no longer belong to the rebuilt history.
- Use it when you want to replace the current snapshot set instead of preserving
prior locked history.
Example:
```bash
uv run orgpulse run \
--org acme \
--period month \
--mode full \
--as-of 2026-04-18 \
--output-dir output
```
### Backfill
`backfill` recalculates an explicit closed-period range without rebuilding the
entire history.
- Requires both `--backfill-start` and `--backfill-end`.
- Both dates must align to the selected period boundary.
- The backfill end date must be before the current open period begins, as
defined by `--as-of`.
- Rewrites only the requested closed periods and preserves unrelated locked
history.
- Writes header-only raw CSVs and zero-valued summaries when a requested period
has no matching pull requests.
Monthly backfill example:
```bash
uv run orgpulse run \
--org acme \
--period month \
--mode backfill \
--as-of 2026-05-18 \
--backfill-start 2026-03-01 \
--backfill-end 2026-04-30 \
--output-dir output
```
Weekly backfill example:
```bash
uv run orgpulse run \
--org acme \
--period week \
--mode backfill \
--as-of 2026-05-18 \
--backfill-start 2026-04-06 \
--backfill-end 2026-04-19 \
--output-dir output
```
## Practical Operator Workflow
Use `incremental` for routine scheduled runs. Use `backfill` when you need to
repair or refresh one or more closed periods. Use `full` when you intentionally
want to replace the currently materialized history.
Examples with repo filters:
```bash
uv run orgpulse run \
--org acme \
--mode incremental \
--repo api \
--repo web \
--exclude-repo legacy \
--output-dir output
```
Environment-backed defaults can remove repeated flags:
```bash
export ORGPULSE_ORG=acme
export ORGPULSE_PERIOD=month
export ORGPULSE_MODE=incremental
export ORGPULSE_OUTPUT_DIR=output
uv run orgpulse run --as-of 2026-04-18
```
## Locked-Period Behavior
`orgpulse` treats the current open period as mutable and closed periods as
stable history.
- Incremental runs skip locked periods and refresh only the open period.
- Full and backfill runs refresh locked periods instead of skipping them.
- Closed periods become locked after a successful run.
- Locked periods are carried forward only when the saved manifest still matches
the same org, period grain, repo filters, and raw snapshot root.
- If the saved manifest contract does not match, `orgpulse` does not reuse those
historical locks.
This keeps normal runs diff-friendly while still allowing explicit historical
repair when needed.
## Output Layout
For `--output-dir output --period month`, the generated layout is:
```text
output/
raw/month/
2026-04/
pull_requests.csv
pull_request_reviews.csv
pull_request_timeline_events.csv
manifest/month/
manifest.json
index.json
README.md
repo_summary/month/
contract.json
index.json
README.md
latest/
repo_summary.csv
2026-04/
repo_summary.csv
org_summary/month/
contract.json
index.json
README.md
latest/
summary.json
summary.md
2026-04/
summary.json
summary.md
```
## Local Analysis
`orgpulse analyze` reads the local snapshot and manifest outputs and builds
focused analysis views without refetching GitHub data.
- Supports `period`, `repository`, and `author` groupings
- Respects `--since`, `--until`, `--time-anchor`, `--top`, and
`--distribution-percentile`
- Writes JSON, CSV, Markdown, or interactive HTML to stdout
- Trims upper-tail outliers from distribution-based metrics with
`--distribution-percentile 95|99|100` where `100` keeps all values
- HTML output includes shared controls, single-series focus mode, and spike
diagnostics such as same-period-created ratio, older-PR ratio, top
contributing repositories, top updated dates, and timeline-event breakdowns
`orgpulse dashboard` reads local `month/created_at` outputs and renders the
supported dashboard view as JSON, per-PR CSV, and interactive HTML files.
- Reads local manifest-backed raw snapshots instead of refetching full history
- Currently supports only `month` grain with the `created_at` anchor
- Respects `--since`, `--until`, `--distribution-percentile`, and
`--refresh/--no-refresh`
- Writes dashboard artifacts under an explicit `--output-dir`
- Reuses the saved local manifest contract from `--source-output-dir`
`orgpulse dashboard-render` re-renders HTML from an existing dashboard JSON
payload without requiring manifest-backed raw snapshots.
Example HTML analysis:
```bash
uv run orgpulse analyze \
--org acme \
--grain month \
--group-by repository \
--time-anchor updated_at \
--since 2026-04-01 \
--until 2026-04-30 \
--distribution-percentile 95 \
--format html \
--output-dir output > analysis.html
```
Example dashboard render:
```bash
uv run orgpulse dashboard \
--org acme \
--since 2026-01-01 \
--until 2026-04-27 \
--source-output-dir output \
--output-dir output/acme-review/manual-2026-04-27 \
--distribution-percentile 99
```
Example render-only refresh:
```bash
uv run orgpulse dashboard-render \
--input-json output/acme-review/manual-2026-04-27/acme-created-at-since-2026-01-01.json \
--output-html output/acme-review/manual-2026-04-27/acme-created-at-since-2026-01-01.html \
--distribution-percentile 99
```
### Raw snapshots
- `raw///pull_requests.csv`: normalized PR rows
- `raw///pull_request_reviews.csv`: normalized review rows
- `raw///pull_request_timeline_events.csv`: timeline events used
for review timing and state transitions
### Manifest
- `manifest.json`: latest run metadata, including refreshed periods, locked
periods, and watermarks
- `index.json`: machine-readable index with the latest run metadata plus history
lists for refreshed and locked periods
- `README.md`: human-readable manifest index
The manifest watermarks track:
- `collection_window_start_date`
- `collection_window_end_date`
- `latest_refreshed_period_end_date`
- `latest_locked_period_end_date`
### Repo summaries
- `/repo_summary.csv`: repo-level rollup for that period
- `latest/repo_summary.csv`: convenience copy of the newest period summary
- `index.json`: machine-readable map of the latest summary and all saved history
- `README.md`: human-readable history table
- `contract.json`: run contract for the saved summary set
### Org summaries
- `/summary.json`: machine-readable org rollup for that period
- `/summary.md`: human-readable org summary for that period
- `latest/summary.json` and `latest/summary.md`: convenience copies of the
newest period outputs
- `index.json`: machine-readable map of the latest summary pair and all saved
history
- `README.md`: human-readable history table
- `contract.json`: run contract for the saved summary set
## Latest and Index Files
Use the `latest` files when downstream automation only needs the newest summary
without first resolving a period key.
Use the `index.json` files when automation needs:
- the newest available period key
- the source path behind the `latest` copy
- the full saved history for the current contract
- the manifest's latest run metadata and watermarks
Use the generated `README.md` files when a human operator needs to inspect the
same catalog without opening JSON directly.
## Current Metrics
- Merged PR count
- Time to merge
- Time to first review
- PR size
- Commit count
- Active author count
- Merged PR per active author