https://github.com/wan-huiyan/dashboard-audit-toolkit
Synchronous Claude Code toolkit for auditing live data dashboards end-to-end — parallel cluster agents methodology + the most common fix-shape + single-metric depth audit + the squash-merge gotcha. Sister to wan-huiyan/overnight-workflows.
https://github.com/wan-huiyan/dashboard-audit-toolkit
claude-code claude-code-plugin claude-code-skill dashboard-audit data-dashboard fact-checking language-audit parallel-agents
Last synced: 7 days ago
JSON representation
Synchronous Claude Code toolkit for auditing live data dashboards end-to-end — parallel cluster agents methodology + the most common fix-shape + single-metric depth audit + the squash-merge gotcha. Sister to wan-huiyan/overnight-workflows.
- Host: GitHub
- URL: https://github.com/wan-huiyan/dashboard-audit-toolkit
- Owner: wan-huiyan
- License: mit
- Created: 2026-05-08T09:52:42.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-15T22:09:28.000Z (about 1 month ago)
- Last Synced: 2026-05-28T17:27:09.008Z (about 1 month ago)
- Topics: claude-code, claude-code-plugin, claude-code-skill, dashboard-audit, data-dashboard, fact-checking, language-audit, parallel-agents
- Size: 24.4 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Dashboard Audit Toolkit
A synchronous toolkit of four [Claude Code](https://claude.com/claude-code) skills for **auditing a live data dashboard end-to-end and shipping the fixes** in a single ~30–45-minute round, without overnight runs or autonomous loops.
[](LICENSE)
[](https://github.com/wan-huiyan/dashboard-audit-toolkit/commits)
[](https://claude.com/claude-code)
## The four plugins
| Plugin | Role |
|---|---|
| [**whole-dashboard-factcheck-via-parallel-cluster-agents**](plugins/whole-dashboard-factcheck-via-parallel-cluster-agents/) | The methodology spine. Cluster a 6+ page dashboard by audience-purpose (3–4 clusters), dispatch one research subagent per cluster, synthesise into a single audit report with TL;DR + per-page detail + "what's solid" credibility anchor. |
| [**template-hardcoded-literal-vs-existing-payload**](plugins/template-hardcoded-literal-vs-existing-payload/) | The most common fix-shape produced by the audit: the page renders `~930` while the backend payload (which has been correct for months) says `94`. Diagnostic + zero-data-pipeline-change fix. |
| [**dashboard-metric-label-vs-sql-definition**](plugins/dashboard-metric-label-vs-sql-definition/) | Single-metric depth audit. When a stakeholder pushes back on what a KPI "means," read the SQL — not the rendered label, the variable name, or the column alias. |
| [**gh-squash-merge-closes-only-one-issue**](plugins/gh-squash-merge-closes-only-one-issue/) | Operational gotcha when running the audit→issues→fix-bundles flow at scale: GitHub squash-merge auto-closes only ONE issue per PR. Path A prevents; Path B recovers. |
These skills are designed as a set. The methodology spine generates the audit; the fix-shape skill resolves most of what it finds; the depth audit handles single-metric pushbacks; and the GitHub gotcha makes shipping the fixes mechanical.
## When to reach for this toolkit
- You have a **live data dashboard** (BigQuery, Postgres, JSON cache, baked-payload table) with **6+ user-facing pages** and a stated voice/persona doc.
- A **leadership review, client demo, or Phase 2 hand-over** is approaching, and you want confidence the rendered numbers match the source database AND the language lands for the stated audiences.
- You can sit synchronously for **30–45 minutes** of audit + a few hours of follow-up fixes — this is NOT an overnight workflow.
If your situation is different, see **Related repos** below.
## Why use these
A naive "review this dashboard" prompt produces a generic-sounding bullet list that misses the load-bearing problems. The most common failure modes:
- **Hardcoded literals next to correct payloads.** The template renders `~930 students` next to a baked payload that says `cohort_size=94`. Counselors export "930 to call" and Salesforce shows 94. Silent. No error. No broken build.
- **Display labels diverge from SQL definitions.** "Active Students" sounds like "students with recent activity" but the SQL is `COUNT(*) FROM predictions WHERE scoring_date = MAX(scoring_date) AND term_suppressed IS NULL` — i.e. "rows in today's scoring run, post-suppression." Two different things.
- **Audit shipped, but issues stay open.** PR squash-merges with `Closes #447, #448, #449, #450`. Only `#447` closes. Three confused minutes later, you realise GitHub's parser binds one keyword to one issue.
- **Single-agent shallow coverage.** Asking one agent to cover 14 pages produces shallow output and runs out of context. Asking page-by-page synchronously takes hours.
These plugins encode the patterns that fix each of these — extracted from real audits that caught real cohort-size and language errors before they reached real stakeholders.
## Core patterns (shared across the toolkit)
### 1. Cluster split by audience-purpose
Don't cluster by URL prefix. Cluster by **what each page is for** — typically `Headline / executive`, `Operational / action surfaces`, `Library / methodology / explorer`. Three clusters is the sweet spot for 12–16 pages; four for larger surfaces.
### 2. Diagnostic table schema
The load-bearing artefact is a per-page table:
| Element | Source | Claimed | Actual (DB) | Verdict |
|---|---|---|---|---|
| ... | payload-derived / hardcoded / mock-data / route-computed | what page renders | what BQ says | ✅ / ⚠️ / ❌ |
This schema makes mismatches scannable AND makes "what to fix first" mechanical. Reused across all four plugins.
### 3. Audit → issues → fix-bundles flow
Ship the audit doc as **PR-1 (report-only, docs label)** so it lands fast. File **one GitHub issue per P0 item** with labels at creation. Bundle fixes into **2–4 PRs** by *kind* (wiring fixes, jargon cleanup, copy edits) — not one mega-PR and not N mini-PRs. Each fix PR closes its bundle of issues with one keyword per issue (`Closes #X. Closes #Y.` — never `Closes #X, #Y`).
### 4. Fix in the consumer, not the producer
The most common fact-check finding produced by this audit is **the consumer never connected to a correct producer**. Don't reach for the bake / data pipeline / source-of-truth table. Add a `{{ variable }}` accessor in the template, thread it through the route handler, and ship in 30 LoC.
### 5. Read the SQL, not the label
When a stakeholder pushes back on what a KPI means, open the query. Quote the aggregation (`COUNT(*)`, `SUM`) and the **complete** `WHERE` clause. Both shape the meaning. Note every UI toggle that silently re-parametrises the count.
### 6. "What's solid" credibility anchor
Pure-negative reports erode trust in the audit's calibration. Always include a "What's solid" section ("baked KPIs match BQ exactly", "diagnostics floor logic correct", "AI Insights numbers are arithmetically right — only the labels are wrong") so the negative finds land credibly.
## Installation
```bash
# Add the marketplace
/plugin marketplace add wan-huiyan/dashboard-audit-toolkit
# Install one or all four plugins
/plugin install whole-dashboard-factcheck-via-parallel-cluster-agents@wan-huiyan-dashboard-audit-toolkit
/plugin install template-hardcoded-literal-vs-existing-payload@wan-huiyan-dashboard-audit-toolkit
/plugin install dashboard-metric-label-vs-sql-definition@wan-huiyan-dashboard-audit-toolkit
/plugin install gh-squash-merge-closes-only-one-issue@wan-huiyan-dashboard-audit-toolkit
```
Or clone directly:
```bash
git clone https://github.com/wan-huiyan/dashboard-audit-toolkit.git
cp -R dashboard-audit-toolkit/plugins/whole-dashboard-factcheck-via-parallel-cluster-agents ~/.claude/skills/
cp -R dashboard-audit-toolkit/plugins/template-hardcoded-literal-vs-existing-payload ~/.claude/skills/
cp -R dashboard-audit-toolkit/plugins/dashboard-metric-label-vs-sql-definition ~/.claude/skills/
cp -R dashboard-audit-toolkit/plugins/gh-squash-merge-closes-only-one-issue ~/.claude/skills/
```
## Quick start
> "Review this dashboard end-to-end. Fact-check the rendered numbers against the BigQuery payload table, and audit the language for `PRODUCT.md`'s two stated audiences. Three clusters: headline / operational / library. Report only — I'll triage what to fix."
The methodology-spine plugin will:
1. Map the user-facing routes + payload table + source-of-truth tables.
2. Cluster pages by audience-purpose (3–4 clusters).
3. Dispatch one research subagent per cluster in parallel — same prompt template, same diagnostic-table output schema.
4. Synthesise into a single Markdown audit report at `docs/reviews/__factcheck_language_audit.md`.
Once you triage, the other three plugins handle most of what the audit surfaces:
- Most fact-check P0s → `template-hardcoded-literal-vs-existing-payload`
- Stakeholder pushback on what a metric "means" → `dashboard-metric-label-vs-sql-definition`
- Issues that won't close after fix-PRs merge → `gh-squash-merge-closes-only-one-issue`
## Related repos
This toolkit is for **synchronous** audits — one ~30–45 min round of parallel cluster agents, with you in the loop. For different shapes:
- **Autonomous overnight runs** — [`wan-huiyan/overnight-workflows`](https://github.com/wan-huiyan/overnight-workflows). Two sister plugins:
- `overnight-review-client-delivery` — you already have a client deliverable that needs polishing + quality-gating before a morning hand-off. 8-agent review panel.
- `overnight-insight-discovery` — generate a client-facing insight brief from scratch, surfacing funnel leaks and surprise patterns from data.
- **Per-instance SHAP waterfall + post-hoc calibrator** — [`wan-huiyan/shap-waterfall-calibrator-skill`](https://github.com/wan-huiyan/shap-waterfall-calibrator-skill). One of the more common per-instance explainability bugs an audit will surface: the chart's Final bar disagrees with the score chip beside it, and the "rescale to fit" fix introduces sign-flip artifacts on a different slice of instances. The correct fix (point-by-point calibration of the cumulative path) and the SHAP-maintainer caveats.
- **Single-page sanitize for client send** — `client-readiness-pass-on-rendered-html` (separate skill, not in this toolkit). Different shape: one page (or a small subset) inside an internal-flavored generated site, sanitised so it can be sent directly to a non-technical client without rebuilding the generator.
- **Plan / requirements docs** — `document-review` (separate skill). Persona-agent review of plans/requirements rather than cluster agents over a live product.
- **Adversarial debate on a design** — `agent-review-panel` (separate skill). Multi-agent debate with a Supreme Judge.
## License
MIT — see [LICENSE](LICENSE).