https://github.com/mizcausevic-dev/procurement-pulse-engine
The crawl + aggregate engine behind the AI Procurement Pulse. Probes a universe of vendor domains for the 11 Kinetic Gain Protocol Suite documents and produces the quarterly issue dataset. Issue #1: the zero baseline.
https://github.com/mizcausevic-dev/procurement-pulse-engine
ai-governance ai-procurement-pulse crawler data-journalism javascript kinetic-gain-protocol-suite procurement research well-known
Last synced: 2 days ago
JSON representation
The crawl + aggregate engine behind the AI Procurement Pulse. Probes a universe of vendor domains for the 11 Kinetic Gain Protocol Suite documents and produces the quarterly issue dataset. Issue #1: the zero baseline.
- Host: GitHub
- URL: https://github.com/mizcausevic-dev/procurement-pulse-engine
- Owner: mizcausevic-dev
- License: mit
- Created: 2026-05-21T17:53:52.000Z (13 days ago)
- Default Branch: main
- Last Pushed: 2026-05-28T00:10:18.000Z (6 days ago)
- Last Synced: 2026-05-28T02:16:11.977Z (6 days ago)
- Topics: ai-governance, ai-procurement-pulse, crawler, data-journalism, javascript, kinetic-gain-protocol-suite, procurement, research, well-known
- Language: JavaScript
- Homepage: https://pulse.kineticgain.com/
- Size: 31.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# procurement-pulse-engine
> The crawl + aggregate engine behind the [AI Procurement Pulse](https://pulse.kineticgain.com/). Probes a universe of vendor domains for the eleven [Kinetic Gain Protocol Suite](https://github.com/mizcausevic-dev/kinetic-gain-protocol-suite) documents and produces the quarterly issue dataset.
```bash
node src/run.mjs --issue issue-1 --concurrency 8 --timeout 6000
# → data/issue-1.json (the publishable aggregate)
# → data/issue-1-raw.json (per-domain ProbeResults)
```
## Status — [v0.4.0](https://github.com/mizcausevic-dev/procurement-pulse-engine/releases/tag/v0.4.0) (2026-05-28)
- **Universe:** 1,408 unique domains across 49 verticals. Pre-Issue #5 expansion (2026-05-28) added 408 buyer-readable vendors across ten under-represented verticals: Customer Data Platform (25), eCommerce Platform (37), AI Coding & Developer AI (31), AI Agent Platform (36), AI Data Labeling (32), MarTech (63), Customer Service Platform (30), Document & eSignature (24), Mid-Market HR/Payroll (48), Vertical SaaS (82). Smoke-tested via Cloudflare DNS — 4 placeholders removed, 6 typos corrected (e.g. `weights-and-biases.com` → `wandb.com`, `mindbody.com` → `mindbodyonline.com`, `stack.com` → `constructconnect.com`). Previous footprint: 1,007 / 38 verticals; 37 at Issue #1. **Locked baseline snapshot** at [`data/baseline-2026-05-28.json`](data/baseline-2026-05-28.json) (1,007 domains, frozen — August Issue #5 measures the +408 expansion against this baseline so the original 1,007 stay comparable issue-to-issue). Snapshot: 1 publisher (KG), 0.10% rate, verifiedRate 1.0.
- **Quarterly cadence pre-registered.** [`.github/workflows/quarterly-crawl.yml`](.github/workflows/quarterly-crawl.yml) auto-fires Aug/Nov/Feb/May 15 at 14:00 UTC, against universe.csv at HEAD, with drift vs the locked `data/issue-4-v04-full.json` baseline. The August fire is Issue #5.
- **Per-spec discriminator on all 11 Suite paths.** Each spec now requires its canonical `*_version` field (`agent_card_version`, `incident_card_version`, etc.). Closes the Gatsby/SPA-catchall false positive surfaced in Issue #4 — `corporate.charter.com` dropped 82/100 → 0/100; canonical publisher `kineticgain.com` still 100/100.
- **First verified signing posture.** All 11 kineticgain.com dogfooded `/.well-known/` docs are ed25519-signed against the public key at [`kineticgain.com/.well-known/pulse-signing.json`](https://kineticgain.com/.well-known/pulse-signing.json). Engine probe reports `{ verified: 11, unsigned: 0, invalid: 0 }`.
- **Engine tests:** 15/15 pass on `main`.
> The sections below preserve the Issue #1 baseline narrative and the journey through Issues #2–#4. Issue #5 (the first true quarterly delta, August 2026) inherits the v0.4 discriminator + signing posture by default.
## What it does
1. Reads [`universe.csv`](universe.csv) — `domain,vertical,note`, the set of vendors measured this issue.
2. Probes each domain's eleven well-known paths in parallel (bounded concurrency) using the vendored [`well-known-probe`](https://github.com/mizcausevic-dev/well-known-probe-js) core.
3. Aggregates into headline rate, per-vertical rollups, per-spec adoption, and a leaderboard.
4. Writes the issue dataset that [`procurement-pulse-landing`](https://github.com/mizcausevic-dev/procurement-pulse-landing) renders.
## The Issue #1 baseline finding
The first real crawl (37 domains across 9 verticals — AI Platform, EdTech, HealthTech, FinTech, Enterprise SaaS, Data, Observability, Identity, plus the Kinetic Gain reference properties) returned:
> **0.0% publication rate. Zero domains — including kineticgain.com's own properties — publish any Suite document at `/.well-known/` yet.**
This is the honest starting line. The Suite shipped in 2025; adoption begins from zero. Issue #1 is **"The Zero Baseline"** — the instrument is calibrated, the universe is defined, and every future issue measures the climb. (Methodology note: an empty `index.json` counts as published; a `200` of the wrong shape does not. The zero is real, not a probe artifact — verified against the discriminator-bearing specs too.)
## Usage
```bash
npm run crawl -- --issue issue-1 # full universe.csv
node src/run.mjs --issue issue-1 --limit 5 # first 5 domains (smoke test)
node src/run.mjs --issue issue-2 --concurrency 12 # next issue, more parallelism
# diff two issues (aggregate deltas; add --from-raw/--to-raw for per-domain movers)
npm run drift -- --from data/issue-1.json --to data/issue-2.json \
--from-raw data/issue-1-raw.json --to-raw data/issue-2-raw.json
```
| Flag | Default | Meaning |
| --- | --- | --- |
| `--issue` | `issue-1` | Output filename stem (`data/.json`) |
| `--concurrency` | `8` | Parallel domain probes |
| `--timeout` | `6000` | Per-fetch timeout (ms) |
| `--limit` | `0` (all) | Cap the universe (for smoke tests) |
## Output shape (`data/.json`)
```jsonc
{
"issue": "issue-2",
"generatedAt": "2026-05-24T…",
"universe": { "total": 350, "verticals": 18 },
"headline": { "domainsPublishingAny": 1, "publicationRate": 0.0029, "avgScore": 0.3 },
"signatures": { "foundDocs": 11, "verifiedDocs": 0, "invalidDocs": 0, "verifiedRate": 0 },
"byVertical": { "AI Platform": { "domains": 40, "publicationRate": 0, "avgScore": 0 }, … },
"bySpec": { "aeo": { "label": "AEO Protocol", "publishers": 1, "rate": 0.0029, "verified": 0 }, … },
"leaderboard": [ { "domain": "kineticgain.com", "score": 100, "tier": "comprehensive", "published": 11 }, … ]
}
```
## Signature verification (ed25519)
Each found document is checked for an ed25519 signature using Node's built-in `crypto` (no dependencies). A signed Suite document carries a top-level `signature` block:
```jsonc
"signature": {
"algorithm": "ed25519",
"public_key": "",
"signing_key_url": "https://vendor.example/.well-known/keys/2026.json", // optional
"value": ""
}
```
The signature covers the **canonical** serialization of the document with its own `signature` block removed (recursively key-sorted JSON, array order preserved). Each document resolves to one of three states, surfaced per-document and rolled up into `signatures` / `bySpec[].verified`:
| Status | Meaning |
| --- | --- |
| `verified` | Signature present and verifies (tamper-evident). |
| `unsigned` | No `signature` block. |
| `invalid` | Signature present but fails verification, or malformed. |
By default the embedded `public_key` is used (tamper-evidence). Pass `--verify-key-fetch` semantics (`verifyKeyFetch` option) to instead trust the key fetched from `signing_key_url` (provenance). Mirrors [`hash-attestation-rs`](https://github.com/mizcausevic-dev/hash-attestation-rs).
## Drift (`src/drift.mjs`)
`driftAggregate(prev, curr)` diffs two committed issue datasets (headline / per-vertical / per-spec / signature-rate deltas, and flags `new` / `dropped` verticals). `driftDomains(prevRaw, currRaw)` diffs the per-domain raw arrays to find movers — newly publishing, stopped, score and per-spec changes. The CLI writes `data/-drift.json` and prints a human summary.
## Issue markdown (`src/summarize.mjs`)
`summarize.mjs` renders a Pulse-issue markdown body by filling token placeholders in [`docs/issues/ISSUE_TEMPLATE.md`](docs/issues/ISSUE_TEMPLATE.md) with the issue's aggregate JSON and (optional) drift JSON. Output goes to `docs/issues/.md`, ready to paste into a GitHub Issue or publish via pulse.kineticgain.com.
```bash
node src/summarize.mjs --issue issue-2026-08 --baseline issue-4-v04-full --issue-number 5
```
The quarterly-crawl workflow invokes the summarizer automatically after drift, so the scheduled August / November / February / May firings produce **JSON + ready-to-publish markdown** in a single commit. The staged preview at [`docs/issues/issue-5-staged-draft.md`](docs/issues/issue-5-staged-draft.md) shows what Issue #5 will look like — generated from the 2026-05-28 snapshot vs the locked v0.4 baseline as a pipeline dry-run.
## Crawl etiquette
GET-only requests to public, designed-to-be-fetched `/.well-known/` paths. Bounded concurrency, per-request timeout, `redirect: follow`. No authentication, no headless browser, no scraping of page content. The universe is published alongside each issue so the run is reproducible by anyone.
## Roadmap
- **Expand the universe** further toward ~1,200 domains. Issue #2 grew the lens from 37 → **350 domains across 18 verticals**; the next pass deepens toward Fortune 500 + top 100 K-12 EdTech + top 50 HealthTech AI.
- ✅ **Signature verification** — ed25519 over the canonical document hash (`src/signature.mjs`).
- ✅ **Drift tracking** — `src/drift.mjs` diffs each issue against the previous.
- **Scheduled GH Action** that runs the crawl quarterly and commits the dataset, which triggers the landing-site rebuild.
- **Sign the dogfooded docs** — sign kineticgain.com's own Suite documents so they flip from `unsigned` to `verified`.
## Composes with
| Repo | Relationship |
| --- | --- |
| [`well-known-probe-js`](https://github.com/mizcausevic-dev/well-known-probe-js) | The probe core (vendored into `src/probe.mjs`) |
| [`procurement-pulse-landing`](https://github.com/mizcausevic-dev/procurement-pulse-landing) | Renders this engine's dataset at pulse.kineticgain.com |
| [`kinetic-gain-protocol-suite`](https://github.com/mizcausevic-dev/kinetic-gain-protocol-suite) | The eleven specs measured |
| [`aeo-crawler`](https://github.com/mizcausevic-dev/aeo-crawler) | Heavyweight crawler; this is the Pulse-specific lightweight aggregator |
## License
MIT.