https://github.com/kadubon/bottleneck-audit-toolkit
Offline, fail-closed verifier for JSONL telemetry event logs. Emits deterministic audit certificates + human summaries with explicit claims/non-claims for bottleneck and integrity review.
- Host: GitHub
- URL: https://github.com/kadubon/bottleneck-audit-toolkit
- Owner: kadubon
- License: apache-2.0
- Created: 2025-12-26T05:32:21.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-12-26T08:33:00.000Z (5 months ago)
- Last Synced: 2025-12-27T20:26:26.010Z (5 months ago)
- Topics: ai, audit, bottleneck, checkpointing, distributed-training, event-logs, jsonl, mlops, offline-verification, performance-monitoring, silent-data-corruption, tail-latency, telemetry
- Language: Python
- Homepage: https://kadubon.github.io/github.io/
- Size: 48.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
- Notice: NOTICE
# Bottleneck Audit Toolkit (BATool)
## DISCLAIMER / NO SUPPORT / NO WARRANTY / NO LIABILITY
BATool is a research and audit tool. It is **not** a security boundary, safety guarantee, or compliance system.
Use is entirely at your own risk. The authors and contributors accept **no warranty, no liability, and no support obligations**.
Issues, PRs, and inquiries may be ignored or left unanswered.
BATool is an **offline, fail-closed verifier** for **JSONL event logs**. It emits:
- a deterministic, machine-readable **certificate** (JSON), and
- a deterministic, human-readable **summary** (text),
while separating **claims** (supported by the observed log) from **non-claims** (what cannot be supported, and why).
## License
- Code and tooling: **Apache-2.0** (see `LICENSE` and `NOTICE`)
- TeX sources under `paper/`: **CC-BY-4.0** (see `paper/LICENSE` and file headers)
## Quickstart (uv)
```bash
uv venv
uv pip install -e ./verifier
batool verify --input ./examples/ok_run/events.jsonl --out /tmp/certificate.json --human /tmp/summary.txt
```
## Quickstart (pip)
```bash
python -m venv .venv
. .venv/bin/activate
pip install -e ./verifier
batool verify --input ./examples/ok_run/events.jsonl --out /tmp/certificate.json --human /tmp/summary.txt
```
## Verdicts
- `VERIFIED`: no `REJECT` or `UNDECIDABLE` trigger fired, and at least one claim is made.
- `UNDECIDABLE`: data is missing or ambiguous (for example, missing END events or clock skew beyond tolerance).
- `REJECT`: schema violations, inconsistent `run_id`, duplicate `event_id`, tamper mismatch, or strict-mode failures.
Exit codes: `0` VERIFIED, `1` UNDECIDABLE, `2` REJECT.
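For scripted pipelines, the exit codes above can be turned back into verdict names. The following is a minimal sketch, not part of BATool: the helper names `verdict_from_exit_code` and `run_batool` are hypothetical, and treating an undocumented exit code as `REJECT` is an assumption made here to stay fail-closed. The command line uses only the flags shown in the Quickstart.

```python
import subprocess

# Exit-code mapping as listed in this README.
EXIT_CODE_TO_VERDICT = {0: "VERIFIED", 1: "UNDECIDABLE", 2: "REJECT"}


def verdict_from_exit_code(code: int) -> str:
    """Map a batool exit code to a verdict name."""
    # Assumption of this sketch: any undocumented code is treated as REJECT.
    return EXIT_CODE_TO_VERDICT.get(code, "REJECT")


def run_batool(input_path: str, cert_path: str, summary_path: str) -> str:
    """Run `batool verify` with the Quickstart flags and return the verdict name."""
    cmd = [
        "batool", "verify",
        "--input", input_path,
        "--out", cert_path,
        "--human", summary_path,
    ]
    # check=False: a non-zero exit code is an expected outcome, not an exception.
    completed = subprocess.run(cmd, check=False)
    return verdict_from_exit_code(completed.returncode)
```

A CI job could call `run_batool(...)` and fail the build on anything other than `VERIFIED`.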
## Claims vs non-claims
- Claims are explicitly supported by the observed data under this tool's validation rules.
- Non-claims explain what could not be supported, and why.
This tool does **not** infer causality. It performs telemetry-visible checks and heuristic aggregation only.
It prefers `UNDECIDABLE` / `REJECT` over guessing.
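To illustrate what "fail-closed" can look like for the identifier checks listed under Verdicts, here is a rough sketch. It is not BATool's implementation: the function name `find_reject_reason` is hypothetical, and it assumes each JSONL line is an object with string `run_id` and `event_id` fields. It stops at the first problem instead of trying to repair or skip the offending line.

```python
import json


def find_reject_reason(lines):
    """Return a reason string for the first REJECT trigger found, or None.

    Covers only three triggers: unparseable or malformed lines,
    inconsistent run_id, and duplicate event_id.
    """
    seen_event_ids = set()
    run_id = None
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            return f"line {lineno}: not valid JSON"
        if not isinstance(event, dict):
            return f"line {lineno}: event is not a JSON object"
        if not isinstance(event.get("run_id"), str):
            return f"line {lineno}: missing or non-string run_id"
        if not isinstance(event.get("event_id"), str):
            return f"line {lineno}: missing or non-string event_id"
        if run_id is None:
            run_id = event["run_id"]
        elif event["run_id"] != run_id:
            return f"line {lineno}: inconsistent run_id"
        if event["event_id"] in seen_event_ids:
            return f"line {lineno}: duplicate event_id"
        seen_event_ids.add(event["event_id"])
    return None
```

Returning `None` here only means that none of these three triggers fired. It does not amount to a `VERIFIED` verdict, which also requires at least one claim.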
## Strict mode
Use `--strict` to treat any clock decrease or missing END as `REJECT`. This is more conservative and intended for high-integrity pipelines.
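The difference between default and strict handling can be sketched as follows. This is an illustration under stated assumptions, not the verifier's code: the field names `ts` (seconds) and `kind`, the `tolerance_s` parameter, and both function names are hypothetical. Only the escalation rule (clock decrease or missing END becomes `REJECT` under strict mode) comes from the text above.

```python
def timing_triggers(events, strict=False, tolerance_s=0.0):
    """Collect (verdict, reason) pairs for clock and END-event problems.

    `events` is a list of dicts with hypothetical 'ts' and 'kind' keys.
    """
    triggers = []
    for prev, cur in zip(events, events[1:]):
        decrease = prev["ts"] - cur["ts"]
        if decrease <= 0:
            continue  # timestamps did not go backwards
        if strict:
            # Strict mode: any clock decrease is a REJECT trigger.
            triggers.append(("REJECT", "clock decrease"))
        elif decrease > tolerance_s:
            # Default mode: only skew beyond tolerance is flagged.
            triggers.append(("UNDECIDABLE", "clock skew beyond tolerance"))
    if not any(e["kind"] == "END" for e in events):
        triggers.append(("REJECT" if strict else "UNDECIDABLE", "missing END"))
    return triggers


def worst_verdict(triggers):
    """REJECT outranks UNDECIDABLE; None means no trigger fired."""
    if any(verdict == "REJECT" for verdict, _ in triggers):
        return "REJECT"
    if triggers:
        return "UNDECIDABLE"
    return None
```

With the same input, a log that lacks an END event yields `UNDECIDABLE` by default and `REJECT` with `strict=True`.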
## Sample output (summary excerpt)
```text
BATool Verification Summary
Verdict: VERIFIED
Run ID: run-ok-001
Input Digest (sha256, KEYSORT_UTF8): 7f2b...
Claims:
- useful_compute_floor: 5 steps
- dominant_time_component: ALLREDUCE_DOMINANT (evidence=high)
- integrity: OK (contract_ok=true)
Non-claims:
- none
```
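The `Input Digest (sha256, KEYSORT_UTF8)` line suggests a digest computed over a key-sorted, UTF-8 encoded form of the input. The exact canonicalization is defined by the tool and its `docs/`, not by this README, so the sketch below is only one plausible reading: it assumes each event is re-serialized with sorted keys and compact separators, one event per line. The function name `keysorted_digest` is hypothetical, and the result is not expected to match BATool's digest.

```python
import hashlib
import json


def keysorted_digest(lines):
    """SHA-256 over key-sorted, UTF-8 encoded JSON events (assumed scheme)."""
    h = hashlib.sha256()
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines do not contribute
        event = json.loads(line)
        # Sorting keys makes the digest independent of key order in the log.
        canonical = json.dumps(
            event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
        )
        h.update(canonical.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()
```

The point of such a scheme is determinism: two logs that differ only in key order or insignificant whitespace hash to the same value, while any change to a key or value changes the digest.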
## Reference theory (DOIs)
BATool implements a **reduced log-checking protocol** and a **minimal certificate format**. It does **not** implement the full
statistical guarantees or cryptographic integrity models described in the following references.
- TLUC:
- SDC:
- MCCBE:
Not implemented (examples):
- confidence sequences or coverage-in-time proofs
- cryptographic authenticity or secure log signatures
- power/cooling estimators or hardware trust anchors
## Documentation
See `docs/` for:
- certificate format (`docs/CERTIFICATE_FORMAT.md`)
- threat model (`docs/THREAT_MODEL.md`)
- versioning (`docs/VERSIONING.md`)
- reproducibility steps (`docs/REPRODUCIBILITY.md`)