{"id":50708500,"url":"https://github.com/fraware/pcs-bench","last_synced_at":"2026-06-09T13:31:54.910Z","repository":{"id":359104355,"uuid":"1243634076","full_name":"fraware/pcs-bench","owner":"fraware","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-20T13:00:21.000Z","size":189,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-20T15:44:48.179Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fraware.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-19T14:17:35.000Z","updated_at":"2026-05-20T13:01:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/fraware/pcs-bench","commit_stats":null,"previous_names":["fraware/pcs-bench"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/fraware/pcs-bench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fraware%2Fpcs-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fraware%2Fpcs-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fraware%2Fpcs-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fraware%2Fpcs-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fraware","download_url":"https://codeload.github.com/fraware/pcs-bench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fraware%2Fpcs-bench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34110011,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-09T13:31:54.833Z","updated_at":"2026-06-09T13:31:54.900Z","avatar_url":"https://github.com/fraware.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cpre\u003e\n###############################################################################################\n#                    ____   ____ ____        ____                  _                          #\n#                   |  _ \\ / ___/ ___|      | __ )  ___ _ __   ___| |__                       #\n#                   | |_) | |   \\___ \\ _____|  _ \\ / _ \\ '_ \\ / __| '_ \\                      #\n#                   |  __/| |___ ___) |_____| |_) |  __/ | | | (__| | | |                     #\n#                   |_|    \\____|____/      |____/ \\___|_| |_|\\___|_| |_|                     #\n#                                                                                             #\n###############################################################################################\n\u003c/pre\u003e\n\n[![Python 3.11+](https://img.shields.io/badge/python-3.11%20|%203.12-3776ab?style=flat-square\u0026logo=python\u0026logoColor=white)](https://www.python.org/downloads/)\n[![License](https://img.shields.io/badge/license-Apache--2.0-green?style=flat-square)](LICENSE)\n[![PCS](https://img.shields.io/badge/protocol-Proof--Carrying%20Science-5c4ee5?style=flat-square)](https://github.com/SentinelOps-CI/pcs-core)\n\n**Independent benchmarks for Proof-Carrying Science releases.**\n\npcs-bench runs realistic good-and-bad release scenarios, scores the results, and packages everything reviewers need while certificates, admission, and rendering stay in the ecosystem projects where they belong. Protocol rules live in [pcs-core](https://github.com/SentinelOps-CI/pcs-core), and this repository provides the evaluation harness that ties the ecosystem together.\n\n\u003c/div\u003e\n\n---\n\n## Why pcs-bench exists\n\nScientific releases should be **reproducible**, **auditable**, and **easy to explain**. pcs-bench answers a focused question about evidence.\n\n\u003e When we ship a PCS release, can we show with evidence that valid releases pass, invalid ones fail for the right reasons, and the story holds across tools?\n\nThe harness calls each project’s public command-line tools, records what happened, and compares outcomes to clear expectations so teams can defend a release with data instead of anecdote.\n\n---\n\n## What you get\n\n| Output | Who uses it |\n|--------|-------------|\n| **Benchmark report** — machine-readable JSON aligned with pcs-core schemas | CI, release automation, regression tracking |\n| **Human report** — Markdown or HTML summaries with scores and failures | Engineers and release owners |\n| **Reviewer packet** — cases, fixtures, report, and reproduction checks in one folder | External review, audits, publications |\n| **Readiness summary** — pass/fail checklist before a version tag | Release managers |\n\nEight benchmark dimensions cover reproducibility, failure localization, certificates, registry coverage, formal checks, scientific-memory rendering, repair hints, and cross-domain portability. Full metric definitions live in [Metrics](docs/metrics.md).\n\n---\n\n## Try it in five minutes\n\n**Requirements:** Python 3.11 or 3.12.\n\n```bash\ngit clone https://github.com/fraware/pcs-bench.git\ncd pcs-bench\npip install -e \".[dev]\"\npython scripts/materialize_fixtures.py\nmake gate\n```\n\nThat command runs the full offline pipeline, including fixtures, case checks, simulated benchmarks, report validation, and a reviewer packet, and it only requires this repository on disk.\n\nOn Windows, use `.\\make.ps1 gate` instead of `make gate`. When the console script is absent from PATH, invoke `python -m pcs_bench` with the same arguments.\n\nGenerate a default config and run a single suite.\n\n```bash\npcs-bench init\npcs-bench list-suites\npcs-bench run --suite labtrust-qc-release --out reports/try.json\n```\n\n---\n\n## How it works\n\n```mermaid\nflowchart LR\n  subgraph harness [pcs-bench]\n    Cases[Benchmark cases]\n    Run[Run and score]\n    Report[Report and packet]\n  end\n  subgraph ecosystem [Ecosystem repos - public CLIs only]\n    Core[pcs-core]\n    LT[LabTrust-Gym]\n    CE[CertifyEdge]\n    PF[provability-fabric]\n    SM[scientific-memory]\n  end\n  Cases --\u003e Run\n  Run --\u003e Core\n  Run --\u003e LT\n  Run --\u003e CE\n  Run --\u003e PF\n  Run --\u003e SM\n  Run --\u003e Report\n```\n\n**Cases** describe a release scenario such as a valid chain, a tampered hash, or a missing artifact. **Run** executes the scenario against real tools in live mode or against checked-in expected outcomes in offline simulation. **Report** aggregates scores, exports a standard JSON report, and can build a packet for reviewers.\n\nEvery interaction stays on public command-line entry points, and the harness records a full audit trail for each step.\n\n---\n\n## Two ways to run\n\n| Goal | What to run | What you need |\n|------|-------------|---------------|\n| **Develop or open a PR** | `make release-prep` | This repo only |\n| **Cut a real PCS release** | `make live-ci` then `make release-verify` | Clones of pcs-core and the four producer repos (see below) |\n\n**Offline simulation** is the default mode because it is fast, deterministic, and well suited to contributors and continuous integration.\n\n**Live evaluation** invokes real binaries from LabTrust-Gym, CertifyEdge, provability-fabric, and scientific-memory, and release managers rely on that path for release-grade evidence before a tag.\n\nThe [Release guide](docs/release.md) walks through tagging, and [Running benchmarks](docs/execution.md) lists day-to-day commands.\n\n---\n\n## Benchmark suites\n\n| Suite | What it exercises |\n|-------|-------------------|\n| LabTrust QC release | End-to-end QC release chain and where failures are caught |\n| Tool use safety | Certificates, policy, and unauthorized tool calls |\n| Computation reproducibility | Witnesses, environment digests, and result integrity |\n| Formal trust kernel | Lean obligations and theorem checks |\n| Scientific memory rendering | Whether evidence renders with the required sections |\n| Cross-domain | Shared PCS protocol behavior across workflows |\n\n```bash\npcs-bench run --suite all --simulate --out reports/full.json\n```\n\nSuite layout and methodology appear in [docs/benchmark-methodology.md](docs/benchmark-methodology.md).\n\n---\n\n## Ecosystem\n\npcs-bench sits beside the projects it measures and coordinates them through documented interfaces.\n\n| Project | Role in evaluation |\n|---------|-------------------|\n| [pcs-core](https://github.com/SentinelOps-CI/pcs-core) | Schemas, registry, conformance |\n| [LabTrust-Gym](https://github.com/fraware/LabTrust-Gym) | Reference workflow and release scenarios |\n| [CertifyEdge](https://github.com/fraware/CertifyEdge) | Certificates and witnesses |\n| [provability-fabric](https://github.com/SentinelOps-CI/provability-fabric) | Admission and verification |\n| [scientific-memory](https://github.com/fraware/scientific-memory) | Evidence import and rendering |\n| **pcs-bench** (here) | Orchestration, scoring, packets, release checks |\n\nProducer repositories publish a standard ingest file (`pcs_bench_ingest.v0.json`) that pcs-bench merges into a single ecosystem-wide report. The [Producer integration](docs/producers.md) guide explains that contract.\n\n---\n\n## Contribute\n\nWe welcome issues, documentation improvements, new benchmark cases, and CI hardening, and you can make meaningful contributions with only this repository cloned locally.\n\n**Good first steps**\n\n1. Run `make release-prep` and confirm it passes.\n2. Read [CONTRIBUTING.md](CONTRIBUTING.md) and [Adding a benchmark suite](docs/adding-a-benchmark-suite.md).\n3. Pick a suite under `benchmarks/`, add or refine a case with `benchmark_case.v0.json` and `expected/verification_result.json`.\n4. Open a pull request so offline CI runs automatically.\n\n**Ways to help**\n\n- Add invalid-release scenarios that reflect real failure modes in the field\n- Improve reports and packet layout for reviewers\n- Extend live-adapter coverage when a producer CLI stabilizes\n- Clarify documentation wherever readers might stall\n\nOpen a [GitHub issue](https://github.com/fraware/pcs-bench/issues) for questions and design discussion.\n\n---\n\n## Documentation\n\n| Topic | Link |\n|-------|------|\n| Full doc index | [docs/README.md](docs/README.md) |\n| Configuration (`pcs-bench.yaml`) | [docs/configuration.md](docs/configuration.md) |\n| All CLI commands | `pcs-bench --help` |\n\n---\n\n## License\n\nApache-2.0. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffraware%2Fpcs-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffraware%2Fpcs-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffraware%2Fpcs-bench/lists"}