{"id":49191503,"url":"https://github.com/cppalliance/wg21-paperlint","last_synced_at":"2026-04-23T07:01:40.577Z","repository":{"id":353225464,"uuid":"1206563247","full_name":"cppalliance/wg21-paperlint","owner":"cppalliance","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-23T01:15:01.000Z","size":2921,"stargazers_count":0,"open_issues_count":11,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-23T02:22:24.839Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cppalliance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE_1_0.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-10T03:18:55.000Z","updated_at":"2026-04-23T01:14:42.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cppalliance/wg21-paperlint","commit_stats":null,"previous_names":["cppalliance/wg21-paperlint"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/cppalliance/wg21-paperlint","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cppalliance%2Fwg21-paperlint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cppalliance%2Fwg21-paperlint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cppalliance%2Fwg21-paperlint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cppalliance%2Fwg21-paperlint/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cppalliance","download_url":"https://codeload.github.com/cppalliance/wg21-paperlint/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cppalliance%2Fwg21-paperlint/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32169657,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-23T02:19:40.750Z","status":"ssl_error","status_checked_at":"2026-04-23T02:17:55.737Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-23T07:01:39.111Z","updated_at":"2026-04-23T07:01:40.558Z","avatar_url":"https://github.com/cppalliance.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Paperlint\n\nPaperlint finds mechanically verifiable defects in WG21 C++ standards papers — the kind of things an author would want to fix before the committee sees their work. Misspelled identifiers, broken cross-references, code samples that don't match their prose descriptions, wording that contradicts itself.\n\nIt is a linter, not a critic. It does not evaluate whether a proposal is good, whether a design is sound, or whether a paper should advance. It points at things. The committee decides the rest.\n\n## How it works\n\nPaperlint reads a paper, searches for defects against a rubric of 30 failure modes, then filters every candidate finding through a verification gate that rejects anything that might be intentional. What survives is a short list of items the author probably wants to know about.\n\nThe pipeline has four stages:\n\n1. **Discovery** — reads the paper end-to-end, finds every potential defect, outputs structured findings with exact evidence quotes. By default this runs **three** LLM passes: the first pass is a full scan; each later pass is shown the findings already collected and asked to add only *additional* defects (programmatic dedup merges overlaps). Use `--discovery-passes N` on `eval` and `run` to change the count (minimum 1).\n2. **Quote Verification** — programmatic check that every quoted passage actually exists in the source document. Findings with unverifiable evidence are dropped before reaching the gate.\n3. **Gate** — challenges each finding, searching for reasons the author wrote it that way on purpose. Rejects aggressively. A false positive damages the credibility of every true positive around it.\n4. **Evaluation** — assembles the surviving findings into a per-paper evaluation\n\nEach stage is driven by a prompt in the `prompts/` directory. The prompts are the product. Everything else is plumbing.\n\nFor a detailed description of the pipeline architecture, models, and output schema, see [docs/design.md](paperlint/docs/design.md).\n\n## Installation\n\nPython 3.12 or newer is required. Paperlint bundles its PDF/HTML-to-markdown converter (`tomd`) as a sibling package in this repository; install it as an editable dependency.\n\n```bash\ngit clone https://github.com/cppalliance/paperlint.git\ncd paperlint\npip install -e ./tomd\npip install -e .\nexport OPENROUTER_API_KEY=sk-or-...   # required for eval / run (LLM stages)\n```\n\n## Quick start (third parties, wg21.org-style output)\n\nSet a workspace directory once; all paths below are under it.\n\n```bash\nexport OPENROUTER_API_KEY=sk-or-...\nexport WS=./data\nexport M=2026-02\n```\n\n**Pipeline order:** *convert* (download source → `paper.md` + `meta.json`, no LLM) → *eval* or *run* (LLM: discovery → … → `evaluation.json`). You do **not** need a separate `mailing` subcommand for normal use: `convert` / `eval` / `run` refresh the open-std [mailing index](paperlint/mailing.py) and write `mailings/\u003cM\u003e.json` as they start.\n\n**What is downloaded?**\n\n- The mailing **index** page is fetched once per command (HTML table of papers). That is *not* every PDF in the month.\n- Only papers you **convert** are downloaded from their canonical URL (and cached under `.paperlint_cache/` in the CWD for `convert`).\n- You never need to convert the whole mailing to evaluate one or a few papers.\n\n**Outputs to drive your own UI** (same shapes sites like wg21.org can ingest): for each paper, `evaluation.json` (findings, references) plus `paper.md` (citations use char offsets in the JSON). After a full `run`, the workspace also has `index.json` for batch summaries.\n\n### A. One paper (minimal)\n\n```bash\npython -m paperlint convert $M --workspace-dir \"$WS\" --papers P3642R4\n# or: --paper P3642R4\npython -m paperlint eval $M/P3642R4 --workspace-dir \"$WS\"\n```\n\n### B. Several papers\n\n```bash\npython -m paperlint convert $M --workspace-dir \"$WS\" --papers P3642R4,N5000R0\npython -m paperlint run $M --workspace-dir \"$WS\" --papers P3642R4,N5000R0\n# Or run one eval per paper: eval $M/P3642R4, eval $M/N5000R0, etc.\n```\n\n### C. Entire monthly mailing\n\n```bash\npython -m paperlint convert $M --workspace-dir \"$WS\" --max-cap 0 --max-workers 10\npython -m paperlint run $M --workspace-dir \"$WS\" --max-cap 0 --max-workers 10\n```\n\nOptionally add `--max-cap N` to limit *how many* papers to process, after any `--papers` filter. Use `--max-workers` to parallelize (threads inside `convert` and `run`).\n\n## Usage\n\nPaperlint treats the open-std.org mailing index as authoritative for paper metadata (title, authors, audience, paper_type, canonical URL). Every invocation names the mailing explicitly (except when using only `mailing` for index-only work).\n\n`--workspace-dir` is the **workspace root**: the same directory is used for input and output — mailing index (`mailings/\u003cmailing-id\u003e.json`), per-paper trees (`paper.md`, `evaluation.json`, …), and `index.json` after a full `run`. The legacy alias `--output-dir` is accepted and means the same path.\n\nCommands in logical order:\n\n1. **`mailing`** (optional) — only writes `mailings/\u003cid\u003e.json` from open-std; no downloads of paper sources. Use when you want the index on disk before anything else.\n2. **`convert`** — for each paper selected (entire list, `--papers` subset, or `--max-cap` slice), fetch and convert to `paper.md` + `meta.json`. **No** LLM, no `OPENROUTER_API_KEY` required.\n3. **`eval`** (single paper) or **`run`** (batch) — load existing `paper.md` / `meta.json` and run the LLM pipeline. **Requires** prior `convert` for those papers (or you get a clear error to run `convert` first).\n\nFetch and persist a mailing index only (optional):\n\n```bash\npython -m paperlint mailing 2026-02 --workspace-dir ./data/\n```\n\nConvert to markdown (no AI). Examples:\n\n```bash\n# Full mailing (or use --max-cap)\npython -m paperlint convert 2026-02 --workspace-dir ./data/ --max-cap 50 --max-workers 10\n# One or a few paper ids (comma-separated) — does not download/convert the rest\npython -m paperlint convert 2026-02 --workspace-dir ./data/ --papers P3642R4,N5000R0\npython -m paperlint convert 2026-02 --workspace-dir ./data/ --paper P3642R4\n```\n\nLLM evaluation (after `convert` for the same paper(s)):\n\n```bash\npython -m paperlint eval 2026-02/P3642R4 --workspace-dir ./data/\npython -m paperlint eval 2026-02/P3642R4 --workspace-dir ./data/ --discovery-passes 5\n```\n\n```bash\npython -m paperlint run 2026-02 --workspace-dir ./data/ --max-cap 50 --max-workers 10\npython -m paperlint run 2026-02 --workspace-dir ./data/ --papers A,B --discovery-passes 1\n```\n\nBare paper-ids (`eval P3642R4`) and local file paths (`eval ./paper.pdf`) are not accepted — the caller must use `\u003cmailing-id\u003e/\u003cpaper-id\u003e`.\n\n### Output\n\nEach paper produces a directory with the following files:\n\n```\n{paper_id}/\n  evaluation.json   # findings, references with char offsets, metadata\n  paper.md          # markdown conversion of the source paper, with YAML front matter\n  meta.json         # PaperMeta record (title, authors, audience, paper_type, ...)\n```\n\nThe `extracted_char_start` and `extracted_char_end` fields in each reference select the exact evidence text in `paper.md`. This pairing is the contract for front-end citation rendering.\n\n`paper.md` is also written by the standalone `convert` command so consumers that only need markdown ingestion can skip the AI pipeline.\n\nFor batch runs, an `index.json` summarizes the mailing with per-committee paper lists and finding counts. `mailings/\u003cmailing-id\u003e.json` persists the ground-truth paper index scraped from open-std.org, including the original table cells verbatim under `raw_columns`/`raw_links` so downstream consumers can read columns paperlint does not interpret.\n\n### Storage\n\nAll on-disk writes go through `paperlint.storage.StorageBackend`; the default `JsonBackend` writes the layout above. The interface is designed so a database-backed implementation can be added without touching call sites — see [paperlint/storage.py](paperlint/storage.py).\n\n## Environment\n\nPaperlint requires one API key:\n\n```bash\nexport OPENROUTER_API_KEY=sk-or-...\n```\n\nOr create a `.env` file in the working directory. See `.env.example`.\n\n### Failure details and optional logging\n\nIf you run `eval` / `run` **before** `paperlint convert` for that paper, the CLI\nexits with an error (missing `paper.md` / `meta.json`) and does not write\n`evaluation.json` for that case.\n\nWhen the **analysis** run fails, `evaluation.json` may include additive fields\n`failure_stage` (typically `analysis` in that path), `failure_type`, and\n`failure_message` with the exception text. Set `PAPERLINT_ERROR_TRACEBACK=1` to also\nembed a `failure_traceback` string in the JSON (off by default so production\noutputs stay small).\n\nOptionally log failures to a file: set `PAPERLINT_LOG_FILE` to a path, or set\n`PAPERLINT_LOG_TO_WORKSPACE=1` to append to `\u003cworkspace-dir\u003e/paperlint.log` (the\n`--workspace-dir` root must be set). Error lines are also written to **stderr** so\nhost tools that capture subprocess output can see them.\n\n## What this is\n\nA tool that reads papers and finds the kinds of errors that are easy to make and easy to miss. The same way `clang-tidy` finds a missing `const` without judging your architecture, paperlint finds a misspelled identifier without judging your proposal.\n\nThe findings are objective and mechanically verifiable. If two experts could reasonably disagree about whether something is a defect, it is not reported. The rubric defines what counts. The gate enforces it.\n\n## What this is not\n\nPaperlint does not speak for WG21. It is not an official tool of the committee, and its evaluations do not represent the views of any working group, study group, or individual committee member.\n\nIt does not evaluate the quality, importance, or likelihood of success of any proposal. It does not recommend for or against adoption. It does not assess design choices, alternatives, or trade-offs.\n\nIt uses AI (Claude, via the OpenRouter API) to perform the analysis. The AI reads the paper, applies the rubric, and produces structured findings. The prompts that drive the analysis are in this repository and are open for inspection.\n\n## Repository structure\n\n```\npaperlint/\n  __init__.py\n  __main__.py          # CLI entry point (mailing / convert / eval / run)\n  orchestrator.py      # Top-level pipeline coordination\n  pipeline.py          # Discovery / verify / gate / summary steps\n  llm.py               # OpenRouter client + retry/parsing helpers\n  models.py            # Dataclasses (Evidence, Finding, GatedFinding, PaperMeta)\n  extract.py           # tomd-backed paper-to-markdown wrapper + metadata fallback\n  mailing.py           # WG21 open-std.org mailing page scraper\n  storage.py           # StorageBackend ABC + JsonBackend\n  credentials.py       # API key validation\n  rubric.md            # 30 failure modes across 4 axes\n  prompts/\n    1-discovery.md     # \"Find every defect\"\n    2-verification-gate.md  # \"Reject everything that isn't a real defect\"\n    3-evaluation-writer.md  # \"State what was found\"\n  docs/\n    design.md          # Pipeline architecture and output schema\ntomd/                  # Bundled PDF/HTML to markdown converter\n```\n\n## Tests\n\nFrom the repository root, install the bundled converter, then paperlint with test extras (pulls in `pytest`, `mistune`, `pymupdf` for import-time dependencies):\n\n```bash\npip install -e ./tomd\npip install -e \".[test]\"\npytest tests/\n```\n\nPytest is configured so the repo root is on `PYTHONPATH`, which lets `import tomd` resolve the vendored `tomd/` tree even before `pip install -e ./tomd`. Paperlint’s extract tests live in `tests/test_paperlint_extract.py` so a combined `pytest tests/ tomd/tests/` run does not collide with `tomd/tests/test_extract.py` on the module name `test_extract`. Running `tomd`’s own tests (`pytest tomd/tests/`) still requires `pip install -e ./tomd` (or the step above) so `mistune` and other `tomd` dependencies are present.\n\n## License\n\nCopyright (c) 2026 Sergio DuBois (sentientsergio@gmail.com)\n\nDistributed under the Boost Software License, Version 1.0.\nSee [LICENSE_1_0.txt](LICENSE_1_0.txt) or http://www.boost.org/LICENSE_1_0.txt\n\nOfficial repository: https://github.com/cppalliance/paperlint\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcppalliance%2Fwg21-paperlint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcppalliance%2Fwg21-paperlint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcppalliance%2Fwg21-paperlint/lists"}