{"id":50749535,"url":"https://github.com/musharna/ghostcite","last_synced_at":"2026-06-11T00:02:25.414Z","repository":{"id":363400727,"uuid":"1262600526","full_name":"musharna/ghostcite","owner":"musharna","description":"Deterministic, no-LLM CLI that catches ghost citations — when the author/year you cited doesn't match the DOI's CrossRef record (+ retraction flags).","archived":false,"fork":false,"pushed_at":"2026-06-08T18:03:42.000Z","size":366,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-08T20:06:14.755Z","etag":null,"topics":["bibtex","citations","cli","crossref","doi","research-integrity"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/ghostcite/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/musharna.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-08T06:25:01.000Z","updated_at":"2026-06-08T18:03:45.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/musharna/ghostcite","commit_stats":null,"previous_names":["musharna/ghostcite"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/musharna/ghostcite","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/musharna%2Fghostcite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/musharna%2Fghostcite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/musharna%2Fghostcite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/musharna%2Fghostcite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/musharna","download_url":"https://codeload.github.com/musharna/ghostcite/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/musharna%2Fghostcite/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34175887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibtex","citations","cli","crossref","doi","research-integrity"],"created_at":"2026-06-11T00:02:24.508Z","updated_at":"2026-06-11T00:02:25.404Z","avatar_url":"https://github.com/musharna.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ghostcite\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"examples/assets/logo.png\" alt=\"ghostcite\" width=\"380\"\u003e\u003c/p\u003e\n\n[![PyPI](https://img.shields.io/pypi/v/ghostcite.svg)](https://pypi.org/project/ghostcite/)\n[![CI](https://github.com/musharna/ghostcite/actions/workflows/ci.yml/badge.svg)](https://github.com/musharna/ghostcite/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)\n\n**Catch ghost citations — right DOI, wrong author.**\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"examples/assets/demo.gif\" alt=\"ghostcite catching a ghost citation\" width=\"800\"\u003e\u003c/p\u003e\n\n`ghostcite` is a deterministic, **no-LLM** command-line tool that cross-checks a\nbibliography's _claimed_ author and year against CrossRef's canonical record for\neach DOI. It catches the dominant ghost-citation failure mode — a reference whose\ncited authorship doesn't match the paper the DOI actually points to — and flags\nretracted or expression-of-concern works along the way.\n\n## The problem\n\nLLM-assisted writing (and plain copy-paste drift) routinely produces references\nthat _look_ right but attribute the cited DOI to the wrong authors or year. A\nmanuscript cites \"Li et al. 2024,\" but DOI `10.3390/plants13060869` is actually\n**Chen et al.** A reviewer catches it; an automated check catches it first.\n\n\u003e Does the metadata you wrote for this citation match what CrossRef says the DOI actually is?\n\nNo model, no API key, no download — just CrossRef's REST API and a comparison.\n\n## Install\n\n```bash\npip install ghostcite          # into the current environment\npipx install ghostcite         # isolated CLI install (recommended)\nuv tool install ghostcite      # if you use uv\n```\n\n## Usage\n\n```bash\nghostcite refs.bib                         # check a BibTeX file (or .md / DOI list)\nghostcite refs.bib --cross-check pubmed    # corroborate against PubMed\nghostcite refs.bib --json                  # machine-readable output (for CI)\nghostcite refs.bib --fail-on author,year,retraction   # tune the CI gate\ncat refs.bib | ghostcite -                 # read from stdin\n```\n\nInput format is auto-detected (BibTeX, Markdown reference list, or bare DOI list);\noverride with `--format {auto,bibtex,markdown,doi}`.\n\n**Real example** — `refs.bib` cites \"Li (2024)\" for a DOI CrossRef says is Chen:\n\n```text\n$ ghostcite refs.bib\nghostcite: 1 entries, 1 with DOIs\n  ✗ A  L1  Li (2024)  →  DOI resolves to Chen (2024) — possibly wrong DOI  [10.3390/plants13060869]\n  1 A\n$ echo $?\n1\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eAll flags \u0026amp; the anatomy of a finding\u003c/b\u003e\u003c/summary\u003e\n\n```text\n  ✗ A   L1    Li (2024)        →  DOI resolves to Chen (2024)…   [10.3390/plants13060869]\n  │ │   │     │                    │                               │\n  │ │   │     │                    │                               └─ DOI that was checked\n  │ │   │     │                    └─ what CrossRef actually records\n  │ │   │     └─ what you cited (claimed first author + year)\n  │ │   └─ source line in your bibliography\n  │ └─ tier: A author · B year · C cosmetic · R retraction · U unresolvable\n  └─ glyph: ✗ fails CI · ⚠ retraction · · informational\n```\n\n- **`--cross-check pubmed`** — adds PubMed/NCBI as a _second source of truth_.\n  When PubMed backs CrossRef a finding is annotated `↳ corroborated by PubMed`;\n  when PubMed instead agrees with what you _cited_, it's flagged as a CrossRef↔PubMed\n  conflict (the tier is kept so you don't silently trust either source). PubMed can\n  also _raise_ a finding CrossRef missed, or supply a record for a DOI absent from\n  CrossRef. Optional `--ncbi-email` / `--ncbi-api-key` (or `NCBI_EMAIL` /\n  `NCBI_API_KEY`) follow NCBI E-utilities etiquette and unlock a higher rate limit;\n  neither is required.\n- **`--max-rps \u003cn\u003e`** — cap outbound requests per second. ghostcite already\n  self-throttles to CrossRef's advertised rate limit (read from the response\n  headers); `--max-rps` lets you be _more_ conservative (the stricter of the two wins).\n- **`--color {auto,always,never}`** — colorize the tier glyphs. `auto` (default)\n  colorizes only on a TTY. [`NO_COLOR`](https://no-color.org/) is honored and wins\n  even over `always`. `--json` output is never colorized.\n- **stdin (`-`)** — pass `-` as the filename to read from stdin, e.g.\n  `cat refs.bib | ghostcite -` or `ghostcite - --format doi \u003c dois.txt`.\n- **`--dry-run`** — parse + classify + count only, no network.\n\nSee [`examples/`](examples/) for ready-to-run sample inputs and captured output.\n\n\u003c/details\u003e\n\n## How it works\n\n```mermaid\nflowchart TD\n    A[\"Citation: claimed author + year (+ DOI)\"] --\u003e B{\"Has DOI?\"}\n    B -- yes --\u003e C[\"GET CrossRef /works/{DOI}\"]\n    B -- no --\u003e D[\"CrossRef bibliographic search\u003cbr/\u003e(low-confidence)\"]\n    C --\u003e E{\"DOI resolves?\"}\n    E -- no --\u003e U[\"Tier U — unresolvable\"]\n    E -- yes --\u003e F[\"Compare claimed vs. canonical record\"]\n    D --\u003e F\n    F --\u003e G{\"First-author surname matches?\"}\n    G -- no --\u003e TA[\"Tier A — author mismatch\"]\n    G -- yes --\u003e H{\"Year matches?\"}\n    H -- no --\u003e TB[\"Tier B — year mismatch\"]\n    H -- yes --\u003e OK[\"OK\"]\n    C --\u003e R{\"Retracted / expression of concern?\"}\n    R -- yes --\u003e TR[\"Tier R — retraction (orthogonal)\"]\n    F -. \"--cross-check pubmed\" .-\u003e P[\"PubMed second opinion\"]\n```\n\nNo language model is involved at any step. ghostcite resolves each DOI at CrossRef\n(and optionally PubMed), then does a pure, deterministic comparison of the claimed\nfirst-author surname (Unicode-folded, punctuation-stripped) and year against the\ncanonical record, plus a retraction / expression-of-concern check. Only the HTTP\nclient touches the network, via CrossRef's polite pool (a descriptive `User-Agent`\nwith the project URL, never a personal email).\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eSeverity tiers, input formats \u0026amp; exit codes\u003c/b\u003e\u003c/summary\u003e\n\n| Tier   | Meaning                                                               | Fails CI?                       |\n| ------ | --------------------------------------------------------------------- | ------------------------------- |\n| **A**  | author-mismatch — claimed first author isn't in CrossRef's authors    | Yes                             |\n| **B**  | year-mismatch — author matches, claimed year differs                  | Yes                             |\n| **C**  | cosmetic — matches only after diacritic/initials fold (Bürger≈Burger) | No (info)                       |\n| **R**  | retraction / expression-of-concern per CrossRef                       | Yes (fires regardless of A/B/C) |\n| **U**  | unresolvable — DOI 404s, or no-DOI entry search was inconclusive      | No (warn)                       |\n| **OK** | first author + year match                                             | —                               |\n\nWhen the claimed title also diverges strongly from CrossRef's title, a Tier A\nfinding is annotated **\"possibly wrong DOI entirely\"** to distinguish a wrong-author\ncitation from a wrong-DOI one.\n\n| Format       | Detection                                       | Yields claimed author/year?            |\n| ------------ | ----------------------------------------------- | -------------------------------------- |\n| **BibTeX**   | `@article{…}` / `@…{…}` entries                 | Yes (`author`, `year`, `doi`, `title`) |\n| **Markdown** | bullet refs `- **AuthorList (YYYY).** … 10.x …` | Yes                                    |\n| **DOI list** | newline-delimited bare DOIs / `doi:` / DOI URLs | No — lookup + retraction sweep only    |\n\n| Exit code | Meaning                                            |\n| --------- | -------------------------------------------------- |\n| `0`       | clean — no findings at or above the fail threshold |\n| `1`       | findings present at/above the threshold            |\n| `2`       | tool error (network down, unparseable input, …)    |\n\n`--fail-on` (default `author,year,retraction`) selects which tiers force exit `1`;\n`--fail-on none` runs as a passive reporter. Tiers `C` and `U` never force exit `1`.\n\n\u003c/details\u003e\n\n## Use it in CI\n\nA clean run is quiet and exits `0`:\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"examples/assets/demo-clean.png\" alt=\"ghostcite clean run\" width=\"520\"\u003e\u003c/p\u003e\n\nDrop in the composite **GitHub Action**:\n\n```yaml\n- uses: musharna/ghostcite@v1\n  with:\n    paths: paper/refs.bib\n    fail-on: \"author,year,retraction\"\n```\n\n…or the **[pre-commit](https://pre-commit.com/) hook**:\n\n```yaml\nrepos:\n  - repo: https://github.com/musharna/ghostcite\n    rev: v0.1.0\n    hooks:\n      - id: ghostcite\n        args: [paper/references.bib, --fail-on, \"author,year,retraction\"]\n```\n\nEither way, a finding at or above the `--fail-on` threshold returns a non-zero\nexit, blocking the merge or commit before submission.\n\n## Scope \u0026amp; limitations\n\n`ghostcite` checks **metadata correctness** (does the DOI's record match what you\nwrote), not claim support (does the source actually _say_ what your prose claims —\na separate, LLM-based concern). It does no auto-fixing and no citation-style\nlinting. CrossRef is the source of truth; `--cross-check pubmed` adds PubMed as an\noptional second opinion.\n\n- CrossRef stores particle surnames inconsistently (`van der Berg` vs `Berg`), so a\n  correctly-cited prefixed surname can rarely produce a Tier A false positive.\n- No-DOI entries are resolved by best-effort bibliographic search and flagged\n  low-confidence — treat those as hints, not verdicts.\n- Some preprints, datasets, and protocols carry no author metadata in CrossRef and\n  surface as Tier U rather than a mismatch.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eRelated work \u0026amp; FAQ\u003c/b\u003e\u003c/summary\u003e\n\nghostcite's niche is **deterministic, no-LLM, CLI-first** checking focused on the\n**byline-mismatch** failure mode (right DOI, wrong author/year) plus **retraction**\nflagging — built to run unattended in CI.\n\n| Tool                                                            | What it does                                | How ghostcite differs                                                       |\n| --------------------------------------------------------------- | ------------------------------------------- | --------------------------------------------------------------------------- |\n| [RefChecker](https://github.com/markrussinovich/refchecker)     | LLM-powered web-search reference validator  | ghostcite is no-LLM, deterministic, and CI-safe (no model, no API key)      |\n| claude-skill-citation-checker                                   | A Claude Code skill for an LLM agent        | ghostcite is a standalone CLI + Action — no agent or LLM host needed        |\n| [BibTeX Verifier](https://merfanian.github.io/Bibtex-Verifier/) | In-browser BibTeX checker                   | ghostcite is scriptable from the CLI and also flags retractions             |\n| [CERCA](https://github.com/lidianycs/cerca)                     | Java / AGPL citation checker                | ghostcite is Python / MIT / `pip install`-able                              |\n| [scite Reference Check](https://scite.ai/)                      | Commercial, PDF-oriented, retraction focus  | ghostcite is free / open-source, BibTeX-native, and catches byline mismatch |\n| [doimgr](https://github.com/dotcs/doimgr)                       | Formats and manages DOIs (doesn't validate) | ghostcite verifies byline and retraction status, not just formatting        |\n\n**Does it call an LLM?** No — a deterministic comparison of the metadata you wrote\nagainst CrossRef's (and optionally PubMed's) canonical record. No model, no prompt,\nno API key required.\n\n**Will it hit rate limits?** It self-throttles to CrossRef's advertised rate limit\n(read from the live response headers); use `--max-rps` to be more conservative.\n\n**Does it catch fabricated DOIs?** Indirectly — a DOI that 404s at CrossRef\nsurfaces as Tier U. The core check is byline-vs-DOI _consistency_, so it catches the\ncommon case of a real DOI attached to the wrong citation.\n\n\u003c/details\u003e\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmusharna%2Fghostcite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmusharna%2Fghostcite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmusharna%2Fghostcite/lists"}