{"id":51428186,"url":"https://github.com/heidihelena/recoverlite","last_synced_at":"2026-07-05T02:30:32.284Z","repository":{"id":369325188,"uuid":"1288476164","full_name":"heidihelena/recoverlite","owner":"heidihelena","description":"Pre-data recovery tests for planned study designs: simulate whether a planned design-analysis pair can recover its target estimand, with PASS/RISK/FAIL verdicts under versioned threshold profiles. Companion R package to the recovery-test methods paper.","archived":false,"fork":false,"pushed_at":"2026-07-04T17:01:37.000Z","size":83,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-07-04T19:03:23.728Z","etag":null,"topics":["clinical-trials","declaredesign","estimands","metascience","monte-carlo","power-analysis","preregistration","r","r-package","research-methods","rstats","simulation","statistics","study-design"],"latest_commit_sha":null,"homepage":"https://heidihelena.r-universe.dev/recoverlite","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/heidihelena.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json","notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-07-03T16:29:16.000Z","updated_at":"2026-07-04T17:01:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/heidihelena/recoverlite","commit_stats":null,"previous_names":["heidihelena/recoverlite"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/heidihelena/recoverlite","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heidihelena%2Frecoverlite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heidihelena%2Frecoverlite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heidihelena%2Frecoverlite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heidihelena%2Frecoverlite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/heidihelena","download_url":"https://codeload.github.com/heidihelena/recoverlite/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heidihelena%2Frecoverlite/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35141966,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-05T02:00:06.290Z","response_time":100,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clinical-trials","declaredesign","estimands","metascience","monte-carlo","power-analysis","preregistration","r","r-package","research-methods","rstats","simulation","statistics","study-design"],"created_at":"2026-07-05T02:30:31.640Z","updated_at":"2026-07-05T02:30:32.279Z","avatar_url":"https://github.com/heidihelena.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# recoverlite\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/heidihelena/recoverlite/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/heidihelena/recoverlite/actions/workflows/R-CMD-check.yaml)\n[![recoverlite status badge](https://heidihelena.r-universe.dev/badges/recoverlite)](https://heidihelena.r-universe.dev/recoverlite)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.21195920.svg)](https://doi.org/10.5281/zenodo.21195920)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE.md)\n\u003c!-- badges: end --\u003e\n\n**Pre-data recovery tests for planned study designs.**\n\nA planned study can be unable to support its intended inferential claim\neven when the researcher's substantive assumptions are correct. Sampling,\nmeasurement, missingness, assignment, and analysis may yield estimates\nthat are biased, poorly calibrated, unstable, or exaggerated conditional\non detection — and attrition and exclusions can change the estimand\nitself: a study may retain precision while quietly answering a different\nquestion. Conventional power analysis rarely diagnoses these failures.\n\n`recoverlite` is a prototype implementation of the **recovery test**, a\nstandardized pre-data simulation protocol (Andersen, 2026, working\ndraft). Researchers declare the estimand, data-generating assumptions,\ndata strategy, missing-data process, and analysis strategy before\nsimulation. Data are generated under a **crossed scenario grid** — null\nand target effects, each under declared and pessimistically perturbed\nnuisance assumptions — and the planned analysis is applied to each\nsimulated dataset. The report converts the diagnosands into a\n**PASS / RISK / FAIL** verdict under a pre-specified, versioned threshold\nprofile — a decision convention, not a validity classification.\n\nDesign declaration and data generation follow the\n[DeclareDesign](https://declaredesign.org) grammar; mixed-model answer\nstrategies use `lme4` with `lmerTest` (Satterthwaite) and `pbkrtest`\n(Kenward–Roger); `simr` is suggested for complementary GLMM power work.\n\n## Installation\n\n```r\n# from the r-universe (recommended):\ninstall.packages(\"recoverlite\",\n                 repos = c(\"https://heidihelena.r-universe.dev\",\n                           \"https://cloud.r-project.org\"))\n\n# or the development version straight from GitHub:\n# remotes::install_github(\"heidihelena/recoverlite\")\n```\n\nNot yet on CRAN.\n\n## The workflow in one block\n\n```r\nlibrary(recoverlite)\n\ndesign \u003c- declare_recovery(\n  target = target_estimand(\n    estimand = \"ITT mean difference at 12 weeks\",\n    scale    = \"latent-outcome standardized mean difference\",\n    sesoi    = 0.40\n  ),\n  data_strategy   = two_arm_trial(n_per_arm = 115, allocation = 0.5),\n  measurement     = measured_outcome(reliability = 0.70),\n  missingness     = attrition_model(rate = 0.15, mechanism = \"differential\"),\n  answer_strategy = planned_analysis(\n    estimator = \"linear_model\",\n    formula   = y_observed ~ treatment\n  )\n)\n\nresult \u003c- recovery_test(design, sims = 2000,\n                        scenarios = \"confirmatory_grid\", seed = 1)\n\nverdict(result)   # PASS / RISK / FAIL under the selected threshold profile,\n                  # recomputed under the shipped strict and lenient profiles\nreport(result)    # standalone recovery report; always travels with the verdict\n```\n\nCluster-randomized designs use `cluster_trial()` with a mixed-model or\ncluster-level answer strategy and an **explicit inference method** — with\nfew clusters, the inference method is not a detail, it is the design:\n\n```r\ndesign \u003c- declare_recovery(\n  target = target_estimand(\n    estimand = \"ITT mean difference in pupil outcome\",\n    scale    = \"student-level standardized mean difference\",\n    sesoi    = 0.40\n  ),\n  data_strategy = cluster_trial(n_clusters = 16, n_per_cluster = 30,\n                                icc = 0.05, icc_pessimistic = 0.15),\n  answer_strategy = planned_analysis(\n    estimator = \"lmm_random_intercept\",\n    formula   = y_observed ~ treatment + (1 | cluster),\n    inference = \"kenward_roger\"   # or \"satterthwaite\", \"wald_z\"\n  )\n)\n```\n\n## What the protocol fixes before results are known\n\n* **A crossed scenario grid.** Null-declared, Null-pessimistic,\n  Target-declared, Target-pessimistic. An analysis that misbehaves under\n  realistic nuisance conditions can show it in false-positive behavior as\n  easily as in power, so the null rows are required verdict rows. The\n  null world is stated exactly: when the declared missingness mechanism\n  displaces the analyzable contrast under the null, the rejection rate is\n  reported as the **target-null rejection rate** (false claims about the\n  target, partly induced by selection), not as pure test size.\n* **Estimand drift as a diagnosand.** Target bias decomposes exactly into\n  **estimator bias** (what the estimator does to the answer) plus\n  **estimand drift** (what the data strategy does to the question). An\n  unbiased estimator aimed at a displaced contrast is a design problem —\n  and resources repair precision, not drift.\n* **Pessimistic values by an evidence hierarchy.** Empirical ranges \u003e\n  prior-study ranges \u003e elicited ranges \u003e package defaults (attrition\n  ×1.5 capped, reliability −0.10, ICC at its upper plausible bound,\n  noncompliance +50%) — each labeled with its tier in the report. The\n  target effect is never shrunk automatically; effect-size fragility is a\n  separate curve (`effect_fragility()`), as are nuisance fragility curves\n  (`nuisance_fragility()`), both outside the verdict.\n* **A classified failure taxonomy.** Fatal errors / nonconvergence /\n  degenerate (singular, boundary) fits / diagnostic warnings, reported\n  separately. Fatal and nonconvergence always count against the failure\n  threshold; whether degenerate fits count is pre-specified in\n  `planned_analysis()`, and their marginal effect on coverage is\n  reported.\n* **Monte Carlo uncertainty in the verdict.** Every diagnosand carries an\n  MCSE (bootstrap for conditional diagnosands); conditional diagnosands\n  report contributing counts and are marked unstable below 200; a margin\n  within 2 MCSE of its threshold caps the verdict at RISK.\n* **Threshold profiles, not thresholds.** Shipped lenient / default /\n  strict profiles (and an estimation profile); the report shows the\n  signed margin to every threshold and recomputes the verdict under\n  strict and lenient. A verdict that flips across profiles is itself a\n  finding — the RISK category exists to hold it.\n\n| Verdict | Rule |\n|---|---|\n| **PASS** | All required thresholds met under all scenario rows the profile requires, every margin \u003e 2 MCSE. |\n| **RISK** | Passes declared-nuisance rows but fails a pessimistic row, **or** any margin within 2 MCSE, **or** a required conditional diagnosand too unstable to confirm. |\n| **FAIL** | Any required threshold fails under a declared-nuisance row — including an inflated target-null rejection rate. |\n\nDefault confirmatory profile: target-null rejection ≤ 1.25α, power ≥ .80\nat the SESOI, |target bias| ≤ .05Δ, coverage ≥ .925 (overcoverage \u003e .975\nflagged as inefficiency, not failure), Type S ≤ .01, Type M ≤ 1.50,\nmodel failure ≤ .01.\n\nA PASS is evidence about the instrument, not about the world.\n\n## Scope of the current prototype\n\n**Supported:** two-arm parallel trials with an observed baseline,\nclassical additive measurement error, MCAR or baseline-dependent (MAR)\ndifferential attrition, optional one-sided noncompliance; answer\nstrategies: complete-case linear model, baseline-adjusted multiple\nimputation (`mi_baseline_adjusted`); cluster-randomized parallel trials\nwith random-intercept mixed models (Wald z / Satterthwaite /\nKenward–Roger inference) and the cluster-level t-test.\n\n**Unsupported:** crossed and longitudinal random-effects structures,\nBayesian answer strategies, prediction models, and latent-variable\nmeasurement models (manuscript §5.4).\n\nThe scripts that reproduce the paper's worked examples are in\n[`inst/paper/`](inst/paper/). Agent-facing usage instructions are in\n[`SKILL.md`](SKILL.md).\n\n## Citation\n\n```r\ncitation(\"recoverlite\")\n```\n\n\u003e Andersen, H. H. (2026). *Recovery before data: pre-data simulation\n\u003e diagnosis of planned study designs.* Working paper; preprint\n\u003e forthcoming. https://github.com/heidihelena/recoverlite\n\nVersioned releases are archived on Zenodo:\n[doi:10.5281/zenodo.21195920](https://doi.org/10.5281/zenodo.21195920)\n(concept DOI, always resolves to the latest version).\n\nThe reusable methods sentence for preregistrations and grant\napplications:\n\n\u003e \"Design feasibility was evaluated using a pre-data recovery test, in\n\u003e which the planned design and analysis were simulated under declared\n\u003e assumptions and pessimistic perturbations to assess power, bias,\n\u003e coverage, precision, and model stability.\"\n\n## License\n\n[Apache License 2.0](LICENSE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheidihelena%2Frecoverlite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheidihelena%2Frecoverlite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheidihelena%2Frecoverlite/lists"}