{"id":30138392,"url":"https://github.com/jennalandy/causallfo","last_synced_at":"2025-08-11T01:07:30.355Z","repository":{"id":301807365,"uuid":"995633124","full_name":"jennalandy/causalLFO","owner":"jennalandy","description":"This R package provides all algorithms discussed in the paper “Causal Inference for Latent Outcomes Learned with Factor Models”. ","archived":false,"fork":false,"pushed_at":"2025-06-28T23:23:00.000Z","size":682,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-29T00:24:32.070Z","etag":null,"topics":["causal-inference","latent-factor-model","latent-outcomes","nonnegative-matrix-factorization","r-package"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jennalandy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-03T19:32:57.000Z","updated_at":"2025-06-28T23:23:04.000Z","dependencies_parsed_at":"2025-06-29T00:24:42.238Z","dependency_job_id":null,"html_url":"https://github.com/jennalandy/causalLFO","commit_stats":null,"previous_names":["jennalandy/causallfo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jennalandy/causalLFO","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jennalandy%2FcausalLFO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jennalandy%2FcausalLFO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jennalandy%2FcausalLFO/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jennalandy%2FcausalLFO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jennalandy","download_url":"https://codeload.github.com/jennalandy/causalLFO/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jennalandy%2FcausalLFO/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269815077,"owners_count":24479486,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-10T02:00:08.965Z","response_time":71,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causal-inference","latent-factor-model","latent-outcomes","nonnegative-matrix-factorization","r-package"],"created_at":"2025-08-11T01:07:26.339Z","updated_at":"2025-08-11T01:07:30.333Z","avatar_url":"https://github.com/jennalandy.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `causalLFO`: R package for Causal Inference for Latent Outcomes Learned with Factor Models\n\nThis R package provides all algorithms discussed in [the paper “Causal Inference for Latent Outcomes Learned with Factor Models”](https://arxiv.org/abs/2506.20549). 
Code to reproduce results from our paper can be found in the [jennalandy/causalLFO_PAPER](https://github.com/jennalandy/causalLFO_PAPER/tree/master) repository.\n\n## Installation\n\n``` r\nremotes::install_github(\"jennalandy/causalLFO\")\n```\n\n``` r\nlibrary(NMF)\nlibrary(causalLFO)\n```\n\n`NMF::nmf()` internally uses `setupLibPaths(\"NMF\")`, which calls `path.package(\"NMF\")`. This requires the `NMF` package to be attached, not just imported, so the user must call `library(NMF)` as well as `library(causalLFO)`.\n\nPlease install `NMF` if you have not yet done so. `NMF` requires the `Biobase` package, which may have to be installed separately from `Bioconductor`.\n\n## Quick Start\n\nThis code block simulates a simple dataset with 100 samples, three latent factors, and a true ATE of 1000 on latent dimension 1, with an ATE of 0 for dimensions 2 and 3. We include ten outliers in the true untreated latent outcomes for factor 3.\n\n``` r\nlibrary(tidyverse)\nlibrary(ggridges)\n\nset.seed(321)\nN = 100; D = 96; K = 3; ATE = c(1000, 0, 0)\n\n# Simulate treatment assignment\nTr = sample(c(0, 1), N, replace = TRUE)\n\n# Simulate latent factors P\ntrue_P = matrix(rexp(D*K, rate = 1), nrow = D)\n# Normalize factors to sum to 1\ntrue_P = sweep(true_P, 2, colSums(true_P), '/')\n\n# Simulate untreated factor loadings C\ntrue_C = matrix(nrow = K, ncol = N)\ntrue_C[1,] \u003c- rgamma(N, shape = 1, scale = 1000) # larger scale for factor 1\ntrue_C[2,] \u003c- rexp(N, rate = 0.01)\ntrue_C[3,] \u003c- rexp(N, rate = 0.01)\ntrue_C[3,sample(1:N, 10)] \u003c- rnorm(10, mean = 1500, sd = 1000) # outliers for factor 3\ndata.frame(t(true_C)) %\u003e%\n  pivot_longer(1:K, names_to = 'k', values_to = 'C') %\u003e%\n  ggplot(aes(x = C, y = as.factor(k))) +\n  geom_density_ridges() +\n  theme_bw() +\n  labs(x = \"Untreated latent outcome distribution\", y = \"Latent dimension\")\n```\n\n![](README_files/figure-commonmark/unnamed-chunk-3-1.png)\n\n``` r\n# Add ATE to loadings of treated samples\nfor 
(k in 1:K) {\n  true_C[k, Tr == 1] \u003c- true_C[k, Tr == 1] + ATE[k]\n}\n\n# Simulate M ~ Poisson(PC)\nM = matrix(nrow = D, ncol = N)\nfor (i in 1:N) {\n  M[,i] \u003c- rpois(D, lambda = true_P %*% true_C[,i])\n}\n```\n\n### Run impute and stabilize algorithm once to yield a point estimate.\n\nProviding a `reference_P` does not affect the algorithm; it only aligns the results to the reference at the end.\n\n``` r\nimpute_and_stabilize_res \u003c- impute_and_stabilize(\n  M, Tr, rank = 3, reference_P = true_P\n)\nclass(impute_and_stabilize_res)\n```\n\n```         \n[1] \"causalLFO_result\"\n```\n\n``` r\nsummary(impute_and_stabilize_res)\n```\n\n```         \n        ATE\n1 944.07472\n2  11.60620\n3 -52.84368\n```\n\n``` r\nplot(impute_and_stabilize_res)\n```\n\n![](README_files/figure-commonmark/unnamed-chunk-4-1.png)\n\nIf you have multiple sets of results, they can be plotted together with `plot_causalLFO_results`. These could come from multiple algorithms, as we have here, or from multiple datasets. Plotting results together only makes sense when the same `reference_P` is used for all of them. If a reference is not available, the resulting `Phat` from the first result can be used as the `reference_P` for the remaining results.\n\n``` r\nall_data_res \u003c- all_data(\n  M, Tr, rank = 3, reference_P = true_P\n)\nres_list \u003c- list(\n  'All Data' = all_data_res,\n  'Impute and Stabilize' = impute_and_stabilize_res\n)\nplot_causalLFO_results(res_list)\n```\n\n![](README_files/figure-commonmark/unnamed-chunk-5-1.png)\n\n### Run impute and stabilize algorithm with bootstrap resampling to estimate a 95% confidence interval.\n\nWhen `bootstrap = TRUE`, any of the `causalLFO` algorithms will create three files named according to the `bootstrap_filename` parameter. In this example these are `examples/impute_and_stabilize.csv` with the ATE estimates from each bootstrap replicate (500 by default), `examples/impute_and_stabilize_aligned_Ps.rds` with a list of the aligned factor matrices from all replicates, and `examples/impute_and_stabilize_res.rds` with the full results object that is also returned by the function. 
Saving the results object to its own `.rds` file allows easy access at a later time.\n\n``` r\nimpute_and_stabilize_bootstrap_res \u003c- impute_and_stabilize(\n  M, Tr, rank = 3, reference_P = true_P,\n  bootstrap = TRUE, bootstrap_reps = 30,\n  bootstrap_filename = \"examples/impute_and_stabilize\"\n  # small bootstrap_reps for demonstration purposes only\n  # we recommend default bootstrap_reps = 500\n)\n```\n\nWhen `bootstrap = TRUE`, the `class` is changed from `causalLFO_result` to `causalLFO_bootstrap_result`, resulting in updated `summary` and `plot` methods:\n\n``` r\nimpute_and_stabilize_bootstrap_res \u003c- readRDS(\"examples/impute_and_stabilize_res.rds\")\nclass(impute_and_stabilize_bootstrap_res)\n```\n\n```         \n[1] \"causalLFO_bootstrap_result\"\n```\n\n``` r\nsummary(impute_and_stabilize_bootstrap_res)\n```\n\n```         \n        mean      lower      upper\n1 1002.70616  763.47986 1384.10774\n2   10.74524  -27.91742   42.69258\n3  -33.87980 -132.16468   77.48332\n```\n\n``` r\nplot(impute_and_stabilize_bootstrap_res)\n```\n\n![](README_files/figure-commonmark/unnamed-chunk-7-1.png)\n\nAgain, multiple sets of results can be plotted together with `plot_causalLFO_bootstrap_results`.\n\n``` r\nall_data_bootstrap_res \u003c- all_data(\n  M, Tr, rank = 3, reference_P = true_P,\n  bootstrap = TRUE, bootstrap_reps = 30,\n  bootstrap_filename = \"examples/all_data\"\n  # small bootstrap_reps for demonstration purposes only\n  # we recommend default bootstrap_reps = 500\n)\n```\n\nTo compare the All Data and Impute and Stabilize algorithms, recall that the true ATE is 1000 for latent dimension 1 and 0 for dimensions 2 and 3. 
We see:\n\n-   Improved efficiency of Impute and Stabilize: narrower confidence intervals on factors 2 and 3 (especially factor 3, which has outliers in the data-generating model)\n-   Impute and Stabilize corrects the All Data algorithm’s biased estimates for factors 1 and 3\n\n``` r\nall_data_bootstrap_res \u003c- readRDS(\"examples/all_data_res.rds\")\nsummary(all_data_bootstrap_res)\n```\n\n```         \n        mean      lower      upper\n1  852.34174  580.46693 1086.67511\n2   12.08584  -40.73959   81.13129\n3 -140.59002 -339.77022   17.31335\n```\n\n``` r\nres_list \u003c- list(\n  'All Data' = all_data_bootstrap_res,\n  'Impute and Stabilize' = impute_and_stabilize_bootstrap_res\n)\nplot_causalLFO_bootstrap_results(res_list)\n```\n\n![](README_files/figure-commonmark/unnamed-chunk-9-1.png)\n\n## Algorithms\n\nNovel algorithm from “Causal Inference for Latent Outcomes Learned with Factor Models”:\n\n-   **Impute and Stabilize** algorithm to estimate ATE on latent factor-modeled outcomes. Imputes counterfactual outcomes under Poisson distributional assumptions, fits NMF on untreated data (mix of observed and imputed), a Poisson non-negative linear model on treated data, then estimates ATE as the mean difference in estimated latent outcomes between treated and untreated.\n\nAblations of Impute and Stabilize:\n\n-   **Impute** algorithm to estimate ATE on latent factor-modeled outcomes. Imputes counterfactual outcomes under Poisson distributional assumptions, fits NMF on observed data, a Poisson non-negative linear model on imputed data, then estimates ATE as the mean difference in estimated latent outcomes between treated and untreated. *Intended as an ablation of impute_and_stabilize and not recommended by the authors.*\n-   **Stabilize** algorithm to estimate ATE on latent factor-modeled outcomes. Fits NMF on untreated samples, a Poisson non-negative linear model on treated samples, then estimates ATE using estimated latent outcomes. 
*Intended as an ablation of impute_and_stabilize and not recommended by the authors.*\n\nBaseline Algorithms:\n\n-   **All Data** algorithm to estimate ATE on latent factor-modeled outcomes. Fits NMF on all data, then estimates ATE from estimated latent outcomes. *Subject to measurement interference and not recommended by the authors.*\n-   **Random Split** algorithm to estimate ATE on latent factor-modeled outcomes. Fits NMF on a subset of data, a Poisson non-negative linear model on the rest with fixed factors, then estimates ATE from estimated latent outcomes in the second subset. *Subject to measurement interference and not recommended by the authors.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjennalandy%2Fcausallfo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjennalandy%2Fcausallfo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjennalandy%2Fcausallfo/lists"}