{"id":32155581,"url":"https://github.com/richardkwo/invariantcausal.jl","last_synced_at":"2025-10-21T12:07:00.995Z","repository":{"id":61798349,"uuid":"132090137","full_name":"richardkwo/InvariantCausal.jl","owner":"richardkwo","description":"Causal Inference with Invariant Prediction","archived":false,"fork":false,"pushed_at":"2020-12-07T19:14:15.000Z","size":2254,"stargazers_count":25,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-20T19:25:49.821Z","etag":null,"topics":["causal-inference","causality","graphical-models","intervention","julia","sem","structural-equation-modeling"],"latest_commit_sha":null,"homepage":null,"language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/richardkwo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-05-04T05:09:33.000Z","updated_at":"2024-02-08T03:28:25.000Z","dependencies_parsed_at":"2022-10-21T11:15:11.224Z","dependency_job_id":null,"html_url":"https://github.com/richardkwo/InvariantCausal.jl","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/richardkwo/InvariantCausal.jl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richardkwo%2FInvariantCausal.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richardkwo%2FInvariantCausal.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richardkwo%2FInvariantCausal.jl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richardkwo%2FInvariantCausal.jl/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/richardkwo","download_url":"https://codeload.github.com/richardkwo/InvariantCausal.jl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richardkwo%2FInvariantCausal.jl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280256415,"owners_count":26299380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-21T02:00:06.614Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causal-inference","causality","graphical-models","intervention","julia","sem","structural-equation-modeling"],"created_at":"2025-10-21T12:06:57.089Z","updated_at":"2025-10-21T12:07:00.984Z","avatar_url":"https://github.com/richardkwo.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Causal Inference with Invariant Prediction\n\n[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](LICENSE.md) [![Build Status](https://travis-ci.org/richardkwo/InvariantCausal.jl.svg?branch=master)](https://travis-ci.org/github/richardkwo/InvariantCausal.jl) [![codecov](https://codecov.io/gh/richardkwo/InvariantCausal.jl/branch/master/graph/badge.svg?token=GAKgUhTT6E)](https://codecov.io/gh/richardkwo/InvariantCausal.jl)\n\n![college](docs/college.png)\n\nThis is a **Julia 1.x** implementation for the **Invariant 
Causal Prediction** algorithm of [Peters, Bühlmann and Meinshausen](https://doi.org/10.1111/rssb.12167). The method uncovers direct causes of a target variable from datasets under different environments (e.g., interventions or experimental settings).\n\nSee also this [R package](https://cran.r-project.org/package=InvariantCausalPrediction) and [this report](docs/InvariantCausal.pdf).\n\n#### Changelog\n\n- 2020/12/03: version 1.0.0 (Julia 1.x)\n- 2018/06/20: version 0.1.1 (Julia 0.6)\n\n#### Dependencies\n\n[DataStructures.jl](https://github.com/JuliaCollections/DataStructures.jl), [StatsBase.jl](https://github.com/JuliaStats/StatsBase.jl), [GLM.jl](https://github.com/JuliaStats/GLM.jl), [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl), [GLMNet.jl](https://github.com/JuliaStats/GLMNet.jl) (for lasso screening; requires `gfortran`) and [UnicodePlots.jl](https://github.com/Evizero/UnicodePlots.jl).\n\n### Installation\n\nInstall the package by typing the following in the Julia REPL.\n\n```julia\njulia\u003e using Pkg\njulia\u003e Pkg.add(\"InvariantCausal\")\n```\n\nAlternatively, you can install the latest version from GitHub.\n\n```Julia\njulia\u003e Pkg.add(url=\"https://github.com/richardkwo/InvariantCausal.git\")\n```\n\nUse the following to run a full test.\n\n```julia\njulia\u003e using InvariantCausal\njulia\u003e InvariantCausal._test_full()\n```\n\n### Quick Start\n\nGenerate a simple [Gaussian structural equation model](https://en.wikipedia.org/wiki/Structural_equation_modeling?oldformat=true) (SEM) on a random graph with 21 variables and average degree 3. Note that we assume the SEM is acyclic. The model can be represented as `X = B X + ϵ` with zeros on the diagonal of `B` (no self-loops). `ϵ` is a vector of independent Gaussian errors. For a variable `i`, the variables `j` with non-zero coefficients `B[i,j]` are called the direct causes of `i`. 
We assume `B` is sparse, and its sparsity pattern is visualized with [UnicodePlots.jl](https://github.com/Evizero/UnicodePlots.jl).\n\n```julia\njulia\u003e using InvariantCausal\njulia\u003e using Random\njulia\u003e Random.seed!(77)\njulia\u003e sem_obs = random_gaussian_SEM(21, 3)\n\nGaussian SEM with 21 variables:\nB =\n      Sparsity Pattern\n      ┌───────────┐\n    1 │⠀⠠⠀⠀⢐⠀⠀⠄⠀⢔⠀│ \u003e 0\n      │⠠⠀⠠⠨⠁⠀⠄⠀⠀⠸⠀│ \u003c 0\n      │⠠⠈⠈⠀⠌⠠⠀⠅⠀⠩⠉│\n      │⠠⣨⠴⠰⠪⠠⠄⠀⠸⠉⣐│\n      │⢀⠲⠈⢠⠠⠀⠀⠂⠀⠲⠁│\n   21 │⠀⠐⠀⠀⠠⠠⠀⠀⠀⠔⠀│\n      └───────────┘\n      1          21\n        nz = 70σ² = [1.9727697778060356, 1.1224733663047743, 1.1798805640594814, 1.2625825149076064, 0.8503782631176267, 0.5262963446298372, 1.3835334059064883, 1.788996301274282, 1.759286517329432, 0.842571682652995, 1.713382150423666, 1.4524484793202235, 1.9464648511794784, 1.7729995603828317, 0.7110857327642559, 1.6837378902964577, 1.085405687408806, 1.3069888003095986, 1.3933773717634643, 1.0571823834646068, 1.9187793877731028]\n```\n\nSuppose we want to infer the direct causes of the last variable (variable 21), i.e., variables 9, 11 and 18.\n\n```julia\njulia\u003e causes(sem_obs, 21)\n3-element Array{Int64,1}:\n  9\n 11\n 18\n```\n\nFirst, let us generate some observational data and call it **environment 1**.\n\n```julia\njulia\u003e X1 = simulate(sem_obs, 1000)\n```\n\nThen, we simulate from **environment 2** by performing a **do-intervention** on variables 3, 4, 5, 6. 
Here we set them to fixed random values.\n\n```julia\njulia\u003e X2 = simulate(sem_obs, [3,4,5,6], randn(4), 1000)\n```\n\nWe run the algorithm on **environments 1 and 2**.\n\n```julia\njulia\u003e causalSearch(vcat(X1, X2)[:,1:20], vcat(X1, X2)[:,21], repeat([1,2], inner=1000))\n\n8 variables are screened out from 20 variables with lasso: [5, 7, 8, 9, 11, 12, 15, 17]\nCausal invariance search across 2 environments with at α=0.01 (|S| = 8, method = chow, model = linear)\n\nS = []                                      : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]\nS = [5]                                     : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]\nS = [17]                                    : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]\nS = [15]                                    : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]\nS = [12]                                    : p-value = 0.0000 [ ] ⋂ = [5, 7, 8, 9, 11, 12, 15, 17]\nS = [11]                                    : p-value = 0.0144 [*] ⋂ = [11]\nS = [9]                                     : p-value = 0.0000 [ ] ⋂ = [11]\nS = [8]                                     : p-value = 0.0000 [ ] ⋂ = [11]\nS = [7]                                     : p-value = 0.0000 [ ] ⋂ = [11]\nS = [11, 5]                                 : p-value = 0.0000 [ ] ⋂ = [11]\nS = [11, 12]                                : p-value = 0.0000 [ ] ⋂ = [11]\nS = [11, 15]                                : p-value = 0.0007 [ ] ⋂ = [11]\nS = [7, 11]                                 : p-value = 0.0082 [ ] ⋂ = [11]\nS = [11, 8]                                 : p-value = 0.0000 [ ] ⋂ = [11]\nS = [9, 11]                                 : p-value = 0.0512 [*] ⋂ = [11]\nS = [17, 11]                                : p-value = 0.0000 [ ] ⋂ = [11]\nS = [9, 12]                                 : p-value = 0.0000 [ ] ⋂ = [11]\nS = [9, 15]                                 : p-value = 0.0064 [ ] ⋂ = [11]\nS = [7, 9]     
                             : p-value = 0.0000 [ ] ⋂ = [11]\nS = [9, 8]                                  : p-value = 0.0000 [ ] ⋂ = [11]\nS = [9, 5]                                  : p-value = 0.7475 [*] ⋂ = Int64[]\n\nTested 21 sets: 3 sets are accepted.\n\n * Found no causal variable (empty intersection).\n\n ⋅ Variables considered include [5, 7, 8, 9, 11, 12, 15, 17]\n```\n\nThe algorithm **cannot find any** direct causal variables (parents) of variable 21 due to **insufficient power** of two environments. The algorithm tends to **discover more** with **more environments**. Let us define a new environment where we perform a **noise (soft) intervention** that changes the equations for 5 variables other than the target. Note it is important that the **target** is left **untouched**.\n\n```Julia\njulia\u003e sem_noise, variables_intervened = random_noise_intervened_SEM(sem_obs, p_intervened=5, avoid=[21])\n\n(Gaussian SEM with 21 variables:\nB =\n      Sparsity Pattern\n      ┌───────────┐\n    1 │⠀⠠⠀⠀⢐⠀⠀⠄⠀⢔⠀│ \u003e 0\n      │⠠⠀⠠⠨⠁⠀⠄⠀⠀⠸⠀│ \u003c 0\n      │⠠⠈⠈⠀⠌⠠⠀⠅⠀⠩⠉│\n      │⠠⣨⠴⠰⠪⠠⠄⠀⠸⠉⣐│\n      │⢀⠲⠈⢠⠠⠀⠀⠂⠀⠲⠁│\n   21 │⠀⠐⠀⠀⠠⠠⠀⠀⠀⠔⠀│\n      └───────────┘\n      1          21\n        nz = 70σ² = [1.9727697778060356, 1.1224733663047743, 1.1798805640594814, 1.2625825149076064, 0.8503782631176267, 0.5262963446298372, 1.3835334059064883, 1.788996301274282, 1.759286517329432, 0.5837984015051159, 3.01957479564807, 0.9492838187140921, 1.9398913901673531, 1.7729995603828317, 0.7110857327642559, 1.6837378902964577, 1.2089053651343495, 1.3069888003095986, 1.3933773717634643, 1.0571823834646068, 1.9187793877731028], [17, 13, 10, 11, 12])\n```\n\nHere the equations for variables 17, 13, 10, 11, 12 have been changed. Now we simulate from this modified SEM and call it **environment 3**. 
We run the algorithm on all **3 environments**.\n\n```Julia\njulia\u003e X3 = simulate(sem_noise, 1000)\njulia\u003e causalSearch(vcat(X1, X2, X3)[:,1:20], vcat(X1, X2, X3)[:,21], repeat([1,2,3], inner=1000))\n```\n\nThe algorithm searches over subsets for a while and successfully **discovers** variable 11. The other two causes, 9 and 18, can hopefully be discovered given even more environments.\n\n```\ncausalSearch(vcat(X1, X2, X3)[:,1:20], vcat(X1, X2, X3)[:,21], repeat([1,2,3], inner=1000))\n8 variables are screened out from 20 variables with lasso: [4, 5, 7, 8, 9, 11, 12, 16]\nCausal invariance search across 3 environments with at α=0.01 (|S| = 8, method = chow, model = linear)\n\nS = []                                      : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [4]                                     : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [16]                                    : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [12]                                    : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [11]                                    : p-value = 0.0084 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [9]                                     : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [8]                                     : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [7]                                     : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [5]                                     : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [4, 11]                                 : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [11, 5]                                 : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [11, 8]                                 : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [7, 11]                                 : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = 
[9, 11]                                 : p-value = 0.0000 [ ] ⋂ = [4, 5, 7, 8, 9, 11, 12, 16]\nS = [16, 11]                                : p-value = 0.0709 [*] ⋂ = [11, 16]\nS = [11, 12]                                : p-value = 0.0000 [ ] ⋂ = [11, 16]\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t...\nS = [7, 9, 4, 16, 11, 5, 12]                : p-value = 0.0000 [ ] ⋂ = [11]\nS = [7, 9, 4, 16, 11, 8, 12]                : p-value = 0.0001 [ ] ⋂ = [11]\nS = [7, 4, 9, 16, 11, 5, 8, 12]             : p-value = 0.0002 [ ] ⋂ = [11]\n\nTested 256 sets: 6 sets are accepted.\n\n * Causal variables include: [11]\n\n   variable   \t 1.0 % \t\t 99.0 %\n   11         \t 0.1123 \t 1.1017\n\n ⋅ Variables considered include [4, 5, 7, 8, 9, 11, 12, 16]\n```\n\n### Functionalities\n\n- The main algorithm `causalSearch(X, y, env, [S]; α=0.01, method=\"chow\", screen=\"auto\", p_max=8, verbose=true, selection_only=false, n_max_for_exact=5000)` \n  - Performs screening if the number of covariates exceeds `p_max`\n    - `screen=\"auto\"`: `\"HOLP\"` when p \u003e n, `\"lasso\"` otherwise\n    - `screen=\"HOLP\"`: [High dimensional ordinary least squares projection for screening variables](https://doi.org/10.1111/rssb.12127) when p ≧ n\n    - `screen=\"lasso\"`: lasso solution path from `glmnet`\n  - Skips supersets of an accepted set under `selection_only = true`, but confidence intervals are not reported\n  - When the sample size exceeds `n_max_for_exact`, sub-sampling is used for the Chow test\n- Methods\n  - `method=\"chow\"`: Chow test for linear regression\n  - `method=\"logistic-LR\"`: likelihood-ratio test for logistic regression\n  - `method=\"logistic-SF\"`: [Sukhatme-Fisher test](http://www.jstor.org/stable/2286870) for testing equal mean and variance of logistic prediction residuals\n- SEM utilities: `random_gaussian_SEM`, `random_noise_intervened_SEM`, `simulate`, `causes` and `cov` for generating random SEMs (Erdős-Rényi), simulation and interventions.\n- Variable screening:\n  - Lasso 
(with `glmnet`): `screen_lasso(X, y, pmax)`\n\n### Features\n\n- High-performance implementation in Julia v1.x\n- Faster search:\n  - Skips testing supersets of A if A is accepted (under `selection_only` mode)\n  - Uses a priority queue to prioritize testing sets likely to be invariant\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frichardkwo%2Finvariantcausal.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frichardkwo%2Finvariantcausal.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frichardkwo%2Finvariantcausal.jl/lists"}