https://github.com/statmlben/nonlinear-causal

nl-causal: nonlinear causal inference based on IV regression in Python

2sls causal-inference gwas instrumental-variables nonlinear

Host: GitHub
URL: https://github.com/statmlben/nonlinear-causal
Owner: statmlben
License: mit
Created: 2021-03-16T04:44:28.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2024-07-09T04:07:24.000Z (almost 2 years ago)
Last Synced: 2025-10-27T11:46:02.058Z (8 months ago)
Topics: 2sls, causal-inference, gwas, instrumental-variables, nonlinear
Language: Python
Homepage: https://github.com/nl-causal/nonlinear-causal
Size: 35.7 MB
Stars: 16
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

![Pypi](https://badge.fury.io/py/nonlinear-causal.svg)

[![Python](https://img.shields.io/badge/python-3-blue.svg)](https://www.python.org/)

[![MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# 🧬 nonlinear-causal 

**nonlinear-causal** is a Python module for nonlinear causal inference, including **hypothesis testing** and **confidence intervals** for causal effects, built on instrumental variables and two-stage least squares ([2SLS](https://en.wikipedia.org/wiki/Instrumental_variables_estimation)).

- GitHub repo: [https://github.com/nl-causal/nonlinear-causal](https://github.com/nl-causal/nonlinear-causal)

- PyPi: [https://pypi.org/project/nonlinear-causal/](https://pypi.org/project/nonlinear-causal/)

- Paper: [PMLR@CLeaR2024](https://proceedings.mlr.press/v236/dai24a/dai24a.pdf)

- Documentation: [https://nonlinear-causal.readthedocs.io](https://nonlinear-causal.readthedocs.io/en/latest/index.html)

## Models

**nonlinear-causal** considers two instrumental variable causal models. Let $\mathbf{z}$ denote the valid/invalid instrumental variables (such as SNPs), $x$ the exposure (such as gene expression), and $y$ the outcome (such as AD).

### **Two-Stage Least Squares ([2SLS](https://doi.org/10.1080/01621459.2014.994705))**

$$
x = \mathbf{z}^\prime \mathbf{\theta} + w, \quad y = \beta x + \mathbf{z}^\prime \mathbf{\alpha} + \varepsilon,
$$

where $(w,\varepsilon)$ are error terms independent of the instruments $\mathbf{z}$; however, $w$ and $\varepsilon$ may be correlated with each other due to underlying *confounders*. Here $\beta\in\mathbb{R}$, $\mathbf{\alpha}\in\mathbb{R}^p$, and $\mathbf{\theta}\in\mathbb{R}^p$ are unknown parameters.
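As a minimal sketch of why the two-stage fit matters under this model, the NumPy snippet below simulates data with a shared confounder and compares ordinary least squares with a textbook 2SLS fit. It assumes valid instruments ($\mathbf{\alpha} = \mathbf{0}$) and does not use the package's API; all names (`u`, `beta_ols`, `beta_2sls`) and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 5
theta = np.ones(p) / np.sqrt(p)   # illustrative first-stage coefficients
beta = 0.5                        # illustrative causal effect

Z = rng.normal(size=(n, p))       # instruments
u = rng.normal(size=n)            # unobserved confounder shared by w and eps
w = u + rng.normal(size=n)
eps = u + rng.normal(size=n)

x = Z @ theta + w                 # stage-1 model
y = beta * x + eps                # stage-2 model with alpha = 0 (assumption)

# Naive OLS of y on x is biased because x and eps share the confounder u.
beta_ols = (x @ y) / (x @ x)

# 2SLS: regress x on Z, then regress y on the fitted values.
theta_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
x_hat = Z @ theta_hat
beta_2sls = (x_hat @ y) / (x_hat @ x_hat)

print(f"OLS: {beta_ols:.3f}, 2SLS: {beta_2sls:.3f}, truth: {beta}")
```

In this setup the OLS estimate is pulled away from the true $\beta$ by the confounder, while the 2SLS estimate stays close to it.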

### **Two-Stage Sliced Inverse Regression ([2SIR](https://openreview.net/pdf?id=cylRvJYxYI))** 

$$
\phi(x) = \mathbf{z}^\prime \mathbf{\theta} + w, \quad y = \beta \phi(x) + \mathbf{z}^\prime \mathbf{\alpha} + \varepsilon,
$$

where $\phi(\cdot)$ is an unknown transformation of the exposure and $(w,\varepsilon)$ are error terms independent of the instruments $\mathbf{z}$; however, $w$ and $\varepsilon$ may be correlated with each other due to underlying *confounders*. Here $\beta\in\mathbb{R}$, $\mathbf{\alpha}\in\mathbb{R}^p$, and $\mathbf{\theta}\in\mathbb{R}^p$ are unknown parameters.
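To illustrate how the direction of $\mathbf{\theta}$ can be recovered without knowing $\phi$, here is a generic sliced inverse regression sketch in NumPy. It is not the package's implementation: the function `sir_direction`, the number of slices, and the simulated cube-root link are assumptions chosen for the example.

```python
import numpy as np

def sir_direction(Z, x, n_slices=10):
    """Estimate the direction of theta (up to sign) by sliced inverse regression."""
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    cov = Zc.T @ Zc / n
    # Whiten the instruments: cov = L L^T, so Zw = Zc L^{-T} has identity covariance.
    L = np.linalg.cholesky(cov)
    Zw = np.linalg.solve(L, Zc.T).T
    # Slice observations by the order of x and average the whitened Z in each slice.
    order = np.argsort(x)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Zw[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # The leading eigenvector of M gives the direction in whitened coordinates.
    _, eigvecs = np.linalg.eigh(M)
    eta = eigvecs[:, -1]
    # Map back to the original coordinates and normalize to unit length.
    direction = np.linalg.solve(L.T, eta)
    return direction / np.linalg.norm(direction)

rng = np.random.default_rng(1)
n, p = 5000, 5
theta = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
theta /= np.linalg.norm(theta)

Z = rng.normal(size=(n, p))
w = rng.normal(size=n)
x = (Z @ theta + w) ** 3          # phi(x) = x^(1/3), so x = (z'theta + w)^3

theta_hat = sir_direction(Z, x)
cosine = abs(theta_hat @ theta)   # sign is not identified by SIR alone
print(f"|cos(theta_hat, theta)| = {cosine:.3f}")
```

Because slicing uses only the ordering of $x$, the estimate in this example is the same as if the untransformed $\phi(x)$ had been observed.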

**Remarks**

- **2SLS / 2SIR.** $\mathbf{\alpha} \neq \mathbf{0}$ indicates a violation of the second and/or third IV assumptions. The models may not be identifiable in the presence of invalid IVs. In the literature, additional structural constraints, such as $\|\mathbf{\alpha}\|_0 < p/2$, are imposed to avoid this issue.

- **2SIR.** $\beta$ and $\phi$ are identifiable by fixing $\|\mathbf{\theta}\|_2 = 1$ and $\beta \geq 0$.
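The need for the normalization in the 2SIR remark can be seen from a short rescaling argument, sketched here directly from the model equations (the tilde quantities are notation introduced only for this illustration). For any constant $c \neq 0$, define

$$
\tilde{\phi} = c\,\phi, \quad \tilde{\mathbf{\theta}} = c\,\mathbf{\theta}, \quad \tilde{w} = c\,w, \quad \tilde{\beta} = \beta / c.
$$

Then

$$
\tilde{\phi}(x) = \mathbf{z}^\prime \tilde{\mathbf{\theta}} + \tilde{w}, \quad y = \tilde{\beta}\, \tilde{\phi}(x) + \mathbf{z}^\prime \mathbf{\alpha} + \varepsilon,
$$

so the rescaled parameters generate exactly the same distribution of $(\mathbf{z}, x, y)$. Requiring $\|\mathbf{\theta}\|_2 = 1$ restricts $c$ to $\pm 1$, and requiring $\beta \geq 0$ then fixes the sign whenever $\beta \neq 0$.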

**Strengths** of **2SIR**

- Model assumptions of 2SIR are weaker than the classical 2SLS: the model admits an *arbitrary* nonlinear transformation $\phi(\cdot)$ across $\mathbf{z}$, $x$ and $y$, relaxing the linearity assumption in the standard TWAS/2SLS.

- 2SIR includes 2SLS and Yeo-Johnson power transformation 2SLS (PT-2SLS) as special cases. The proposed method remains competitive against 2SLS/PT-2SLS even when the linearity assumption holds.

- The implicit linear structure in both 2SLS and 2SIR allows our method to *use GWAS summary data*, whereas other nonlinear models require individual-level data.
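The summary-data point can be made concrete for the linear case. The sketch below, which assumes valid instruments and a single sample, shows that a textbook 2SLS estimate depends on the data only through the cross-products $\mathbf{Z}^\prime\mathbf{Z}$, $\mathbf{Z}^\prime\mathbf{x}$ and $\mathbf{Z}^\prime\mathbf{y}$. The helper `two_sls_from_summary` is hypothetical and is not part of the package's API.

```python
import numpy as np

def two_sls_from_summary(ZtZ, Ztx, Zty):
    """2SLS estimate computed only from cross-product summaries (hypothetical helper)."""
    theta_hat = np.linalg.solve(ZtZ, Ztx)                     # stage 1: x on Z
    beta_hat = (theta_hat @ Zty) / (theta_hat @ ZtZ @ theta_hat)  # stage 2: y on Z theta_hat
    return theta_hat, beta_hat

rng = np.random.default_rng(2)
n, p = 2000, 4
Z = rng.normal(size=(n, p))
u = rng.normal(size=n)                       # unobserved confounder
x = Z @ np.full(p, 0.5) + u + rng.normal(size=n)
y = 0.3 * x + u + rng.normal(size=n)

# Individual-level 2SLS for comparison.
theta_ind, *_ = np.linalg.lstsq(Z, x, rcond=None)
x_hat = Z @ theta_ind
beta_individual = (x_hat @ y) / (x_hat @ x_hat)

# The same estimate from summary statistics only.
_, beta_summary = two_sls_from_summary(Z.T @ Z, Z.T @ x, Z.T @ y)
print(beta_individual, beta_summary)
```

The two estimates agree up to floating-point error, which is the sense in which individual-level data are not needed for the linear stage.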

## What We Can Do:

**2SLS**

- Estimate $\beta$: marginal causal effect from $X \to Y$

- Hypothesis testing (HT) and confidence interval (CI) for marginal causal effect $\beta$.

**2SIR**

- Estimate $\beta$: marginal causal effect from $X \to Y$

- Hypothesis testing (HT) and confidence interval (CI) for marginal causal effect $\beta$.

- Estimate nonlinear causal link $\phi(\cdot)$.

For usage of **nonlinear-causal**, refer to the examples and notebooks listed below.

## Installation

```bash
# Install the latest version of `nonlinear-causal` from GitHub:
pip install git+https://github.com/nl-causal/nonlinear-causal

# Or install `nonlinear-causal` from PyPI:
pip install nonlinear-causal
```

## Examples and notebooks

- [User guide](./md/user_guide.md)

- [Simulated examples](https://colab.research.google.com/drive/1c7nzsh5lFY6zaKB0LmP_9z6BZJ7m5F-H?usp=sharing)

- [Simulated examples with invalid IVs](https://colab.research.google.com/drive/1PTw8VIH3ygvTkQZU0aI23Imh48aWDNMT?usp=sharing)

- [Real application](app_test.ipynb)

## Simulation Performance

- We examine four cases: (i) $\beta = 0$, (ii) $\beta = .05$, (iii) $\beta = .10$, (iv) $\beta = .15$. Case (i) assesses Type I error, while cases (ii)-(iv), with $\beta > 0$, assess power.

- Six transformations are considered: (1) linear: $\phi(x) = x$; (2) logarithm: $\phi(x) = \log(x)$; (3) cube root: $\phi(x) = x^{1/3}$; (4) inverse: $\phi(x) = 1/x$; (5) piecewise linear: $\phi(x) = xI(x\leq 0) + 0.5 x I(x > 0)$; (6) quadratic: $\phi(x) = x^2$.  
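For reference, the six transformations can be written as NumPy functions. This is only an illustrative sketch: the dictionary `TRANSFORMS` and the sample inputs are not taken from the package or the simulation notebook, and `np.cbrt` is used so that the cube root is real-valued for negative inputs.

```python
import numpy as np

# The six link functions phi(x) from the simulation description.
TRANSFORMS = {
    "linear": lambda x: x,
    "logarithm": np.log,                      # defined for x > 0
    "cube root": np.cbrt,                     # real cube root, also for x < 0
    "inverse": lambda x: 1.0 / x,             # defined for x != 0
    "piecewise linear": lambda x: np.where(x <= 0, x, 0.5 * x),
    "quadratic": lambda x: x ** 2,
}

x = np.array([0.5, 1.0, 8.0])                 # positive inputs so every phi is defined
values = {name: phi(x) for name, phi in TRANSFORMS.items()}
for name, v in values.items():
    print(f"{name:>16}: {v}")

# The piecewise-linear link keeps negative inputs and halves positive ones.
piecewise = TRANSFORMS["piecewise linear"](np.array([-2.0, 2.0]))
```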

![result](./figs/sim_test_n5p10.png)

For more information, see [our paper](https://openreview.net/pdf?id=cylRvJYxYI) (Section 3) or the [Jupyter notebook](./nb/sim_main.ipynb) for the simulation examples.

## Reference

If you use this code, please star 🌟 the repository and cite the following paper:

- Dai, B., Li, C., Xue, H., Pan, W., & Shen, X. (2024). Inference of nonlinear causal effects with GWAS summary data. In *Conference on Causal Learning and Reasoning*. PMLR.

```latex
@inproceedings{dai2022inference,
  title={Inference of nonlinear causal effects with GWAS summary data},
  author={Dai, Ben and Li, Chunlin and Xue, Haoran and Pan, Wei and Shen, Xiaotong},
  booktitle={Conference on Causal Learning and Reasoning},
  pages={},
  year={2024},
  organization={PMLR}
}
```
