Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rguo12/awesome-causality-data
A data index for learning causality.
https://github.com/rguo12/awesome-causality-data
List: awesome-causality-data
Last synced: 13 days ago
JSON representation
A data index for learning causality.
- Host: GitHub
- URL: https://github.com/rguo12/awesome-causality-data
- Owner: rguo12
- License: mit
- Created: 2018-08-22T22:28:21.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-10-25T12:59:45.000Z (about 1 year ago)
- Last Synced: 2024-05-21T02:24:14.358Z (6 months ago)
- Homepage:
- Size: 31.3 KB
- Stars: 413
- Watchers: 13
- Forks: 65
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-causality - awesome-causality-data
README
# awesome-causality-data
An index of datasets that can be used for learning causality.Please cite our survey if this data index helps your research.
```
@article{guo2018survey,
title={A Survey of Learning Causality with Data: Problems and Methods},
author={Guo, Ruocheng and Cheng, Lu and Li, Jundong and Hahn, P. Richard and Liu, Huan},
journal={arXiv preprint arXiv:1809.09337},
year={2018}
}
```*Updates coming soon*
## Datasets for Learning Causal Effects (Causal Inference)
### Causal Effect Estimation with Single Cause
#### Datasets with i.i.d. samples
Standard datasets for learning causal effects comes with each instance in the format of (**x**,d,y).[IHDP1](https://github.com/AMLab-Amsterdam/CEVAE/tree/master/datasets/IHDP)
[How is IHDP1 (setting A) simulated](https://github.com/vdorie/npci/tree/master/examples/ihdp_sim)
[IHDP2](https://math.la.asu.edu/~prhahn/)
[Twins](https://github.com/AMLab-Amsterdam/CEVAE/tree/master/datasets/TWINS)
[Job Training](http://users.nber.org/~rdehejia/data/nswdata2.html) ([Lalonde 1986 in the R package qte](https://rdrr.io/cran/qte/man/lalonde.html#heading-0))
[ACIC Benchmark](https://github.com/vdorie/aciccomp/tree/master/2016)
[News](https://github.com/d909b/perfect_match/tree/master/perfect_match/data_access/news)
[TCGA](https://github.com/d909b/perfect_match/tree/master/perfect_match/data_access/tcga)
[NLSM](https://github.com/grf-labs/grf/tree/master/experiments/acic18)
#### Datasets with non-i.i.d. samples (with interference, spillover effect or auxiliary network information)
[Amazon](https://drive.google.com/drive/u/1/folders/1Ff_GdfjhrDFbZiRW0z81lGJW-cUrYmo1)
#### Datasets with instrumental Variables (IV)
Standard datasets for learning causal effects, each instance has the format of (i,**x**,d,y).[1980 Census Extract](https://economics.mit.edu/faculty/angrist/data1/data/angkru95)
[CPS Extract](https://economics.mit.edu/faculty/angrist/data1/data/angkru95)
#### Datasets for Regression Discontinuity Design
[Population Threshold RDD Datasets](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PGXO5O)
### Datasets with Multiple Causes
## Datasets for Learning Causal Relationships (Causal Discovery)
#### Distinguishing Cause from Effect
[Database with cause-effect pairs (Tbingen Cause-Effect Pairs)](http://webdav.tuebingen.mpg.de/cause-effect/)[AntiCD3/CD28](https://science.sciencemag.org/content/308/5721/523)
[Pittsburgh Bridges](http://archive.ics.uci.edu/ml)
[Abalone](http://archive.ics.uci.edu/ml)
#### Causal Bayesian Network
[Lung Cancer Simple Set (LUCAS)](http://www.causality.inf.ethz.ch/data/LUCAS.html)## Datasets for Connections to Machine Learning
### Datasets with randomized test set for recommendation systems
|Name|Paper|URL|
|---|---|---|
|Coat|[Schnabel, Tobias, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. "Recommendations as treatments: Debiasing learning and evaluation." arXiv preprint arXiv:1602.05352 (2016).](http://www.jmlr.org/proceedings/papers/v48/schnabel16.pdf)|[download](http://www.cs.cornell.edu/~schnabts/mnar/index.html)|
|Yahoo! R3|[Schnabel, Tobias, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. "Recommendations as treatments: Debiasing learning and evaluation." arXiv preprint arXiv:1602.05352 (2016).](http://www.jmlr.org/proceedings/papers/v48/schnabel16.pdf)|[download](https://webscope.sandbox.yahoo.com/catalog.php?datatype=r)|
|Spotify Music Streaming Sessions|[Brost, Brian, Rishabh Mehrotra, and Tristan Jehan. "The Music Streaming Sessions Dataset." In The World Wide Web Conference, pp. 2594-2600. ACM, 2019.](https://arxiv.org/pdf/1901.09851)|[download](https://www.spotify.com/)|