{"id":13704224,"url":"https://github.com/salesforce/causalai","last_synced_at":"2025-04-07T13:08:25.115Z","repository":{"id":65569867,"uuid":"568936460","full_name":"salesforce/causalai","owner":"salesforce","description":"Salesforce CausalAI Library: A Fast and Scalable framework for Causal Analysis of Time Series and Tabular Data","archived":false,"fork":false,"pushed_at":"2023-11-10T19:29:38.000Z","size":9299,"stargazers_count":277,"open_issues_count":7,"forks_count":30,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-31T11:06:52.482Z","etag":null,"topics":["causality"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/salesforce.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-11-21T18:20:58.000Z","updated_at":"2025-03-14T02:04:02.000Z","dependencies_parsed_at":"2024-01-14T20:49:16.379Z","dependency_job_id":"267cf24c-4b93-489c-8b37-4b4fb311bafd","html_url":"https://github.com/salesforce/causalai","commit_stats":{"total_commits":30,"total_committers":3,"mean_commits":10.0,"dds":"0.19999999999999996","last_synced_commit":"8fa89a9a287ffdf5a56ce3ae02dedd23cb59f64f"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fcausalai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fcausalai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fcausalai/releases"
,"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fcausalai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/salesforce","download_url":"https://codeload.github.com/salesforce/causalai/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247657281,"owners_count":20974345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causality"],"created_at":"2024-08-02T21:01:05.939Z","updated_at":"2025-04-07T13:08:25.085Z","avatar_url":"https://github.com/salesforce.png","language":"Jupyter Notebook","funding_links":[],"categories":["Tools","Causal Inference and Econometrics"],"sub_categories":["Causal Inference","Frontier Tools"],"readme":"# Salesforce CausalAI Library\n\n:fire::fire:**Updates**:fire::fire:\n\n- Added GES, LINGAM, GIN for tabular data causal discovery\n- Added markov blanket discovery algorithm (grow-shrink) for tabular data\n- Added support for heterogeneous data (mixed discrete and continuous variables)\n- Added benchmarking modules for tabular and time series data, that will help the research community benchmark new and existing causal discovery algorithms against various challenges in the data (graph sparsity, sample complexity, variable complexity, SNR, noise type, etc)\n- Root cause analysis for tabular and time series data\n\n## Table of Contents\n1. [Introduction](#introduction)\n1. [Comparison with Related Libraries](#comparison-with-related-libraries)\n1. [Causal Discovery](#causal-discovery)\n1. 
[Causal Inference](#causal-inference)\n1. [Installation](#installation)\n1. [Quick Tutorial](#quick-tutorial)\n1. [User Interface](#user-inferface)\n1. [Documentation](#documentation)\n1. [Technical Report and Citing Salesforce CausalAI](#technical-report-and-citing-salesforce-causalai)\n\n## Introduction\n\n\u003cimg src=\"assets/causalai_pipeline.png\" width=\"800\"\u003e\n\nSalesforce CausalAI is an open-source Python library for causal analysis using observational data. It supports causal discovery and causal inference for tabular and time series data (see figure above), of discrete, continuous and heterogeneous data types. It also supports Markov Blanket discovery algorithms. This library includes algorithms that handle linear and non-linear causal relationships between variables, and uses multi-processing for speed-up. We also include a data generator capable of generating synthetic data with a specified structural equation model for both the aforementioned data formats and types, which helps users control the ground-truth causal process while investigating various algorithms. CausalAI includes benchmarking modules for tabular and time series data, which users can use to compare different causal discovery algorithms, as well as evaluate the performance of a particular algorithm across datasets with different challenges. Specifically, users can evaluate the performance of causal discovery algorithms on synthetic data with varying graph sparsity, sample complexity, variable complexity, SNR, noise type, and max lag (time series data). Finally, we provide a user interface (UI) that allows users to perform causal analysis on data without coding. 
The goal of this library is to provide a fast and flexible solution for a variety of problems in the domain of causality.\n\nSome of the key features of CausalAI are:\n\n- **Algorithms**: Support for causal discovery, causal inference, Markov blanket discovery, and Root Cause Analysis (RCA).\n- **Data**: Causal analysis on tabular and time series data, of discrete, continuous and heterogeneous types.\n- **Missing Values**: Support for handling missing/NaN values in data.\n- **Data Generator**: A synthetic data generator that uses a specified structural equation model (SEM) for generating tabular and time series data. This can be used for evaluating and comparing different causal discovery algorithms since the ground truth values are known.\n- **Distributed Computing**: Use of multi-processing using the [Python Ray library](https://docs.ray.io/en/latest/), that can be optionally turned on by the user when dealing with large datasets or number of variables for faster compute.\n- **Targeted Causal Discovery**: In certain cases, we support targeted causal discovery, in which the user is only interested in discovering the causal parents of a specific variable of interest instead of the entire causal graph. 
This option reduces computational overhead.\n- **Visualization**: Visualize tabular and time series causal graphs.\n- **Prior Knowledge**: Incorporate any user-provided partial prior knowledge about the causal graph in the causal discovery process.\n- **Benchmarking**: Benchmarking module for comparing different causal discovery algorithms, as well as evaluating the performance of a particular algorithm across datasets with different challenges.\n- **Code-free UI**: Provide a code-free user interface in which users may directly upload their data and run their chosen causal analysis algorithm at the click of a button.\n\n### Comparison with Related Libraries\n\nThe table below provides a visual overview of how CausalAI's key features compare to other libraries for causal analysis.\n\n\u003cimg src=\"assets/causalai_comparison.png\" width=\"800\"\u003e\n\n### Causal Discovery\n\nWe support the following causal discovery algorithms, categorized by their assumptions on whether hidden variables are allowed, whether the data is discrete or continuous, and the type of noise in the data.\n\n\u003cimg src=\"assets/cd_algos.png\" width=\"600\"\u003e\n\nFor continuous data, PC algorithm and Grow-shrink support both linear and non-linear causal relationships (depending on the CI test used). All other algorithms support linear relationships.\n\n### Causal Inference\n\nWe support the following causal inference estimations for tabular and time series data of continuous and discrete types:\n  - **Average Treatment Effect (ATE)**: ATE aims to determine the relative expected difference in the value of Y when we intervene X to be x_t compared to when we intervene X to be x_c. 
Here x_t and x_c are respectively the treatment value and control value.\n\u003cimg src=\"assets/ate.png\" width=\"400\"\u003e\n\n  - **Conditional Average Treatment Effect (CATE)**: CATE is similar to ATE, except that in addition to intervention on X, we also condition on some set of variables C taking value c. Notice here that X is intervened on but C is not. \n\u003cimg src=\"assets/cate.png\" width=\"500\"\u003e\n\n  - **Counterfactual**: Counterfactuals aim at estimating the effect of an intervention on a specific instance or sample. Suppose we have a specific instance of a system of random variables (X_1, X_2,...,X_N) given by (X_1=x_1, X_2=x_2,...,X_N=x_N). In a counterfactual, we want to know the effect an intervention (say) X_1=k would have had on some other variable(s) (say X_2), holding all the remaining variables fixed.\n\u003cimg src=\"assets/counterfactual.png\" width=\"550\"\u003e\n\nDepending on whether the relationship between variables is linear or non-linear, the user may specify a linear or non-linear prediction model respectively in the inference module.\n\n## Installation\n\nPrior to installing the library, create a conda environment with Python 3.9 or a later version. This can be done by executing ``conda create -n causal_ai_env python=3.9``. Activate this environment by executing ``conda activate causal_ai_env``. To install Salesforce CausalAI, git clone the library, go to the root directory of the repository, and execute ``pip install .``. \n\nBefore importing and calling the library, or launching the UI, remember to first activate the conda environment.\n\n## Quick Tutorial\n\nLet's suppose we have some observational tabular data, and we want to perform causal discovery using the PC algorithm. 
The following code illustrates this using data that is synthetically generated using the CausalAI library:\n\n```python\n# Causal Discovery using PC algorithm on Tabular Data\nfrom causalai.models.tabular.pc import PCSingle, PC\nfrom causalai.models.common.CI_tests.partial_correlation import PartialCorrelation\nfrom causalai.data.data_generator import DataGenerator # for generating data randomly\nfrom causalai.models.common.prior_knowledge import PriorKnowledge\nfrom causalai.data.tabular import TabularData # tabular data object\nfrom causalai.data.transforms.time_series import StandardizeTransform\n\n#### Generate a ground truth causal graph and random data from it, for illustration purposes\nfn = lambda x:x # function applied to each parent (identity here, i.e. linear)\ncoef = 0.1\n# Structural equation model (SEM) defining the ground truth causal graph\nsem = {\n        'a': [], \n        'b': [('a', coef, fn), ('f', coef, fn)], # b = coef* fn(a) + coef* fn(f) + noise\n        'c': [('b', coef, fn), ('f', coef, fn)],\n        'd': [('b', coef, fn), ('g', coef, fn)],\n        'e': [('f', coef, fn)], \n        'f': [],\n        'g': [],\n        }\nT = 5000 # number of samples\ndata_array, var_names, graph_gt = DataGenerator(sem, T=T, seed=0, discrete=False)\n# data_array is a (T x 7) NumPy array\n# var_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g']\n# graph_gt is a Python dictionary\n\n### standardize data and create a CausalAI Tabular data object\nStandardizeTransform_ = StandardizeTransform()\nStandardizeTransform_.fit(data_array)\ndata_trans = StandardizeTransform_.transform(data_array)\ndata_obj = TabularData(data_trans, var_names=var_names)\n\n### Run PC algorithm\n\n# provide optional (use None) prior knowledge saying b-\u003ea is forbidden.\nprior_knowledge = PriorKnowledge(forbidden_links={'a': ['b']}) \n\npvalue_thres = 0.01\nCI_test = PartialCorrelation() \npc = PC(\n        data=data_obj,\n        prior_knowledge=prior_knowledge,\n        CI_test=CI_test,\n        use_multiprocessing=False\n        
)\nresult = pc.run(pvalue_thres=pvalue_thres, max_condition_set_size=2)\n\n# print estimated causal graph\ngraph_est={n:[] for n in result.keys()}\nfor key in result.keys():\n    parents = result[key]['parents']\n    graph_est[key].extend(parents)\n    print(f'{key}: {parents}')\n\n########### prints\n# a: []\n# b: ['d', 'a', 'c', 'f']\n# c: ['f', 'b']\n# d: ['g', 'b']\n# e: ['f']\n# f: ['e', 'b', 'c']\n# g: ['d']\n###########\n\n### Evaluate the estimated causal graph given we have ground truth in this case\nfrom causalai.misc.misc import plot_graph, get_precision_recall\n\nprecision, recall, f1_score = get_precision_recall(graph_est, graph_gt)\nprint(f'Precision {precision:.2f}, Recall: {recall:.2f}, F1 score: {f1_score:.2f}')\n# Precision 0.64, Recall: 1.00, F1 score: 0.67\n```\n\nNow let's suppose we have some observational tabular data and the causal graph (not the SEM) for this data, and we want to estimate causal inference effects (specifically ATE in this example) on a desired target variable given treatment variables. 
The following code illustrates this using data that is synthetically generated using the CausalAI library:\n\n```python\n# Causal Inference on Tabular Data using Backdoor method and in-house CausalPath method\nfrom causalai.data.data_generator import DataGenerator\nfrom causalai.models.tabular.causal_inference import CausalInference\nfrom sklearn.linear_model import LinearRegression\nimport numpy as np\n\ndef define_treatments(name, t,c):\n    treatment = dict(var_name=name,\n                    treatment_value=t,\n                    control_value=c)\n    return treatment\n\n#### Generate a ground truth causal graph and random data from it, for illustration purposes\nfn = lambda x:x # function applied to each parent (identity here, i.e. linear)\ncoef = 0.5\n# Structural equation model (SEM) defining the ground truth causal graph\nsem = {\n        'a': [], \n        'b': [('a', coef, fn), ('f', coef, fn)], \n        'c': [('b', coef, fn), ('f', coef, fn)],\n        'd': [('b', coef, fn), ('g', coef, fn)],\n        'e': [('f', coef, fn)], \n        'f': [],\n        'g': [],\n        }\nT = 5000 # number of samples\ndata, var_names, graph_gt = DataGenerator(sem, T=T, seed=0, discrete=False)\n# data is a (T x 7) NumPy array\n# var_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g']\n# graph_gt is a Python dictionary\n\n### Define treatment variables, and their treatment and control values\nt1='a' # treatment variable\nt2='b' # treatment variable\ntarget = 'c' # target variable; Notice c does not depend on a if we intervene on b.\ntarget_var = var_names.index(target)\n\n# treatment values\nintervention11 = 100*np.ones(T)\nintervention21 = 10*np.ones(T)\n\n# control values\nintervention12 = -0.*np.ones(T)\nintervention22 = -2.*np.ones(T)\n\ntreatments = [define_treatments(t1, intervention11,intervention12),\\\n              define_treatments(t2, intervention21,intervention22)]\n\n### Perform Causal Inference\nCausalInference_ = CausalInference(data, var_names, graph_gt, LinearRegression , discrete=False, 
method='causal_path')\n\nate, y_treat,y_control = CausalInference_.ate(target, treatments)\nprint(f'Estimated ATE using causal_path method: {ate:.2f}')\n# Estimated ATE using causal_path method: 6.38\n\nCausalInference_ = CausalInference(data, var_names, graph_gt, LinearRegression , discrete=False, method='backdoor')\n\nate, y_treat,y_control = CausalInference_.ate(target, treatments)\nprint(f'Estimated ATE using backdoor method: {ate:.2f}')\n# Estimated ATE using backdoor method: 7.21\n\n\n\n### Compute the True ATE given we can generate data with intervention in this case since it is synthetically generated\nintervention_data1,_,_ = DataGenerator(sem, T=T, seed=0,\n                        intervention={t1:intervention11, t2:intervention21})\n\nintervention_data2,_,_ = DataGenerator(sem, T=T, seed=0,\n                        intervention={t1:intervention12, t2:intervention22})\n\n\n\ntrue_effect = (intervention_data1[:,target_var] - intervention_data2[:,target_var]).mean()\nprint(\"True ATE = %.2f\" %true_effect)\n# True ATE = 6.00\n```\nFor more tutorials on other supported functionalities of CausalAI, please see\n[`tutorials`](https://github.com/salesforce/causalai/tree/main/tutorials).\n\n## User Inferface\n\nWe provide a UI for users to directly upload their data and run causal discovery and causal inference algorithms without the need to write any code. An introduction to the UI can be found [here](https://opensource.salesforce.com/causalai/latest/ui_tutorial.html).\n\n\nIn order to launch the UI, go to the root directory of the library and execute ``./launch_ui.sh``, and open the url specified in the terminal in a browser. In order to terminate the UI, press Ctrl+c in the terminal where the UI was launched, and then execute ``./exit_ui.sh``.\n\n## Documentation\n\nFor Jupyter notebooks with examples, see\n[`tutorials`](https://github.com/salesforce/causalai/tree/main/tutorials). 
Detailed API documentation with tutorials can be found [here](https://opensource.salesforce.com/causalai). The\n[technical report](https://arxiv.org/abs/2301.10859) describes the implementation details of the algorithms along with their assumptions and also covers important aspects of the API. It also presents experimental results that demonstrate the speed and performance of our library compared with some of the existing libraries.\n\n## Technical Report and Citing Salesforce CausalAI\nYou can find more details in our [technical report](https://arxiv.org/abs/2301.10859).\n\nIf you're using Salesforce CausalAI in your research or applications, please cite using this BibTeX:\n```\n@article{salesforce_causalai23,\n      title={Salesforce CausalAI Library: A Fast and Scalable framework for Causal Analysis of Time Series and Tabular Data},\n      author={Arpit, Devansh and Fernandez, Matthew and Feigenbaum, Itai and Yao, Weiran and Liu, Chenghao and Yang, Wenzhuo and Josel, Paul and Heinecke, Shelby and Hu, Eric and Wang, Huan and Hoi, Stephen and Xiong, Caiming and Zhang, Kun and Niebles, Juan Carlos},\n      year={2023},\n      eprint={arXiv preprint arXiv:2301.10859},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalesforce%2Fcausalai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsalesforce%2Fcausalai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalesforce%2Fcausalai/lists"}