{"id":13459551,"url":"https://github.com/Minyus/causallift","last_synced_at":"2025-03-24T18:30:48.112Z","repository":{"id":43496192,"uuid":"179265399","full_name":"Minyus/causallift","owner":"Minyus","description":"CausalLift: Python package for causality-based Uplift Modeling in real-world business","archived":false,"fork":false,"pushed_at":"2023-05-13T08:24:39.000Z","size":6080,"stargazers_count":334,"open_issues_count":3,"forks_count":42,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-08-31T10:18:58.833Z","etag":null,"topics":["causal-impact","causal-inference","causality","counterfactual","econometrics","propensity-score","propensity-scores","uplift","uplift-modeling"],"latest_commit_sha":null,"homepage":"https://causallift.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Minyus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-03T10:18:45.000Z","updated_at":"2024-08-16T05:16:53.000Z","dependencies_parsed_at":"2024-06-19T09:58:07.275Z","dependency_job_id":"6a8a1e59-a335-4025-94a8-1f023d66554a","html_url":"https://github.com/Minyus/causallift","commit_stats":{"total_commits":263,"total_committers":5,"mean_commits":52.6,"dds":0.2585551330798479,"last_synced_commit":"b2f04a0920f700340aafc4726d154beb0ae3f98b"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Minyus%2Fcausallift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Minyus%2Fca
usallift/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Minyus%2Fcausallift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Minyus%2Fcausallift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Minyus","download_url":"https://codeload.github.com/Minyus/causallift/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221995704,"owners_count":16913546,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causal-impact","causal-inference","causality","counterfactual","econometrics","propensity-score","propensity-scores","uplift","uplift-modeling"],"created_at":"2024-07-31T10:00:19.419Z","updated_at":"2024-10-29T05:31:05.909Z","avatar_url":"https://github.com/Minyus.png","language":"Python","funding_links":[],"categories":["Example projects","Python"],"sub_categories":[],"readme":"# CausalLift: Python package for Uplift Modeling in real-world business; applicable for both A/B testing and observational data\n\n[![PyPI version](https://badge.fury.io/py/causallift.svg)](\nhttps://badge.fury.io/py/causallift\n)\n![Python Version](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg)\n[![License: BSD-2-Clause](https://img.shields.io/badge/License-BSD-yellow.svg)](\nhttps://opensource.org/licenses/BSD-2-Clause\n)\n[![Documentation](https://readthedocs.org/projects/causallift/badge/?version=latest)](https://causallift.readthedocs.io/)\n[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](\nhttps://colab.research.google.com/github/Minyus/causallift/blob/master/notebooks/demo/CausalLift_demo.ipynb\n)\n\n## Introduction\n\n### Scenario 1: Marketing campaign/promotion targeting\n\nSuppose you are responsible for a marketing campaign/promotion (show an advertisement, offer a discount, make a phone call, etc.) to some customers to increase revenue or prevent churn. Which one will you choose?\n\n- Strategy A: choose customers who will buy the product (with or without being contacted)\n- Strategy B: choose customers who will buy the product if contacted, but will not if not contacted\n\nStrategy B is known as Uplift Modeling.\n\n\n### Scenario 2: Recommendation systems in E-commerce sites\n\nSuppose you are responsible for a recommendation system at an E-commerce company. Which one will you choose?\n\n- Strategy A: recommend a product the user will buy (with or without recommendation)\n- Strategy B: recommend a product the user will buy if recommended, but will not if not recommended\n\nStrategy B is known as Uplift Modeling.\n\n\n### Scenario 3: US presidential campaign\n\nSuppose you are trying to help a candidate become the next US president. Which one will you choose?\n\n- Strategy A: contact voters who will vote for the candidate (with or without being contacted)\n- Strategy B: contact voters who will vote for the candidate if contacted, but will not if not contacted\n\nStrategy B is known as Uplift Modeling, and was used by Barack Obama's campaign in 2012. 
Here are some articles.\n\n- [What is 'Persuasion Modeling', and how did it help Obama to win the elections?](http://numrush.com/2013/06/28/what-is-persuasion-modeling-and-how-did-it-help-obama-to-win-the-elections/)\n- [How Obama's Team Used Big Data to Rally Voters](https://www.technologyreview.com/s/509026/how-obamas-team-used-big-data-to-rally-voters/)\n- [How uplift modeling helped Obama's campaign -- and can aid marketers](https://searchbusinessanalytics.techtarget.com/video/How-uplift-modeling-helped-Obamas-campaign-and-can-aid-marketers)\n\n\n### Scenario 4: Avoid death\n\nSuppose the God of Machine Learning offers you one of the following predictions. Which one will you choose?\n\n- Option A: a prediction that you will die somewhere next year\n- Option B: a prediction that you will die if you live in city XXX next year, but will not die if you move to city YYY\n\nOption B is analogous to Uplift Modeling.\n\n\n## What is Uplift Modeling?\n\n\nUplift Modeling is a Machine Learning technique to find which customers (individuals) should be\ntargeted (\"treated\") and which customers should not be targeted.\n\nUplift Modeling is also known as persuasion modeling, incremental modeling, treatment effects\nmodeling, true lift modeling, or net modeling.\n\nUplift Modeling predicts the following 4 labels:\n\n- True Uplift, aka \"Persuadables\"\n  - Customers who will buy a product if treated, but will not buy if not treated\n- False Uplift, aka \"Sure Things\"\n  - Customers who will buy a product regardless of the treatment\n- True Drop, aka \"Sleeping Dogs\"/\"Do Not Disturbs\"\n  - Customers who will *not* buy a product if treated, but will buy if *not* treated\n- False Drop, aka \"Lost Causes\"\n  - Customers who will not buy a product regardless of the treatment\n\n\n## How does Uplift Modeling work?\n\nUplift Modeling estimates uplift scores (a.k.a. CATE: Conditional Average Treatment Effect or ITE:\nIndividual Treatment Effect). 
The uplift score is how much the estimated conversion rate will increase\nas a result of the campaign.\n\nSuppose you are in charge of a marketing campaign to sell a product. If the estimated conversion\nrate (probability of buying the product) of a customer is 50 % if targeted and\n40 % if not targeted, then the uplift score of the customer is (50-40) = +10 percentage points.\nLikewise, if the estimated conversion rate is 20 % if targeted and\n80 % if not targeted, the uplift score is (20-80) = -60 percentage points (negative value).\n\nUplift scores range from -100 to +100 percentage points (-1 to +1).\nIt is recommended to target customers with high uplift scores and avoid customers with negative\nuplift scores to optimize the marketing campaign.\n\n\n## What are the advantages of the \"CausalLift\" package?\n\n- CausalLift works with both A/B testing results and observational datasets.\n- CausalLift can output intuitive metrics for evaluation.\n\n## Why was CausalLift developed?\n\nIn short, for use in real-world business.\n\n- Existing packages for Uplift Modeling assume the dataset is from A/B Testing (a.k.a. Randomized\nControlled Trial). In real-world business, however, observational datasets in which treatment\n(campaign) targets were not chosen randomly are more common, especially in the early stage of\nevidence-based decision making. CausalLift supports observational datasets using a basic\nmethodology in Causal Inference called \"Inverse Probability Weighting\", based on the assumption that\nthe propensity to be treated can be inferred from the available features.\n\n- There are 2 challenges in Uplift Modeling: explainability of the model and explainability of the evaluation. 
CausalLift\nutilizes a basic methodology of Uplift Modeling called the Two-Model approach (training 2 models\nindependently for treated and untreated samples to compute the CATE (Conditional Average Treatment\nEffect) or uplift scores) to address these challenges.\n\n\t- [Explainability of the model] Since it is relatively simple, it is less challenging to\n\texplain how it works to stakeholders in the business.\n\n\t- [Explainability of evaluation] To evaluate Uplift Modeling, metrics such as Qini and AUUC\n\t(Area Under the Uplift Curve) are used in research, but these metrics are difficult to explain\n\tto the stakeholders. For business, a metric that can estimate how much more profit can be\n\tearned is more practical. Since CausalLift adopted the Two-Model approach, the 2 models can be\n\treused to simulate the outcome of following the recommendation by the Uplift Model and can\n\testimate how much the conversion rate (the proportion of people who took the desired action such as\n\tbuying a product) will increase using the uplift model.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/Minyus/causallift/master/readme_images/CausalLift_flow_diagram.png\" width=\"415\" height=\"274\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\tCausalLift flow diagram\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/Minyus/causallift/master/readme_images/CausalLift_Viz.PNG\" width=\"734\" height=\"465\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\tCausalLift internal pipeline (visualized by Kedro Viz)\n\u003c/p\u003e\n\n\n## Supported Python versions\n\n- Python 3.5\n- Python 3.6 (Tested and recommended)\n- Python 3.7\n\n## Installation\n\n### Install dependencies\n\n```bash\n$ pip install \"python-json-logger\u003c=2.0.4\" \"kedro\u003c=0.17.7\" \"scikit-learn\u003c=0.21.3\" numpy pandas easydict\n```\n\nNote:\n- Python 3.8 or later is not supported yet.\n- scikit-learn 0.22 or 
later is not supported yet.\n- kedro 0.18 or later is not supported yet.\n\n### Install CausalLift\n\n- [Option 1] To install the latest release from PyPI:\n\n```bash\n$ pip install causallift\n```\n\n- [Option 2] To install the latest pre-release:\n\n```bash\n$ pip install git+https://github.com/Minyus/causallift.git\n```\n\n- [Option 3] To install the latest pre-release in development mode, so that you do not need to reinstall after modifying the source code:\n\n```bash\n$ git clone https://github.com/Minyus/causallift.git\n$ cd causallift\n$ python setup.py develop\n```\n\n\n### Optional:\n\n- matplotlib\n- xgboost\n- scikit-optimize\n\n### Optional for visualization of the pipeline:\n\n- kedro-viz\n\n## How is the data pipeline implemented by CausalLift?\n\n### Step 0: Prepare data\n\nPrepare the following columns in 2 pandas DataFrames, train and test (validation).\n\n- Features\n\t- a.k.a. independent variables, explanatory variables, covariates\n\t- e.g. customer gender, age range, etc.\n\t- Note: Categorical variables need to be one-hot encoded so propensity can be estimated using\n\tlogistic regression. [pandas.get_dummies](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html) can be used.\n- Outcome: binary (0 or 1)\n\t- a.k.a. dependent variable, target variable, label\n\t- e.g. whether the customer bought a product, clicked a link, etc.\n- Treatment: binary (0 or 1)\n\t- a variable you can control and want to optimize for each individual (customer)\n\t- a.k.a. intervention\n\t- e.g. 
whether an advertising campaign was executed, whether a discount was offered, etc.\n\t- Note: if you cannot find a treatment column, you may need to ask stakeholders to get the data, which might take hours to years.\n- [Optional] Propensity: continuous between 0 and 1\n\t- propensity (or probability) to be treated for observational datasets (not needed for A/B Testing results)\n\t- If not provided, CausalLift can estimate it from the features using logistic regression.\n\n\u003cimg src=\"https://raw.githubusercontent.com/Minyus/causallift/master/readme_images/Example_table_data.png\"\u003e\n\u003cp align=\"center\"\u003e\n\tExample table data\n\u003c/p\u003e\n\n### Step 1: Prepare for Uplift modeling and optionally estimate propensity scores using a supervised classification model\n\nIf the `train_df` is from observational data (not A/B Test), you can set `enable_ipw=True` so that IPW (Inverse Probability Weighting) can correct for the treatment having been assigned with a different probability (propensity score) for each individual (e.g. customer, patient, etc.).\n\nIf the `train_df` is from A/B Test or RCT (Randomized Controlled Trial), set `enable_ipw=False` to skip estimating the propensity score.\n\n### Step 2: Estimate CATE by 2 supervised classification models\n\nTrain 2 supervised classification models (e.g. XGBoost) for treated and untreated samples independently and compute the estimated CATE (Conditional Average Treatment Effect), ITE (Individual Treatment Effect), or uplift score.\n\nThis step is the Uplift Modeling itself and consists of 2 sub-steps:\n\n1. Training using `train_df` (Note: `Treatment` and `Outcome` are used)\n\n2. 
Prediction of CATE for `train_df` and `test_df` (Note: Neither `Treatment` nor `Outcome` is used.)\n\n### Step 3 [Optional]: Estimate the impact of following the recommendation based on CATE\n\nEstimate how much the conversion rate will increase by selecting treatment (campaign) targets as recommended by the uplift model.\n\nYou can optionally evaluate the predicted CATE for `train_df` and `test_df` (Note: `CATE`, `Treatment` and `Outcome` are used.)\n\nThis step is _optional_; you can skip it if you want only CATE and do not find this evaluation step useful.\n\n\n## How to use CausalLift?\n\nThere are 2 ways:\n  - [Deprecated option] Use `causallift.CausalLift` class interface\n  - [Recommended option] Use `causallift.nodes` subpackage with [`PipelineX`](https://github.com/Minyus/pipelinex) package\n\n### [Deprecated option] Use `causallift.CausalLift` class interface\n\nPlease see the demo code in Google Colab (free cloud CPU/GPU environment):\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](\nhttps://colab.research.google.com/github/Minyus/causallift/blob/master/notebooks/demo/CausalLift_demo.ipynb\n)\n\nTo run the code, navigate to \"Runtime\" \u003e\u003e \"Run all\".\n\nTo download the notebook file, navigate to \"File\" \u003e\u003e \"Download .ipynb\".\n\nHere are the basic steps:\n\n```python\nfrom causallift import CausalLift\n\n\"\"\" Step 1. \"\"\"\ncl = CausalLift(train_df, test_df, enable_ipw=True)\n\n\"\"\" Step 2. \"\"\"\ntrain_df, test_df = cl.estimate_cate_by_2_models()\n\n\"\"\" Step 3. 
\"\"\"\nestimated_effect_df = cl.estimate_recommendation_impact()\n```\n\n\n### [Recommended option] Use `causallift.nodes` subpackage with [`PipelineX`](https://github.com/Minyus/pipelinex) package\n\nPlease see the [PipelineX](https://github.com/Minyus/pipelinex) package and\nuse the [PipelineX CausalLift example project](https://github.com/Minyus/pipelinex_causallift).\n\n\n## How to run inference (prediction of CATE for new data with `Treatment` and `Outcome` unknown)?\n\nUse the whole historical data (A/B Test data or observational data) as `train_df` instead of splitting it into `train_df` and `test_df`, and use the new data with `Treatment` and `Outcome` unknown as `test_df`.\n\nThis is possible because `Treatment` and `Outcome` are not used for prediction of CATE after the Uplift Model is trained using `Treatment` and `Outcome`.\n\nPlease note that valid evaluation for `test_df` will not be available as valid `Treatment` and `Outcome` are not available.\n\n\n## Details about the parameters\n\nPlease see [[CausalLift API document]](https://causallift.readthedocs.io/en/latest/).\n\n\n## Related Python packages\n\n- [\"pylift\"](https://github.com/wayfair/pylift)\n[[documentation]](https://pylift.readthedocs.io/en/latest/)\n\n\tUplift Modeling based on the Transformed Outcome method for A/B Testing data and visualization of\n\tmetrics such as Qini.\n\n- [\"EconML\" (ALICE: Automated Learning and Intelligence for Causation and Economics)](https://github.com/Microsoft/EconML)\n[[documentation]](https://econml.azurewebsites.net/index.html)\n\n\tSeveral advanced methods to estimate CATE from observational data.\n\n- [\"DoWhy\"](https://github.com/Microsoft/dowhy)\n[[documentation]](https://causalinference.gitlab.io/dowhy/)\n\n\tVisualization of steps in Causal Inference for observational data.\n\n- [\"pymatch\"](https://github.com/benmiroglio/pymatch)\n\n\tPropensity Score Matching for observational data.\n\n- 
[\"Ax\"](https://github.com/facebook/Ax)\n[[documentation]](https://ax.dev/)\n\n\tPlatform for adaptive experiments, powered by BoTorch, a library built on PyTorch.\n\n## Related R packages\n\n- [\"uplift\"](https://cran.r-project.org/web/packages/uplift/index.html)\n\n\tUplift Modeling.\n\n- [\"tools4uplift\"](https://cran.r-project.org/web/packages/tools4uplift/index.html)\n[[paper]](https://arxiv.org/abs/1901.10867)\n\n\tUplift Modeling and utility tools for quantization of continuous variables, visualization of\n\tmetrics such as Qini, and automatic feature selection.\n\n- [\"Matching\"](https://cran.r-project.org/web/packages/Matching/index.html)\n\n\tPropensity Score Matching for observational data.\n\n- [\"CausalImpact\"](https://cran.r-project.org/web/packages/CausalImpact/index.html)\n[[documentation]](https://google.github.io/CausalImpact/CausalImpact.html)\n\n\tCausal inference using Bayesian structural time-series models.\n\n\n## References\n\n- Gutierrez, Pierre and Gérardy, Jean-Yves. Causal inference and uplift modelling: A review of\nthe literature. In International Conference on Predictive Applications and APIs, pages 1-13, 2017.\n\n- Athey, Susan and Imbens, Guido W. Machine learning methods for estimating heterogeneous causal\neffects. Stat, 2015.\n\n- Yi, Robert and Frost, Will. (n.d.). Pylift: A Fast Python Package for Uplift Modeling. 
Retrieved\nApril 3, 2019, from https://tech.wayfair.com/2018/10/pylift-a-fast-python-package-for-uplift-modeling/\n\n\n## Introductory resources about Uplift Modeling\n\n- \u003c[Medium article: Uplift Models for better marketing campaigns (Part 1)](\nhttps://medium.com/@abhayspawar/uplift-models-for-better-marketing-campaigns-part-1-b491292e4c80\n)\u003e\n- \u003c[Medium article: Simple Machine Learning Techniques To Improve Your Marketing Strategy: Demystifying Uplift Models](\nhttps://medium.com/datadriveninvestor/simple-machine-learning-techniques-to-improve-your-marketing-strategy-demystifying-uplift-models-dc4fb3f927a2\n)\u003e\n- \u003c[Wikipedia: Uplift_modelling](\nhttps://en.wikipedia.org/wiki/Uplift_modelling\n)\u003e\n\n## License\n\n[BSD 2-clause License](https://github.com/Minyus/causallift/blob/master/LICENSE).\n\n\n## To-dos\n\n- Support Python\u003e=3.8, kedro\u003e=0.18, and scikit-learn\u003e=0.23\n- Improve documentation\n- Clarify the model summary output including visualization\n- Add examples of applying uplift modeling to more publicly available datasets\n(such as [Lending Club Loan Data](https://www.kaggle.com/wendykan/lending-club-loan-data)),\nas [pymatch](https://github.com/benmiroglio/pymatch) did.\n- Support multiple treatments\n\n\n## Contributing\n\nAny feedback is welcome!\n\nPlease create an issue for questions, suggestions, and feature requests.\nPlease open pull requests to improve documentation, usability, and features against the `develop` branch.\n\nA separate pull request for each improvement is preferred over one big pull request.\nIt is encouraged to use:\n- [Google-style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)\n- [PEP 484 comment-style type annotation](https://mypy.readthedocs.io/en/latest/cheat_sheet.html)\nalthough Python 2 is not supported.\n- An intelligent IDE such as PyCharm or VS Code\n\nIf you could write a review about CausalLift in any natural language 
(English, Chinese, Japanese,\netc.) or implement similar features in any programming language (R, SAS, etc.), please let me\nknow. I will add the link here.\n\n## Keywords to search\n\n[English] Causal Inference, Counterfactual, Propensity Score, Econometrics\n\n[中文] 因果推断, 反事实, 倾向评分, 计量经济学\n\n[日本語] 因果推論, 反事実, 傾向スコア, 計量経済学\n\n## Article about CausalLift in Japanese\n\n- https://qiita.com/Minyus86/items/07ce57a8bddc49c2bbf5\n\n## Author:\n\nYusuke Minami\n\n- [@Minyus](https://github.com/Minyus)\n- https://www.linkedin.com/in/yusukeminami/\n- https://twitter.com/Minyus86\n\n## Contributors:\n\n[@farismosman](https://github.com/farismosman)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMinyus%2Fcausallift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMinyus%2Fcausallift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMinyus%2Fcausallift/lists"}