{"id":18818904,"url":"https://github.com/py-why/pywhy-stats","last_synced_at":"2025-04-13T23:32:49.504Z","repository":{"id":172052224,"uuid":"621903408","full_name":"py-why/pywhy-stats","owner":"py-why","description":"Python package for (conditional) independence testing and statistical functions related to causality.","archived":false,"fork":false,"pushed_at":"2025-01-01T06:06:11.000Z","size":5001,"stargazers_count":28,"open_issues_count":9,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T08:21:17.678Z","etag":null,"topics":["conditional-independence-test","independence-testing","python","statistics"],"latest_commit_sha":null,"homepage":"https://www.pywhy.org/pywhy-stats/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/py-why.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-03-31T16:30:41.000Z","updated_at":"2025-03-31T05:23:31.000Z","dependencies_parsed_at":"2023-12-13T18:35:01.573Z","dependency_job_id":"58ad7085-6e13-4eae-a484-6e1b9335d4f3","html_url":"https://github.com/py-why/pywhy-stats","commit_stats":null,"previous_names":["py-why/pywhy-stats"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/py-why%2Fpywhy-stats","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/py-why%2Fpywhy-stats/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/py-why%2Fpywhy-stats/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/py-why%2Fpywhy-stats/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/py-why","download_url":"https://codeload.github.com/py-why/pywhy-stats/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248796939,"owners_count":21163055,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conditional-independence-test","independence-testing","python","statistics"],"created_at":"2024-11-08T00:19:23.471Z","updated_at":"2025-04-13T23:32:48.817Z","avatar_url":"https://github.com/py-why.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![CircleCI](https://circleci.com/gh/py-why/pywhy-stats/tree/main.svg?style=svg)](https://circleci.com/gh/py-why/pywhy-stats/tree/main)\n[![unit-tests](https://github.com/py-why/pywhy-stats/actions/workflows/main.yml/badge.svg)](https://github.com/py-why/pywhy-stats/actions/workflows/main.yml)\n[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)\n[![codecov](https://codecov.io/gh/py-why/pywhy-stats/branch/main/graph/badge.svg?token=H1reh7Qwf4)](https://codecov.io/gh/py-why/pywhy-stats)\n[![PyPI Download count](https://img.shields.io/pypi/dm/pywhy-stats.svg)](https://pypistats.org/packages/pywhy-stats)\n[![Latest PyPI release](https://img.shields.io/pypi/v/pywhy-stats.svg)](https://pypi.org/project/pywhy-stats/)\n\n# 
PyWhy-Stats\n\nPyWhy-Stats is a Python library that implements various statistical methods, such as (un)conditional independence tests, which can be used in tasks like causal discovery. In the current version, PyWhy-Stats supports:\n- Kernel-based independence and conditional k-sample tests\n- FisherZ-based independence tests\n- Power-divergence independence tests\n- Bregman-divergence conditional k-sample tests\n\n# Documentation\n\nSee the [development version documentation](https://py-why.github.io/pywhy-stats/dev/index.html).\n\nOr see the [stable version documentation](https://py-why.github.io/pywhy-stats/stable/index.html).\n\n# Installation\n\nInstallation is best done via `pip` or `conda`. Developers can also install from source using `pip`. See the [installation page](https://www.pywhy.org/pywhy-stats/dev/installation.html) for full details.\n\n## Dependencies\n\nMinimally, pywhy-stats requires:\n\n    * Python (\u003e=3.8)\n    * numpy\n    * scipy\n    * scikit-learn\n\n## User Installation\n\nIf you already have a working installation of numpy and scipy, the easiest way to install pywhy-stats is using `pip`:\n\n    pip install -U pywhy-stats\n\nTo install the package from GitHub, clone the repository and then `cd` into the directory. You can then use `poetry` to install:\n\n    poetry install\n\n    # if you would like an editable install of pywhy-stats for dev purposes\n    pip install -e .\n\n# Quick Start\n\nIn the following sections, we use synthetic example data to demonstrate the API's functionality. More\ninformation about the methods and hyperparameters can be found in the [documentation](https://py-why.github.io/pywhy-stats/stable/index.html).\n\nNote that most methods in PyWhy-Stats support multivariate inputs. For this, 
simply pass in a\n2D NumPy array where rows represent samples and columns represent the different dimensions.\n\n### Unconditional Independence Tests\n\nConsider the following example data:\n\n```Python\nimport numpy as np\n\nrng = np.random.default_rng(0)\nX = rng.standard_normal((200, 1))\nY = np.exp(X + rng.standard_normal(size=(200, 1)))\n```\n\nHere, $Y$ depends on $X$ in a non-linear way. We can use the simplified API of PyWhy-Stats to test the null hypothesis\nthat the variables are independent:\n\n```Python\nfrom pywhy_stats import independence_test\n\nresult = independence_test(X, Y)\nprint(\"p-value:\", result.pvalue, \"Test statistic:\", result.statistic)\n```\n\nThe `independence_test` function returns an object containing a p-value, a test statistic, and possibly additional\ninformation about the test. By default, it employs a heuristic to select the most appropriate test for the\ndata. Currently, it defaults to a kernel-based independence test.\n\nThe resulting p-value is very small. Using, for example, a significance level of 0.05, we would reject\nthe null hypothesis of independence and infer that these variables are dependent. However, a p-value exceeding the\nsignificance level doesn't conclusively indicate that the variables are independent; it only indicates insufficient\nevidence of dependence.\n\nWe can also be more specific in the type of independence test we want to use. 
For instance, to use\na FisherZ test, pass the `method` argument:\n\n```Python\nfrom pywhy_stats import Methods\n\nresult = independence_test(X, Y, method=Methods.FISHERZ)\nprint(\"p-value:\", result.pvalue, \"Test statistic:\", result.statistic)\n```\n\nOr, for the kernel-based independence test:\n\n```Python\nfrom pywhy_stats import Methods\n\nresult = independence_test(X, Y, method=Methods.KCI)\nprint(\"p-value:\", result.pvalue, \"Test statistic:\", result.statistic)\n```\n\nFor more information about the available methods, hyperparameters and other details, see the\n[documentation](https://py-why.github.io/pywhy-stats/stable/index.html).\n\n### Conditional independence test\n\nSimilar to the unconditional independence test, we can use the same API to condition on another variable or set of\nvariables. First, let's generate a third variable $Z$ to condition on:\n\n```Python\nimport numpy as np\n\nrng = np.random.default_rng(0)\nZ = rng.standard_normal((200, 1))\nX = Z + rng.standard_normal(size=(200, 1))\nY = np.exp(Z + rng.standard_normal(size=(200, 1)))\n```\n\nHere, $X$ and $Y$ are dependent because both depend on $Z$. Running an unconditional independence test yields:\n\n```Python\nfrom pywhy_stats import independence_test\n\nresult = independence_test(X, Y)\nprint(\"p-value:\", result.pvalue, \"Test statistic:\", result.statistic)\n```\n\nAgain, the p-value is very small, providing strong evidence that $X$ and $Y$ are dependent. Now,\nlet's condition on $Z$, which should render the variables independent:\n\n```Python\nresult = independence_test(X, Y, condition_on=Z)\nprint(\"p-value:\", result.pvalue, \"Test statistic:\", result.statistic)\n```\n\nWe observe that the p-value isn't small anymore. Indeed, when the variables are conditionally independent, we would expect the p-value\nto be uniformly distributed on $[0, 1]$.\n\n### (Conditional) k-sample test\n\nIn certain settings, you may be interested in testing the invariance between k (conditional) distributions. 
For example, say you have data collected over the same set of variables (X, Y) from humans ($P^1(X, Y)$) and bonobos ($P^2(X, Y)$). You can test whether the conditional distributions are equal, i.e. $P^1(Y | X) = P^2(Y | X)$, using a conditional two-sample test.\n\nFirst, we create simulated data in which X follows a different distribution in each group. However, the mechanism generating Y is invariant across these two settings once we condition on X.\n\n```Python\nimport numpy as np\n\nrng = np.random.default_rng(0)\nX1 = rng.standard_normal((200, 1))\nX2 = rng.uniform(low=0.0, high=1.0, size=(200, 1))\n\nY1 = np.exp(X1 + rng.standard_normal(size=(200, 1)))\nY2 = np.exp(X2 + rng.standard_normal(size=(200, 1)))\n\ngroups = np.concatenate((np.zeros((200, 1)), np.ones((200, 1))))\nX = np.concatenate((X1, X2))\nY = np.concatenate((Y1, Y2))\n```\n\nWe now test the hypothesis that $P^1(Y | X) = P^2(Y | X)$ with the following code:\n\n```Python\nfrom pywhy_stats import conditional_ksample\n\n# test that P^1(Y | X) = P^2(Y | X)\nresult = conditional_ksample.kcd.condind(X, Y, groups)\n\nprint(\"p-value:\", result.pvalue, \"Test statistic:\", result.statistic)\n```\n\n# Contributing\n\nWe welcome contributions from the community. Please refer to our [contributing document](./CONTRIBUTING.md) and [developer document](./DEVELOPING.md) for information on developer workflows.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpy-why%2Fpywhy-stats","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpy-why%2Fpywhy-stats","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpy-why%2Fpywhy-stats/lists"}