{"id":20374769,"url":"https://github.com/kmedian/korr","last_synced_at":"2025-04-12T07:12:55.410Z","repository":{"id":52498477,"uuid":"158409574","full_name":"kmedian/korr","owner":"kmedian","description":"collection of utility functions for correlation analysis","archived":false,"fork":false,"pushed_at":"2022-06-19T21:09:47.000Z","size":1859,"stargazers_count":7,"open_issues_count":1,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-12T07:12:48.352Z","etag":null,"topics":["binary-correlation","confusion-matrix","correlation","correlation-analysis","correlation-matrix","correlation-pairs","eda","kendall","kendall-tau","matthews","p-value","pearson","pearson-correlation","pypi","python","rank-correlation","sample-correlation","spearman"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kmedian.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":["ulf1"]}},"created_at":"2018-11-20T15:17:30.000Z","updated_at":"2024-07-01T01:17:38.000Z","dependencies_parsed_at":"2022-08-29T10:52:47.425Z","dependency_job_id":null,"html_url":"https://github.com/kmedian/korr","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmedian%2Fkorr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmedian%2Fkorr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmedian%2Fkorr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/kmedian%2Fkorr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kmedian","download_url":"https://codeload.github.com/kmedian/korr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248530571,"owners_count":21119600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary-correlation","confusion-matrix","correlation","correlation-analysis","correlation-matrix","correlation-pairs","eda","kendall","kendall-tau","matthews","p-value","pearson","pearson-correlation","pypi","python","rank-correlation","sample-correlation","spearman"],"created_at":"2024-11-15T01:26:49.059Z","updated_at":"2025-04-12T07:12:55.381Z","avatar_url":"https://github.com/kmedian.png","language":"Python","funding_links":["https://github.com/sponsors/ulf1"],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://badge.fury.io/py/korr.svg)](https://badge.fury.io/py/korr)\n[![Total alerts](https://img.shields.io/lgtm/alerts/g/kmedian/korr.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/kmedian/korr/alerts/)\n[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/kmedian/korr.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/kmedian/korr/context:python)\n\n# korr\ncollection of utility functions for correlation analysis\n\n\n## Usage\nCheck the [examples](https://github.com/kmedian/korr/tree/master/examples) folder for notebooks.\n\nCompute correlation matrix and its p-values\n\n* [pearson](https://github.com/kmedian/korr/blob/master/examples/pearson.ipynb) -- 
Pearson/Sample correlation (interval- and ratio-scale data)\n* [kendall](https://github.com/kmedian/korr/blob/master/examples/kendall.ipynb) -- Kendall's tau rank correlation (ordinal data)\n* [spearman](https://github.com/kmedian/korr/blob/master/examples/spearman.ipynb) -- Spearman's rho rank correlation (ordinal data)\n* [mcc](https://github.com/kmedian/korr/blob/master/examples/mcc%20(Matthews%20correlation).ipynb) -- Matthews correlation coefficient between binary variables \n\nEDA, Dig deeper into results\n\n* [flatten](https://github.com/kmedian/korr/blob/master/examples/flatten.ipynb) -- A table (pandas) with one row for each correlation pair, listing the variable indices, correlation, and p-value. For example, try to find \"good\" cutoffs with `corr_vs_pval` and then look up the variable indices with `flatten` afterwards.\n* [slice_yx](https://github.com/kmedian/korr/blob/master/examples/slice_yx.ipynb) -- Slice a correlation and p-value matrix of a (y,X) dataset into a (y,x_i) vector and (x_j, x_k) matrices\n* [corr_vs_pval](https://github.com/kmedian/korr/blob/master/examples/corr_vs_pval.ipynb)  -- Histogram to find p-value cutoffs (alpha) for a) highly correlated pairs, b) unrelated pairs, c) the mixed results. \n* [bracket_pval](https://github.com/kmedian/korr/blob/master/examples/bracket_pval.ipynb) -- Histogram with more fine-grained p-value brackets. \n* [corrgram](https://github.com/kmedian/korr/blob/master/examples/corrgram.ipynb) -- Correlogram, heatmap of correlations with p-values in brackets\n\nUtility functions\n\n* [confusion](https://github.com/kmedian/korr/blob/master/examples/confusion.ipynb) -- Confusion matrix. Required for Matthews correlation (mcc) and a bit faster than sklearn's \n\nParameter Stability\n\n* [bootcorr](https://github.com/kmedian/korr/blob/master/examples/bootcorr.ipynb) -- Estimate multiple correlation matrices based on bootstrapped samples. 
From there you can assess how stable the correlation estimates are (how sensitive they are to in-sample variation). For example, stable estimates are good candidates for modeling, and unstable correlation pairs are good candidates for P-hacking and non-reproducibility.\n\nVariable Selection, Search Functions\n\n* [mincorr](https://github.com/kmedian/korr/blob/master/examples/mincorr.ipynb) -- From all estimated correlation pairs, pick a given number `n=3,5,..` of variables with low and insignificant correlations among each other. (See [binsel](https://github.com/kmedian/binsel) package for an application.)\n* `find_best` -- Find the N \"best\", i.e. high and most significant, correlations\n* `find_worst` -- Find the N \"worst\", i.e. insignificant/random and low, correlations\n* [find_unrelated](https://github.com/kmedian/korr/blob/master/examples/find_unrelated.ipynb) -- Return variable indices of unrelated pairs (in terms of insignificant p-value)\n\n\n## Appendix\n\n### Installation\nThe `korr` [git repo](http://github.com/kmedian/korr) is available as a [PyPI package](https://pypi.org/project/korr)\n\n```\npip install korr\n```\n\n### Install a virtual environment\n\n```\npython3.7 -m venv .venv\nsource .venv/bin/activate\npip install --upgrade pip\npip install -r requirements.txt --no-cache-dir\npip install -r requirements-dev.txt --no-cache-dir\npip install -r requirements-demo.txt --no-cache-dir\n```\n\n(If your git repo is stored in a folder whose path contains whitespace, then don't use the subfolder `.venv`. Use an absolute path without whitespace.)\n\n\n### Commands\n* Check syntax: `flake8 --ignore=F401`\n* Run Unit Tests: `pytest`\n* Remove `.pyc` files: `find . -type f -name \"*.pyc\" | xargs rm`\n* Remove `__pycache__` folders: `find . 
-type d -name \"__pycache__\" | xargs rm -rf`\n\nPublish\n\n```sh\npandoc README.md --from markdown --to rst -s -o README.rst\npython setup.py sdist \ntwine upload -r pypi dist/*\n```\n\n### Support\nPlease [open an issue](https://github.com/kmedian/korr/issues/new) for support.\n\n\n### Contributing\nPlease contribute using [GitHub Flow](https://guides.github.com/introduction/flow/). Create a branch, add commits, and [open a pull request](https://github.com/kmedian/korr/compare/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkmedian%2Fkorr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkmedian%2Fkorr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkmedian%2Fkorr/lists"}