Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kmedian/korr
collection of utility functions for correlation analysis
https://github.com/kmedian/korr
binary-correlation confusion-matrix correlation correlation-analysis correlation-matrix correlation-pairs eda kendall kendall-tau matthews p-value pearson pearson-correlation pypi python rank-correlation sample-correlation spearman
Last synced: 2 days ago
JSON representation
collection of utility functions for correlation analysis
- Host: GitHub
- URL: https://github.com/kmedian/korr
- Owner: kmedian
- License: apache-2.0
- Created: 2018-11-20T15:17:30.000Z (almost 6 years ago)
- Default Branch: main
- Last Pushed: 2022-06-19T21:09:47.000Z (over 2 years ago)
- Last Synced: 2024-10-08T01:48:14.061Z (about 1 month ago)
- Topics: binary-correlation, confusion-matrix, correlation, correlation-analysis, correlation-matrix, correlation-pairs, eda, kendall, kendall-tau, matthews, p-value, pearson, pearson-correlation, pypi, python, rank-correlation, sample-correlation, spearman
- Language: Python
- Size: 1.77 MB
- Stars: 7
- Watchers: 2
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
[![PyPI version](https://badge.fury.io/py/korr.svg)](https://badge.fury.io/py/korr)
[![Total alerts](https://img.shields.io/lgtm/alerts/g/kmedian/korr.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/kmedian/korr/alerts/)
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/kmedian/korr.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/kmedian/korr/context:python)# korr
collection of utility functions for correlation analysis## Usage
Check the [examples](https://github.com/kmedian/korr/tree/master/examples) folder for notebooks.Compute correlation matrix and its p-values
* [pearson](https://github.com/kmedian/korr/blob/master/examples/pearson.ipynb) -- Pearson/Sample correlation (interval- and ratio-scale data)
* [kendall](https://github.com/kmedian/korr/blob/master/examples/kendall.ipynb) -- Kendall's tau rank correlation (ordinal data)
* [spearman](https://github.com/kmedian/korr/blob/master/examples/spearman.ipynb) -- Spearman rho rank correlation (ordinal data)
* [mcc](https://github.com/kmedian/korr/blob/master/examples/mcc%20(Matthews%20correlation).ipynb) -- Matthews correlation coefficient between binary variablesEDA, Dig deeper into results
* [flatten](https://github.com/kmedian/korr/blob/master/examples/flatten.ipynb) -- A table (pandas) with one row for each correlation pairs with the variable indicies, corr., p-value. For example, try to find "good" cutoffs with `corr_vs_pval` and then look up the variable indicies with `flatten` afterwards.
* [slice_yx](https://github.com/kmedian/korr/blob/master/examples/slice_yx.ipynb) -- slice a correlation and p-value matrix of a (y,X) dataset into a (y,x_i) vector and (x_j, x_k) matrices
* [corr_vs_pval](https://github.com/kmedian/korr/blob/master/examples/corr_vs_pval.ipynb) -- Histogram to find p-value cutoffs (alpha) for a) highly correlated pairs, b) unrelated pairs, c) the mixed results.
* [bracket_pval](hhttps://github.com/kmedian/korr/blob/master/examples/bracket_pval.ipynb) -- Histogram with more fine-grained p-value brackets.
* [corrgram](https://github.com/kmedian/korr/blob/master/examples/corrgram.ipynb) -- Correlogram, heatmap of correlations with p-values in bracketsUtility functions
* [confusion](https://github.com/kmedian/korr/blob/master/examples/confusion.ipynb) -- Confusion matrix. Required for Matthews correlation (mcc) and is a bitter faster than sklearn's
Parameter Stability
* [bootcorr](https://github.com/kmedian/korr/blob/master/examples/bootcorr.ipynb) -- Estimate multiple correlation matrices based on bootstrapped samples. From there you can assess how stable correlation estimates are (how sensitive against in-sample variation). For example, stable estimates are good candidates for modeling, and unstable correlation pairs are good candidates for P-hacking and non-reproducibility.
Variable Selection, Search Functions
* [mincorr](https://github.com/kmedian/korr/blob/master/examples/mincorr.ipynb) -- From all estimated correlation pairs, pick a given `n=3,5,..` of variables with low and insignificant correlations among each other. (See [binsel](https://github.com/kmedian/binsel) package for an application.)
* `find_best` -- Find the N "best", i.e. high and most significant, correlations
* `find_worst` -- Find the N "worst", i.e. insignificant/random and low, correlations
* [find_unrelated](https://github.com/kmedian/korr/blob/master/examples/find_unrelated.ipynb) -- Return variable indicies of unrelated pairs (in terms of insignificant p-value)## Appendix
### Installation
The `korr` [git repo](http://github.com/kmedian/korr) is available as [PyPi package](https://pypi.org/project/korr)```
pip install korr
```### Install a virtual environment
```
python3.7 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
```(If your git repo is stored in a folder with whitespaces, then don't use the subfolder `.venv`. Use an absolute path without whitespaces.)
### Commands
* Check syntax: `flake8 --ignore=F401`
* Run Unit Tests: `pytest`
* Remove `.pyc` files: `find . -type f -name "*.pyc" | xargs rm`
* Remove `__pycache__` folders: `find . -type d -name "__pycache__" | xargs rm -rf`Publish
```sh
pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
```### Support
Please [open an issue](https://github.com/kmedian/korr/issues/new) for support.### Contributing
Please contribute using [Github Flow](https://guides.github.com/introduction/flow/). Create a branch, add commits, and [open a pull request](https://github.com/kmedian/korr/compare/).