https://github.com/dhchenx/correlation-kit

A toolkit for estimating the correlation between variables

Host: GitHub
URL: https://github.com/dhchenx/correlation-kit
Owner: dhchenx
License: mit
Created: 2021-12-05T19:21:21.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2021-12-05T19:23:23.000Z (over 3 years ago)
Last Synced: 2025-03-01T11:46:58.529Z (4 months ago)
Topics: binary-variable, correlation-analysis, kendalltau, multi-category, pearson, spearman
Language: Python
Homepage:
Size: 15.6 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt


# Correlation Kit

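A toolkit for estimating the correlation between variables: Pearson, Spearman, and Kendall's tau coefficients for continuous and binary variables, and a one-way F value for multi-category variables.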

## Installation

```shell
pip install correlation-kit
```

## Correlation between two continuous variables

```python
import pandas as pd

from correlation_kit.ck_wrapper import CorrelationKit

# build a DataFrame here or read one from a CSV file with pd.read_csv()
d = {'x': [1, 2, 3.5, 4], 'y': [3, 4, 4.5, 6]}
df = pd.DataFrame(data=d)

# names of the two columns to correlate
x = "x"
y = "y"


def get_correlation(df, x, y, corr_type):
    """Return (statistic, p-value) for the requested correlation type."""
    ck = CorrelationKit(df)
    if corr_type == "pearson":
        stat, p = ck.get_pearson(x, y)
    elif corr_type == "spearman":
        stat, p = ck.get_spearman(x, y)
    elif corr_type == "kendalltau":
        stat, p = ck.get_kendalltau(x, y)
    else:
        # fail loudly instead of silently returning (0, 0)
        raise ValueError(f"unknown corr_type: {corr_type!r}")
    return stat, p


# print results as (statistic, p-value) tuples
print("pearson = ", get_correlation(df, x, y, "pearson"))
print("spearman = ", get_correlation(df, x, y, "spearman"))
print("kendalltau = ", get_correlation(df, x, y, "kendalltau"))
```
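To see what the three statistics measure, here is a minimal standard-library sketch that computes them by hand on the same data. It assumes, without having checked the package source, that `get_pearson`, `get_spearman`, and `get_kendalltau` return the usual Pearson's r, Spearman's rho (with average ranks for ties), and Kendall's tau-b. The helpers `pearson`, `average_ranks`, `spearman`, and `kendall_tau_b` are hypothetical and not part of `correlation_kit`; p-values are left out.

```python
import math


def pearson(a, b):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_b(a, b):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties."""
    concordant = discordant = ties_a = ties_b = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            da, db = a[i] - a[j], b[i] - b[j]
            if da == 0 and db == 0:
                continue  # tied in both variables: ignored by tau-b
            if da == 0:
                ties_a += 1
            elif db == 0:
                ties_b += 1
            elif (da > 0) == (db > 0):
                concordant += 1
            else:
                discordant += 1
    n0 = concordant + discordant
    return (concordant - discordant) / math.sqrt((n0 + ties_a) * (n0 + ties_b))


d = {'x': [1, 2, 3.5, 4], 'y': [3, 4, 4.5, 6]}
print("pearson =", pearson(d['x'], d['y']))
print("spearman =", spearman(d['x'], d['y']))
print("kendalltau =", kendall_tau_b(d['x'], d['y']))
```

Because both columns in this example are strictly increasing, Spearman's rho and Kendall's tau are both 1.0, while Pearson's r stays below 1 because the relationship is not exactly linear.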

## Estimate the correlation between a binary and a continuous variable

```python
import pandas as pd

from correlation_kit.ck_wrapper import CorrelationKit

# build a DataFrame here or read one from a CSV file with pd.read_csv()
d = {'x': ['large', 'large', 'small', 'small'],
     'y': ['hot', 'hot', 'cold', 'cold'],
     'z': [0, 1, 2.5, 3]}
df = pd.DataFrame(data=d)

# correlate the binary column 'x' with the continuous column 'z';
# rows where x == 'large' are coded as 1, all other rows as 0
r_p, r_s, r_k = CorrelationKit(df).get_corr_between_category_and_continual('x', 'large', 'z')

# results
print('pearson: ', r_p)
print('spearman: ', r_s)
print('kendalltau: ', r_k)
```
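Coding one category as 1 and every other category as 0 turns the Pearson coefficient into the point-biserial correlation. The sketch below reproduces only that Pearson part with the standard library, assuming (not verified against the package) that `r_p` is computed this way; `point_biserial` is a hypothetical helper, and the Spearman and Kendall values returned by the package are not reproduced here.

```python
import math


def pearson(a, b):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def point_biserial(labels, values, positive):
    """Pearson's r between a 0/1 indicator (1 where label == positive) and values."""
    indicator = [1.0 if label == positive else 0.0 for label in labels]
    return pearson(indicator, values)


d = {'x': ['large', 'large', 'small', 'small'], 'z': [0, 1, 2.5, 3]}
print("pearson:", point_biserial(d['x'], d['z'], 'large'))
```

Choosing `'small'` as the positive label instead of `'large'` only flips the sign of r, since the indicator column is inverted.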

## Estimate the F value between a multi-category variable and a continuous variable

```python
import pandas as pd

from correlation_kit.ck_wrapper import CorrelationKit

# build a DataFrame here or read one from a CSV file with pd.read_csv()
d = {'x': ['large', 'large', 'middle', 'small', 'small'],
     'y': ['hot', 'hot', 'warm', 'cold', 'cold'],
     'z': [0, 1, 2, 2.5, 3]}
df = pd.DataFrame(data=d)

# compare the continuous column 'z' across the listed categories of column 'x'
F, p = CorrelationKit(df).get_f_oneway('x', ['large', 'middle', 'small'], 'z')

# results
print('F: ', F)
print('p: ', p)
```
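The F value is presumably the one-way ANOVA statistic: the between-group mean square divided by the within-group mean square. A minimal standard-library sketch of that calculation follows, assuming (not confirmed from the package source) that `get_f_oneway` performs a standard one-way ANOVA over the listed groups. `f_oneway_by_label` is a hypothetical helper, and the p-value, which needs the F distribution, is left out.

```python
def f_oneway_by_label(labels, values, groups):
    """One-way ANOVA F = (SS_between / (k - 1)) / (SS_within / (N - k)).

    Only rows whose label is in `groups` are used; every group must be non-empty.
    """
    samples = [[v for lab, v in zip(labels, values) if lab == g] for g in groups]
    pooled = [v for s in samples for v in s]
    n, k = len(pooled), len(samples)
    grand_mean = sum(pooled) / n
    means = [sum(s) / len(s) for s in samples]
    # spread of the group means around the grand mean, weighted by group size
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # spread of each observation around its own group mean
    ss_within = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


d = {'x': ['large', 'large', 'middle', 'small', 'small'], 'z': [0, 1, 2, 2.5, 3]}
F = f_oneway_by_label(d['x'], d['z'], ['large', 'middle', 'small'])
print('F: ', F)
```

For this data the group means are 0.5, 2, and 2.75 around a grand mean of 1.7, so SS_between = 5.175 and SS_within = 0.625; with 2 and 2 degrees of freedom that gives F = 8.28.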

## License

The `Correlation-Kit` project is provided by [Donghua Chen](https://github.com/dhchenx) under the MIT License (see `LICENSE.txt`).
