https://github.com/viniciusmsousa/ds-toolbox
ToolBox to help the data scientist analytical work.
data-scientist toolbox
- Host: GitHub
- URL: https://github.com/viniciusmsousa/ds-toolbox
- Owner: viniciusmsousa
- Created: 2021-04-03T12:47:21.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2021-11-02T01:57:34.000Z (over 4 years ago)
- Last Synced: 2023-12-19T17:14:12.904Z (over 2 years ago)
- Topics: data-scientist, toolbox
- Language: Python
- Homepage: https://viniciusmsousa.github.io/ds-toolbox/
- Size: 7.5 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
## DS-ToolBox
[PyPI](https://pypi.org/project/ds-toolbox/)
[GitHub Actions](https://github.com/viniciusmsousa/ds-toolbox/actions)
[Codecov](https://codecov.io/gh/viniciusmsousa/ds-toolbox?branch=main)
[GitHub](https://github.com/viniciusmsousa/ds-toolbox)
[PyPI Stats](https://pypistats.org/packages/ds-toolbox)
A set of functions to support a data scientist's analytical work. Full documentation is available on the [package homepage](https://viniciusmsousa.github.io/ds-toolbox/). The main motivation of the package is to simplify these tasks by using a common input and output structure: Spark DataFrames (SparkDF) and pandas DataFrames (PandasDF).
### Installation
The package can be installed from PyPI:
```
pip install ds-toolbox
```
Or directly from GitHub:
```
pip install git+https://github.com/viniciusmsousa/ds-toolbox.git@main
```
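To confirm that either install command succeeded, the installed distribution can be queried from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of ds-toolbox.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of ds-toolbox.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if ds-toolbox is installed, otherwise None.
print(installed_version("ds-toolbox"))
```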
### Currently available modules and functions
- statistics:
  - `contigency_chi2_test`: Wrapper for a [SciPy](https://github.com/scipy/scipy) function;
  - `mannwhitney_pairwise`: Wrapper for a [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) function;
  - `ks_test`: Computes the [KS test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) for PandasDF and SparkDF;
  - `ab_test_pairwise`: A simple A/B test based on mean, std and var, for PandasDF and SparkDF.
- ml:
  - evaluator:
    - `binary_classifier_metrics`: Computes classification metrics (confusion_matrix, accuracy, f1, precision, recall, aucroc, aucpr) from a dataframe (SparkDF or PandasDF) containing ground truth and predictions.
- econometrics:
  - causal_regression:
    - `CausalRegression`: A class built on the material presented in chapters 19-21 of the book [Causal Inference for The Brave and True](https://github.com/matheusfacure/python-causality-handbook/tree/master).
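
To make the metrics listed for `binary_classifier_metrics` concrete, the sketch below computes the threshold-based ones (confusion matrix, accuracy, precision, recall, f1) from plain Python lists. It is an illustration only, not the package's implementation or API: the function name `binary_metrics_sketch`, its arguments, and the returned dictionary layout are assumptions, and aucroc and aucpr are omitted because they need predicted scores rather than hard 0/1 predictions.

```python
from typing import Dict, Sequence


def binary_metrics_sketch(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, object]:
    """Hypothetical helper: metrics from 0/1 ground truth and 0/1 predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard every ratio against a zero denominator.
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    return {
        "confusion_matrix": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration.
metrics = binary_metrics_sketch(
    y_true=[1, 0, 1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 0, 1, 1, 0],
)
print(metrics)
```

With a PandasDF, the same inputs would come from two columns of the dataframe (ground truth and prediction); the column names to pass are described in the package documentation.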