https://github.com/readytensor/rt_bin_class_base_simple_ann_fastapi_hyperopt_shapley

Simple ANN Classifier in PyTorch with Shapley explanations for Binary Classification - Base problem category as per Ready Tensor specifications.
https://github.com/readytensor/rt_bin_class_base_simple_ann_fastapi_hyperopt_shapley

Last synced: about 1 year ago
JSON representation

Simple ANN Classifier in PyTorch with Shapley explanations for Binary Classification - Base problem category as per Ready Tensor specifications.

Host: GitHub
URL: https://github.com/readytensor/rt_bin_class_base_simple_ann_fastapi_hyperopt_shapley
Owner: readytensor
Created: 2023-03-21T03:35:38.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-03-21T03:36:05.000Z (over 3 years ago)
Last Synced: 2024-04-16T03:12:38.240Z (about 2 years ago)
Language: Python
Size: 26.4 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

Simple ANN Classifier in PyTorch with Shapley explanations for Binary Classification - Base problem category as per Ready Tensor specifications.

- ANN
- shapley
- XAI
- HPT
- sklearn
- python
- pandas
- numpy
- hyperopt
- fastapi
- nginx
- uvicorn
- docker
- binary classification
- tensorflow
- keras

This is a Binary Classifier that uses Simple ANN implemented through PyTorch. Feature impacts are provided with Shapley values for model interpretability.

The data preprocessing step includes missing data imputation, standardization, one-hot encoding for categorical variables, datatype casting, etc. The missing categorical values are imputed using the most frequent value if they are rare. Otherwise if the missing value is frequent, they are give a "missing" label instead. Missing numerical values are imputed using the mean and a binary column is added to show a 'missing' indicator for the missing values. Numerical values are also scaled using a Yeo-Johnson transformation in order to get the data close to a Gaussian distribution.

Hyperparameter Tuning (HPT) is conducted by finding the optimal activation function (tanh or relu) as well as the optimal learning rate for SGD.

During the model development process, the algorithm was trained and evaluated on a variety of datasets such as email spam detection, customer churn, credit card fraud detection, cancer diagnosis, and titanic passanger survivor prediction.

This Binary Classifier is written using Python as its programming language. PyTorch is used to implement the main algorithm. Scikitlearn is used in the data preprocessing pipeline and model evaluation. Numpy, pandas, and `feature_engine` are used for the data preprocessing steps. SciKit-Optimize was used to handle the HPT. We use fastapi + Nginx + uvicorn for web service. The web service provides three endpoints- `/ping` for health check, `/infer` for predictions in real time and `/explain` to generate local explanations.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/readytensor/rt_bin_class_base_simple_ann_fastapi_hyperopt_shapley

Awesome Lists containing this project

README