https://github.com/pranab/avenir

Set of Machine Learning and Stochastic Optimization tools based on Hadoop, Spark and Storm https://pkghosh.wordpress.com/

## Introduction
A set of predictive and exploratory machine learning tools built with Spark and Python.

## Philosophy
* Simple to use
* Input and output in CSV format
* Metadata defined in a simple JSON file
* Highly configurable, with many configuration parameters
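To make the CSV-plus-JSON-metadata idea concrete, here is a minimal Python sketch: a JSON schema lists each field's name, column ordinal and data type, and a reader uses it to turn raw CSV rows into typed records. The schema keys (`fields`, `name`, `ordinal`, `dataType`), the sample fields and the helper names are hypothetical illustrations, not avenir's actual metadata format; the wiki and the resource directory describe the real one.

```python
import csv
import io
import json

# Hypothetical metadata schema: each field has a name, a column ordinal and a
# data type. avenir's real schema keys and type names may differ.
SCHEMA_JSON = """
{
  "fields": [
    {"name": "custId", "ordinal": 0, "dataType": "string"},
    {"name": "age", "ordinal": 1, "dataType": "int"},
    {"name": "spend", "ordinal": 2, "dataType": "double"},
    {"name": "churned", "ordinal": 3, "dataType": "categorical"}
  ]
}
"""

# Map each (hypothetical) declared type to a Python conversion function
CASTS = {"int": int, "double": float, "string": str, "categorical": str}


def load_schema(text):
    """Parse the JSON metadata and return the fields sorted by ordinal."""
    fields = json.loads(text)["fields"]
    return sorted(fields, key=lambda f: f["ordinal"])


def read_records(csv_text, schema):
    """Yield one dict per CSV row, converting each column to its declared type."""
    for row in csv.reader(io.StringIO(csv_text)):
        yield {
            f["name"]: CASTS[f["dataType"]](row[f["ordinal"]])
            for f in schema
        }


if __name__ == "__main__":
    schema = load_schema(SCHEMA_JSON)
    data = "C001,34,120.5,no\nC002,51,87.0,yes\n"
    for rec in read_records(data, schema):
        print(rec)
```

Keeping the column layout in metadata rather than in code means the same reader can be pointed at a different CSV file just by editing the JSON.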

## Solution
* Exploratory Analytics
* KNN Cluster
* Naive Bayes
* Discriminant Analysis
* Nearest Neighbor
* Decision Tree and Random Forest
* SVM
* Association Mining
* Reinforcement Learning
* Multi-Armed Bandit
* Stochastic Optimization
* Feedforward Network
* LSTM
* Autoencoder
* Deep Reinforcement Learning
* NLP and Neural Language Model
* Graph Convolution Network
* MLOps
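As one illustration of the techniques listed above, here is a minimal epsilon-greedy multi-armed bandit in plain Python. It is a generic textbook sketch, not avenir's implementation (which runs on Spark, Hadoop or Storm); the function name, its parameters and the example conversion rates are hypothetical.

```python
import random


def epsilon_greedy(reward_fn, n_arms, rounds, epsilon=0.1, seed=None):
    """Run a generic epsilon-greedy multi-armed bandit (illustrative sketch).

    reward_fn(arm) returns the observed reward for pulling `arm`.
    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(rounds):
        if t < n_arms:
            arm = t  # pull every arm once to initialise its estimate
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick an arm uniformly at random
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit the best estimate
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical conversion rate of each arm (e.g. price point)
    env = random.Random(42)
    counts, means = epsilon_greedy(
        lambda a: 1.0 if env.random() < probs[a] else 0.0,
        n_arms=3, rounds=5000, epsilon=0.1, seed=7)
    print(counts, [round(m, 2) for m in means])
```

A fixed `epsilon` keeps exploring forever, which suits slowly changing environments; decaying it, or switching to Thompson sampling or UCB, trades that for faster convergence.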

## Blogs
The following posts on my blog are a good source of details about avenir. They are the only source
of detailed documentation.
* http://pkghosh.wordpress.com/2014/03/12/using-mutual-information-to-find-critical-factors-in-hospital-readmission/
* http://pkghosh.wordpress.com/2014/01/09/boost-lead-generation-with-online-reinforcement-learning/
* http://pkghosh.wordpress.com/2013/11/06/retarget-campaign-for-abandoned-shopping-carts-with-decision-tree/
* http://pkghosh.wordpress.com/2013/10/06/predicting-customer-loyalty-trajectory/
* http://pkghosh.wordpress.com/2013/08/25/bandits-know-the-best-product-price/
* http://pkghosh.wordpress.com/2013/06/29/learning-but-greedy-gambler/
* http://pkghosh.wordpress.com/2013/04/15/smarter-email-marketing-with-markov-model/
* http://pkghosh.wordpress.com/2013/03/18/analytic-is-your-doctors-friend/
* http://pkghosh.wordpress.com/2013/02/19/stop-the-customer-separation-pain-bayesian-classifier/
* http://pkghosh.wordpress.com/2013/01/31/explore-with-cramer-index/
* https://pkghosh.wordpress.com/2015/07/06/customer-conversion-prediction-with-markov-chain-classifier/
* https://pkghosh.wordpress.com/2015/05/11/is-bigger-data-better-for-machine-learning/
* https://pkghosh.wordpress.com/2015/12/13/association-mining-with-improved-apriori-algorithm/
* https://pkghosh.wordpress.com/2016/03/14/is-neural-network-better-off-with-big-data/
* https://pkghosh.wordpress.com/2016/04/13/customer-churn-prediction-with-svm-using-scikit-learn/
* https://pkghosh.wordpress.com/2016/06/14/inventory-forecasting-with-markov-chain-monte-carlo/
* https://pkghosh.wordpress.com/2016/07/30/customer-segmentation-based-on-online-behavior-using-scikitlearn/
* https://pkghosh.wordpress.com/2016/10/27/supplier-fulfillment-forecasting-with-continuous-time-markov-chain-using-spark/
* https://pkghosh.wordpress.com/2017/04/30/predicting-call-hangup-in-customer-service-calls-with-decision-tree-and-random-forest/
* https://pkghosh.wordpress.com/2017/06/26/project-assignment-optimization-with-simulated-annealing-on-spark/
* https://pkghosh.wordpress.com/2017/09/18/handling-rare-events-and-class-imbalance-in-predictive-modeling-for-machine-failure/
* https://pkghosh.wordpress.com/2017/10/09/combating-high-cardinality-features-in-supervised-machine-learning/
* https://pkghosh.wordpress.com/2018/02/21/optimizing-discount-price-for-perishable-products-with-thompson-sampling-using-spark/
* https://pkghosh.wordpress.com/2018/03/19/handling-categorical-feature-variables-in-machine-learning-using-spark/
* https://pkghosh.wordpress.com/2018/04/18/predicting-crm-lead-conversion-with-gradient-boosting-using-scikitlearn/
* https://pkghosh.wordpress.com/2018/05/14/auto-training-and-parameter-tuning-for-a-scikitlearn-based-model-for-leads-conversion-prediction/
* https://pkghosh.wordpress.com/2018/06/18/leave-one-out-encoding-for-categorical-feature-variables-on-spark/
* https://pkghosh.wordpress.com/2018/07/18/improving-elastic-search-query-result-with-query-expansion-using-topic-modeling/
* https://pkghosh.wordpress.com/2019/02/10/supervised-machine-learning-parameter-search-and-tuning-with-simulated-annealing/
* https://pkghosh.wordpress.com/2019/05/07/synthetic-training-data-generation-for-machine-learning-classification-problems-using-ancestral-sampling/
* https://pkghosh.wordpress.com/2019/06/27/six-unsupervised-extractive-text-summarization-techniques-side-by-side/
* https://pkghosh.wordpress.com/2019/08/07/encoding-high-cardinality-categorical-variables-with-feature-hashing-on-spark/
* https://pkghosh.wordpress.com/2019/08/26/missing-value-imputation-with-restricted-boltzmann-machine-neural-network/
* https://pkghosh.wordpress.com/2019/10/23/automated-machine-learning-with-hyperopt-and-scikitlearn-without-writing-python-code/
* https://pkghosh.wordpress.com/2019/11/22/machine-learning-model-interpretation-and-prescriptive-analytic-with-lime/
* https://pkghosh.wordpress.com/2020/01/21/evaluation-of-time-series-predictability-with-kaboudan-metric-using-prophet/
* https://pkghosh.wordpress.com/2020/02/24/model-drift-detection-with-kolmogorov-smirnov-statistic-on-spark/
* https://pkghosh.wordpress.com/2020/03/26/building-scikitlearn-random-forest-model-and-tuning-parameters-without-writing-python-code/
* https://pkghosh.wordpress.com/2020/05/11/monte-carlo-simulation-library-in-python-with-project-cost-estimation-as-an-example/
* https://pkghosh.wordpress.com/2020/06/08/deep-reinforcement-learning-with-rllib-and-tensorflow-for-price-optimization/
* https://pkghosh.wordpress.com/2020/07/13/learn-about-your-data-with-about-seventy-data-exploration-functions-all-in-one-python-class/
* https://pkghosh.wordpress.com/2020/07/28/semantic-search-with-pre-trained-neural-transformer-model-using-document-sentence-and-token-level-embedding/
* https://pkghosh.wordpress.com/2020/08/18/predicting-individual-viral-infection-using-contact-data-with-lstm-neural-network/
* https://pkghosh.wordpress.com/2020/10/28/causal-inference-with-deep-learning-using-manufacturing-supply-chain-optimization-as-an-example/
* https://pkghosh.wordpress.com/2020/11/26/meeting-schedule-optimization-with-genetic-algorithm-in-python/
* https://pkghosh.wordpress.com/2021/02/26/detecting-and-measuring-human-bias-in-machine-learning-models/
* https://pkghosh.wordpress.com/2021/03/25/robustness-measurement-of-machine-learning-models-with-examples-in-python/
* https://pkghosh.wordpress.com/2021/05/25/data-driven-causal-relationship-discovery-with-python-example-code/
* https://pkghosh.wordpress.com/2021/07/21/duplicate-data-detection-with-neural-network-and-contrastive-learning/
* https://pkghosh.wordpress.com/2021/10/16/class-separation-based-machine-learning-model-performance-metric/
* https://pkghosh.wordpress.com/2021/11/30/machine-learning-model-performance-robustness-based-on-local-neighborhood-performance/
* https://pkghosh.wordpress.com/2021/12/30/conformal-prediction-for-a-neural-regression-model/
* https://pkghosh.wordpress.com/2022/01/26/remedial-action-recommendation-with-machine-learning-and-genetic-algorithm/
* https://pkghosh.wordpress.com/2022/02/25/out-of-distribution-data-detection-in-deployed-machine-learning-models/
* https://pkghosh.wordpress.com/2022/03/28/gig-economy-workforce-scheduling-with-reinforcement-learning/

## Getting started
The project's resource directory contains tutorial documents for the use cases described in
the blog posts.

## Configuration
All configuration parameters are described in the wiki page:
https://github.com/pranab/avenir/wiki/Configuration
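As a rough illustration of how such parameters might be consumed, here is a minimal Python sketch that reads `key=value` lines in the style of a Java properties file and looks up typed values with defaults. The file format is an assumption, and the parameter names in the example are hypothetical; the wiki page above is the authoritative list of parameters.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Assumes a Java-properties-like format; ':' separators and line
    continuations from the full Java spec are intentionally not handled.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines without '='
        props[key.strip()] = value.strip()
    return props


def get_int(props, key, default):
    """Look up an integer parameter, falling back to a default if absent."""
    return int(props[key]) if key in props else default


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only
    conf = parse_properties("# job settings\nfield.delim.out=,\nnum.reducers = 4\n")
    print(conf, get_int(conf, "num.reducers", 1))
```

Returning raw strings and converting at lookup time keeps the parser format-agnostic and lets each job apply its own defaults.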

## Build
Please refer to resource/dependency.txt for build-time and run-time dependencies.

For Hadoop 1
* `mvn clean install`

For Hadoop 2 (non-YARN)
* `git checkout nuovo`
* `mvn clean install`

For Hadoop 2 (YARN)
* `git checkout nuovo`
* `mvn clean install -P yarn`

## Help
Please feel free to email me at [email protected]

## Contribution
Contributions are welcome. Please email me at [email protected]