https://github.com/kennethleungty/logistic-regression-assumptions

# Assumptions of Logistic Regression, Clearly Explained
#### Understanding and implementing the assumption checks behind one of the most important statistical techniques in data science - Logistic Regression
- Link to TowardsDataScience article: https://towardsdatascience.com/assumptions-of-logistic-regression-clearly-explained-44d85a22b290
- Logistic regression is a highly effective modeling technique that has remained a mainstay in statistics since its development in the 1940s.
- Given its popularity and utility, data practitioners should understand the fundamentals of logistic regression before using it to tackle data and business problems.
- In this project, we explore the key assumptions of logistic regression with theoretical explanations and practical Python implementation of the assumption checks.
___

### Contents
**(1) Logistic_Regression_Assumptions.ipynb**
- The main notebook, containing Python code (with explanations) for checking each of the 6 key assumptions of logistic regression

**(2) Box-Tidwell-Test-in-R.ipynb**
- Notebook containing R code for running the Box-Tidwell test (to check the linearity-of-the-logit assumption)

**(3) /data**
- Folder containing the public Titanic dataset (train set)

**(4) /references**
- Folder containing several sets of lecture notes explaining advanced regression
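
For readers who want to see the idea behind the Box-Tidwell check from notebook (2) without switching to R, here is a minimal Python sketch. It is not the repository's code: it swaps in a plain NumPy Newton-Raphson fit and a Wald test on an added `x * ln(x)` term, and the names `fit_logit` and `box_tidwell_pvalue` are hypothetical helpers introduced only for illustration. It assumes a single, strictly positive continuous predictor.

```python
import math
import numpy as np


def fit_logit(X, y, n_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson (hypothetical helper).

    X must already contain a constant column.
    Returns the coefficients and their standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the final fit
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


def box_tidwell_pvalue(x, y):
    """Two-sided Wald p-value for the x*ln(x) term (x must be > 0).

    A small p-value suggests the logit is not linear in x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x * np.log(x)])
    beta, se = fit_logit(X, y)
    z = beta[2] / se[2]
    return math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic illustration: the true logit is U-shaped in x, so the
# linearity assumption is violated by construction.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.5, 5.0, size=2000)
true_logit = (x_demo - 3.0) ** 2 - 2.0
y_demo = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
print("p-value for x*ln(x):", box_tidwell_pvalue(x_demo, y_demo))
```

With several continuous predictors, the same idea would add one `x * ln(x)` term per predictor to the full model. Predictors containing zeros or negative values would need shifting first, since the logarithm is undefined there.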
___
### Special Thanks
- @dataninj4 for correcting imports and adding `.loc` referencing in the `diagnosis_df` cell so that it runs without errors in Python 3.6/3.8
- @ArneTR for rightly pointing out that the VIF calculation should include a constant, and that the correlation matrix should exclude the target variable
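
To illustrate the point about the constant, here is a minimal NumPy sketch of a VIF calculation in which each auxiliary regression includes an intercept. It is an illustration under stated assumptions, not the notebook's code: the function name `vif_with_constant` is hypothetical, and the input is assumed to hold only the predictors (no target column, no constant column).

```python
import numpy as np


def vif_with_constant(X):
    """Variance inflation factor for each predictor column of X.

    Each predictor is regressed on the remaining predictors plus a
    constant, and VIF_j = 1 / (1 - R^2_j). X holds predictors only.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Design matrix: intercept column plus all other predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Two uncorrelated predictors with non-zero means: the VIFs should be
# close to 1 because the intercept absorbs the means.
X_demo = np.array([[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]])
print(vif_with_constant(X_demo))
```

Without the intercept, the auxiliary regressions use an uncentered R-squared, so predictors with non-zero means can produce misleading VIFs. If statsmodels is used instead, its `variance_inflation_factor` function expects the constant to already be a column of the design matrix passed in.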
___

### References
- [Machine Learning Essentials - Practical Guide in R](http://www.sthda.com/english/articles/36-classification-methods-essentials/148-logistic-regression-assumptions-and-diagnostics-in-r/)
- [Logistic and Linear Regression Assumptions - Violation Recognition and Control](https://www.lexjansen.com/wuss/2018/130_Final_Paper_PDF.pdf)
- [Testing linearity in the logit using Box-Tidwell Transformation in SPSS - YouTube](https://www.youtube.com/watch?v=sciPFNcYqi8&ab_channel=MikeCrowson)
- [Logistic Regression using SPSS](https://www.researchgate.net/publication/344138306_Logistic_Regression_Using_SPSS)
- [Statistics How To - Cook's Distance](https://www.statisticshowto.com/cooks-distance/)
- [Statsmodels Documentation - GLM](https://www.statsmodels.org/stable/glm.html)
- [Statsmodels Documentation - Logit Influence example notebook](https://www.statsmodels.org/dev/examples/notebooks/generated/influence_glm_logit.html)
- [PennState Eberly College of Science - Stat 462](https://online.stat.psu.edu/stat462/node/173/)
- [Statistics Solution - Assumptions of Logistic Regression](https://bookdown.org/jefftemplewebb/IS-6489/logistic-regression.html#fn40)
- [Course Notes for IS 6489 - Statistics and Predictive Analytics](https://bookdown.org/jefftemplewebb/IS-6489/logistic-regression.html#fn40)
- [MSc in Big Data Analytics at Carlos III University of Madrid - Notes for Predictive Modeling](https://bookdown.org/egarpor/PM-UC3M/)
- [Freakonometrics - Residuals from a Logistic Regression](https://freakonometrics.hypotheses.org/8210)
- [Kaggle - Titanic - Logistic Regression with Python](https://www.kaggle.com/mnassrib/titanic-logistic-regression-with-python)
- [Yellowbrick API Reference - Cook's Distance](https://www.scikit-yb.org/en/latest/api/regressor/influence.html?highlight=cook#module-yellowbrick.regressor.influence)
- [DataCamp - Understanding Logistic Regression in Python](https://www.datacamp.com/community/tutorials/understanding-logistic-regression-python)
- [Statology - How to Calculate Cook's Distance](https://www.statology.org/cooks-distance-python/)
- [ResearchGate - Box-Tidwell Test in SPSS](https://www.researchgate.net/post/What_is_the_correct_way_to_do_Box-Tidwell_test_in_SPSS_for_logistic_regression)
- [CrossValidated - Why include x ln x interaction term helps](https://stats.stackexchange.com/questions/217471/why-does-including-x-lnx-interaction-term-in-logistic-regression-model-helps)
- [UCLA IDRE - Logistic Regression Diagnostics](https://stats.idre.ucla.edu/stata/webbooks/logistic/chapter3/lesson-3-logistic-regression-diagnostics-2/)