https://github.com/lgibson7/women-in-stem
Final Project for STAT 694 Applied Research in Statistics & Biostatistics, Cal State East Bay Fall 2022
regression-models statistical-analysis
- Host: GitHub
- URL: https://github.com/lgibson7/women-in-stem
- Owner: lgibson7
- Created: 2022-11-23T16:41:38.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2022-12-10T19:49:01.000Z (about 2 years ago)
- Last Synced: 2024-12-17T16:55:46.369Z (about 1 month ago)
- Topics: regression-models, statistical-analysis
- Language: JavaScript
- Homepage: https://lgibson7.quarto.pub/women-in-stem/
- Size: 5.8 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Practicality of Using Transformations in Multiple Linear Regression
In our previous research, [Gender Wage Inequality in STEM](https://github.com/lgibson7/Gender-Wage-Inequality-in-STEM), my colleagues and I used multiple linear regression (MLR) to explore the relationship between gender demographics and the median salary of STEM major categories. Our final model applied an inverse transformation to the response variable to improve the model fit. Transforming the response and/or explanatory variables is common practice among statisticians and can lead to a better-fitting model, but the resulting models are not easily understood by the average person.
# Research Goals
In this project, I compared the MLR model with the inverse-transformed response variable, $Median^{-1}$, from my previous project to a comparable model with an untransformed response variable. My goal was to see how much predictive power is lost by fitting an MLR model without a transformed response variable, and whether the improvement from transforming the response is worth losing the ability to explain the model easily.
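The comparison described above can be sketched in a few lines. The example below is a minimal illustration in Python, not the project's own analysis: it uses a single hypothetical predictor (`share_women`), synthetic data generated from an inverse relationship, and a hand-written least-squares helper (`fit_simple_ols`), all of which are assumptions made for illustration. Because the data are simulated from an inverse model, the transformed fit is expected to win here; the point is only the mechanics of back-transforming predictions so that both models are scored on the original salary scale.

```python
import math
import random


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical synthetic data (NOT the FiveThirtyEight dataset):
# 1 / median salary is linear in the share of women, plus small noise.
rng = random.Random(694)
share_women = [rng.uniform(0.1, 0.9) for _ in range(60)]
median = [1.0 / (1.5e-5 + 1.5e-5 * s + rng.gauss(0.0, 2e-7)) for s in share_women]

# Model A: untransformed response, Median ~ b0 + b1 * share_women.
b0, b1 = fit_simple_ols(share_women, median)
pred_linear = [b0 + b1 * s for s in share_women]

# Model B: inverse-transformed response, 1 / Median ~ a0 + a1 * share_women.
a0, a1 = fit_simple_ols(share_women, [1.0 / m for m in median])
# Back-transform so the predictions are on the original salary scale.
pred_inverse = [1.0 / (a0 + a1 * s) for s in share_women]

# Score both models on the same (original) scale so the errors are comparable.
rmse_linear = rmse(median, pred_linear)
rmse_inverse = rmse(median, pred_inverse)
print(f"RMSE, untransformed response: {rmse_linear:.1f}")
print(f"RMSE, inverse-transformed response: {rmse_inverse:.1f}")
```

Scoring both fits on the original scale matters because error metrics computed on $Median^{-1}$ and on $Median$ are in different units and cannot be compared directly.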
# Dataset Used
To address this problem, I used a subset of the College Majors dataset from FiveThirtyEight, found here: https://github.com/fivethirtyeight/data/blob/master/college-majors/women-stem.csv
# Tools Used
* Packages: tidyverse, ggpubr, easystats, lindia, ggstatsplot
* Statistical Tests & Analyses: Box-Cox, Stepwise Selection, Model Diagnostics
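
For readers unfamiliar with Box-Cox, the standard form of the transformation for a positive response $y$ is shown below. This is the textbook definition, included as background; the README does not report which value of $\lambda$ the analysis estimated.

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln y, & \lambda = 0
\end{cases}
$$

Setting $\lambda = -1$ gives $y^{(-1)} = 1 - y^{-1}$, which differs from the inverse transformation $y^{-1}$ only by a shift and a sign change, so a $\lambda$ near $-1$ would be consistent with modeling $Median^{-1}$.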