https://github.com/lgibson7/women-in-stem

Final Project for STAT 694 Applied Research in Statistics & Biostatistics, Cal State East Bay Fall 2022

regression-models statistical-analysis

Host: GitHub
URL: https://github.com/lgibson7/women-in-stem
Owner: lgibson7
Created: 2022-11-23T16:41:38.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2022-12-10T19:49:01.000Z (about 2 years ago)
Last Synced: 2024-12-17T16:55:46.369Z (about 1 month ago)
Topics: regression-models, statistical-analysis
Language: JavaScript
Homepage: https://lgibson7.quarto.pub/women-in-stem/
Size: 5.8 MB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Practicality of Using Transformations in Multiple Linear Regression

In our previous research, [Gender Wage Inequality in STEM](https://github.com/lgibson7/Gender-Wage-Inequality-in-STEM), my colleagues and I used multiple linear regression (MLR) to explore the relationship between gender demographics and the median salary of STEM major categories. Our final model applied an inverse transformation to the response variable to improve the fit. Transforming the response (and/or explanatory) variables is common practice among statisticians and can yield a better-fitting model, but the resulting models are harder for a general audience to interpret.
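
To make the interpretability cost concrete, consider a generic inverse-response model. The predictors $x_1, \dots, x_p$ and coefficients $\beta_j$ below are placeholder notation, assumed for illustration, and are not the fitted model from the project:

$$
\frac{1}{Median} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon
\quad\Longrightarrow\quad
\widehat{Median} = \frac{1}{\hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_p x_p}
$$

In this form, a one-unit increase in $x_j$ changes $Median^{-1}$ by $\beta_j$, but the corresponding change in $Median$ itself depends on the current values of all predictors. In an untransformed model, $\beta_j$ is directly the expected change in $Median$, which is much easier to communicate.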

# Research Goals

In this project, I compared the MLR model from my previous project, which uses the inverse-transformed response variable $Median^{-1}$, to a comparable model fit to the untransformed response. My goal was to see how much predictive power is lost by fitting an MLR model without a transformed response, and whether the gain from transforming is worth the loss of an easily explained model.
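
One way such a comparison could be set up is sketched below. The project itself used R; this is a standard-library Python illustration on synthetic data, and every name in it (`fit_ols`, `compare_models`, the two made-up predictors, the coefficients) is hypothetical rather than taken from the project. The key point it shows is that predictions from the inverse-response model are back-transformed, so both models are scored on the original scale of the response.

```python
import random


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows, each starting with 1.0 for the intercept.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(X, b):
    """Linear predictor X b for each row of X."""
    return [sum(xi * bi for xi, bi in zip(row, b)) for row in X]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5


def compare_models(X, y):
    """In-sample RMSE of both models, measured on the original scale of y."""
    b_raw = fit_ols(X, y)
    b_inv = fit_ols(X, [1.0 / yi for yi in y])
    pred_raw = predict(X, b_raw)
    # Back-transform: the inverse model predicts 1/y, so invert again.
    pred_inv = [1.0 / v for v in predict(X, b_inv)]
    return {"untransformed": rmse(y, pred_raw), "inverse": rmse(y, pred_inv)}


# Synthetic data (NOT the project's data): 1/y is linear in two predictors.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x1, x2 = rng.random(), rng.random()
    inv = 0.015 + 0.020 * x1 + 0.005 * x2 + rng.gauss(0.0, 0.0003)
    X.append([1.0, x1, x2])
    y.append(1.0 / inv)

result = compare_models(X, y)
print(result)
```

Because the synthetic data are generated from an inverse relationship, the inverse model is expected to score better here; on real data the gap may be much smaller. A held-out test set or cross-validation would give a fairer estimate of predictive power than this in-sample score.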

# Dataset Used

To address this question, I used a subset of the College Majors dataset from FiveThirtyEight, found here: https://github.com/fivethirtyeight/data/blob/master/college-majors/women-stem.csv

# Tools Used

* Packages: tidyverse, ggpubr, easystats, lindia, ggstatsplot
* Statistical Tests & Analyses: Box-Cox transformation, stepwise selection, model diagnostics
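
As a rough illustration of how a Box-Cox analysis can point to an inverse transformation, the sketch below grid-searches the Box-Cox parameter $\lambda$ by maximizing the profile log-likelihood of a simple linear regression. It is a standard-library Python stand-in on synthetic data, not the R workflow used in the project; the function names, the single predictor, and the grid are all assumptions made for the example.

```python
import math
import random


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(x, y, lam):
    """Profile log-likelihood of lambda, up to an additive constant.

    The transformed response is regressed on a single predictor x.
    """
    n = len(y)
    z = box_cox(y, lam)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    rss = sum((zi - mz - slope * (xi - mx)) ** 2 for xi, zi in zip(x, z))
    # The second term is the log-Jacobian of the transformation.
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss / n) + jacobian


def best_lambda(x, y, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))


# Synthetic data (NOT the project's data): 1/y is linear in x plus noise.
rng = random.Random(1)
x = [rng.random() for _ in range(200)]
y = [1.0 / (0.2 + 0.8 * xi + rng.gauss(0.0, 0.005)) for xi in x]

grid = [i / 10 for i in range(-20, 21)]  # lambda from -2.0 to 2.0
lam_hat = best_lambda(x, y, grid)
print("estimated lambda:", lam_hat)
```

A selected $\lambda$ close to $-1$ corresponds to the inverse transformation, $\lambda = 0$ to the log, and $\lambda = 1$ to leaving the response untransformed. In practice the estimate is usually rounded to one of these interpretable values when it falls within the confidence interval for $\lambda$.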