Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pddiii/mlb-roy
2024 MLB Rookie of the Year Race
https://github.com/pddiii/mlb-roy
baseball baseballr lightgbm lightgbm-classifier mlb pandas python r tidyverse
Last synced: about 1 month ago
JSON representation
2024 MLB Rookie of the Year Race
- Host: GitHub
- URL: https://github.com/pddiii/mlb-roy
- Owner: pddiii
- Created: 2024-08-27T13:36:27.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-18T19:57:50.000Z (4 months ago)
- Last Synced: 2024-12-15T11:08:30.243Z (about 1 month ago)
- Topics: baseball, baseballr, lightgbm, lightgbm-classifier, mlb, pandas, python, r, tidyverse
- Language: Jupyter Notebook
- Homepage:
- Size: 55.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# MLB-ROY
Let's talk about the MLB Rookie of the Year Race## Full Report
[Full Detailed Report found here](/scripts_and_notebooks/report.pdf)
## Data
- 1974 to 2024 rookie data
- Excluded 1994 and 2020 since they were shortened seasons
- 1974-2023 were used for Training/Testing Data
- 2024 was used to make predictions
- Starters: minimum 100 Innings Pitched (IP)
- Relievers: minimum 40 Innings Pitched (IP)
- Batters: minimum 300 Plate Appearances (PA)## Data Cleaning
[Data Cleaning Notebook](/scripts_and_notebooks/roy_cleaning.ipynb)
### Helper Functions
I utilized several functions repeatedly throughout the modeling process.
In order to reduce the lines of code during the modelling I created a python script (.py file) which contains these various helper functions.
[Helper Functions Script](/scripts_and_notebooks/helper_functions.py)
### Source
There were several sources for the data utilized in this project.
The data in the [Awards Folder](data/awards/) is sourced from either `Lahman` database or `baseballr`.
The data in the [Fielding Folder](data/fielding/) is sourced from the `Lahman` database
The data in the [MLB Folder](data/mlb/) and the data in the [Rookies Folder](data/rookies/) were sourced from FanGraphs utilizing their custom reports feature.
The [Cleaned Player IDs](data/cleaned_player_ids.csv) were sourced from a previous project of mine which combined player ids from the Lahman Database and the more frequently updated [PlayerIDMap](https://docs.google.com/spreadsheets/d/1JgczhD5VDQ1EiXqVG-blttZcVwbZd5_Ne_mefUGwJnk/pubhtml?gid=0&single=true).
## Models
- For both models I fit the model to three different data sets:
- One for relievers
- One for starters
- One for batters### Vote Recipients
- Output: Probability between 0 and 1 for receiving a Rookie of the Year vote (`vote_getter`)
- Took the top 8 for 2024 from both the AL and NL
- Utilize these predictions in the predictions for the 2024 Rookie of the Year model
- Rounded the top 8 vote getters to a 1, and the rest to a 0 for proper interpretation in the Rookie of the Year model predictions**Predictions**
[Vote Getter Predictions](/data/predictions/vote_preds.csv)
### Rookie of the Year
- Output: Probability between 0 and 1 for winning Rookie of the Year (`rookie_of_the_year`)
- Utilize these predictions to discuss the possible Rookie of the Year candidates**Predictions**
[Rookie of the Year Predictions](/data/predictions/vote_roy_preds.csv)