{"id":15893058,"url":"https://github.com/pddiii/mlb-roy","last_synced_at":"2026-05-05T19:32:20.976Z","repository":{"id":257667122,"uuid":"848284867","full_name":"pddiii/MLB-ROY","owner":"pddiii","description":"2024 MLB Rookie of the Year Race","archived":false,"fork":false,"pushed_at":"2024-09-18T19:57:50.000Z","size":58136,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-08T08:14:57.510Z","etag":null,"topics":["baseball","baseballr","lightgbm","lightgbm-classifier","mlb","pandas","python","r","tidyverse"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pddiii.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-27T13:36:27.000Z","updated_at":"2024-09-18T19:57:53.000Z","dependencies_parsed_at":"2024-10-28T03:15:22.240Z","dependency_job_id":null,"html_url":"https://github.com/pddiii/MLB-ROY","commit_stats":null,"previous_names":["pddiii/mlb-roy"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pddiii%2FMLB-ROY","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pddiii%2FMLB-ROY/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pddiii%2FMLB-ROY/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pddiii%2FMLB-ROY/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pddiii","download_url":"https://
codeload.github.com/pddiii/MLB-ROY/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246863648,"owners_count":20846294,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["baseball","baseballr","lightgbm","lightgbm-classifier","mlb","pandas","python","r","tidyverse"],"created_at":"2024-10-06T08:06:49.794Z","updated_at":"2025-09-18T16:32:30.798Z","avatar_url":"https://github.com/pddiii.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MLB-ROY\nLet's talk about the MLB Rookie of the Year Race\n\n## Full Report\n\n[Full Detailed Report found here](/scripts_and_notebooks/report.pdf)\n\n## Data\n\n- 1974 to 2024 rookie data\n  - Excluded 1994 and 2020 since they were shortened seasons\n  - 1974-2023 were used as training/testing data\n  - 2024 was used to make predictions\n- Starters: minimum 100 Innings Pitched (IP)\n- Relievers: minimum 40 Innings Pitched (IP)\n- Batters: minimum 300 Plate Appearances (PA)\n\n## Data Cleaning\n\n[Data Cleaning Notebook](/scripts_and_notebooks/roy_cleaning.ipynb)\n\n### Helper Functions\n\nI used several functions repeatedly throughout the modeling process.\n\nTo reduce the lines of code during modeling, I created a Python script (.py file) that contains these helper functions.\n\n[Helper Functions Script](/scripts_and_notebooks/helper_functions.py)\n\n### Source\n\nThe data used in this project came from several sources.\n\nThe data in the [Awards Folder](data/awards/) is sourced from either the `Lahman` 
database or `baseballr`.\n\nThe data in the [Fielding Folder](data/fielding/) is sourced from the `Lahman` database.\n\nThe data in the [MLB Folder](data/mlb/) and the data in the [Rookies Folder](data/rookies/) were sourced from FanGraphs using their custom reports feature.\n\nThe [Cleaned Player IDs](data/cleaned_player_ids.csv) were sourced from a previous project of mine, which combined player IDs from the Lahman Database and the more frequently updated [PlayerIDMap](https://docs.google.com/spreadsheets/d/1JgczhD5VDQ1EiXqVG-blttZcVwbZd5_Ne_mefUGwJnk/pubhtml?gid=0\u0026single=true).\n\n## Models\n\n- I fit each of the two models to three separate data sets:\n  - One for relievers\n  - One for starters\n  - One for batters\n\n### Vote Recipients\n\n- Output: Probability between 0 and 1 for receiving a Rookie of the Year vote (`vote_getter`)\n- Took the top 8 predicted vote getters for 2024 from each league (AL and NL)\n- Used these predictions as an input to the 2024 Rookie of the Year model predictions\n  - Set the top 8 vote getters to 1 and the rest to 0 so the Rookie of the Year model can interpret them properly\n\n**Predictions**\n\n[Vote Getter Predictions](/data/predictions/vote_preds.csv)\n\n### Rookie of the Year\n\n- Output: Probability between 0 and 1 for winning Rookie of the Year (`rookie_of_the_year`)\n- Used these predictions to discuss the possible Rookie of the Year candidates\n\n**Predictions**\n\n[Rookie of the Year Predictions](/data/predictions/vote_roy_preds.csv)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpddiii%2Fmlb-roy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpddiii%2Fmlb-roy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpddiii%2Fmlb-roy/lists"}