https://github.com/malcolmgaynor/mlb-relief-pitcher-categorization-and-analysis

K-means clustering and gradient boosting (XGBoost)

baseball gradient-boosting k-means-clustering mlb

Host: GitHub
URL: https://github.com/malcolmgaynor/mlb-relief-pitcher-categorization-and-analysis
Owner: malcolmgaynor
Created: 2024-05-27T20:59:44.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-05-27T21:52:55.000Z (about 1 year ago)
Last Synced: 2025-02-17T04:15:59.474Z (3 months ago)
Topics: baseball, gradient-boosting, k-means-clustering, mlb
Language: R
Homepage:
Size: 1.03 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# MLB-Relief-Pitcher-Categorization-and-Analysis

Final project for the Stat 306: Multivariate Sports Analytics class with Professor Bradley A. Hartlaub at Kenyon College. May 7th, 2024.

In this project, I sought to identify MLB relief pitchers who could break out (that is, have much more success than they currently do) by changing their pitching strategy. First, I classified 1,087 MLB relief pitcher seasons from 2018 to 2023 using K-means clustering, which produced 7 distinct clusters. Each cluster represents a different relief pitching style or strategy. The clusters are based only on factors within a pitcher's control (outcome-independent), such as pitch repertoire and the percentage of pitches thrown in the strike zone.
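As a rough illustration of the clustering step, here is a minimal Python sketch of K-means on standardized features. It is not the repository's code (which is written in R): the two features (`fastball_pct`, `zone_pct`), the toy values, `k = 3`, and the deterministic farthest-point initialization are all assumptions made for the example.

```python
import math

def standardize(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # `or 1.0` avoids dividing by zero for a constant column.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start (an assumption)."""
    centroids = [rows[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical pitcher-seasons: [fastball_pct, zone_pct]
seasons = [
    [0.70, 0.52], [0.68, 0.50], [0.72, 0.51],  # fastball-heavy, in the zone
    [0.35, 0.40], [0.33, 0.42], [0.36, 0.41],  # breaking-ball-heavy
    [0.52, 0.30], [0.50, 0.31], [0.51, 0.29],  # mixed repertoire, out of the zone
]
labels, centroids = kmeans(standardize(seasons), k=3)
print(labels)
```

Standardizing first keeps a feature with a larger numeric range from dominating the distance calculation; whether the original analysis scaled its inputs this way is not stated here.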

Next, I used gradient boosting (specifically XGBoost) to build 7 models that predict ERA, one for the pitchers in each cluster. I then applied all 7 models to every pitcher to predict what ERA each pitcher would have had if he were in each of the different clusters. Finally, I examined a handful of case studies of relief pitchers who, according to one of the XGBoost models, would have had a much better season had they pitched according to the strategy of a different cluster. In several of these cases, the models correctly predicted adjustments that the relievers made and that led to their increased success.
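The per-cluster modeling and the "apply every model to every pitcher" step can be sketched as follows. This is a simplified, standard-library Python stand-in, not XGBoost and not the repository's R code: it boosts depth-1 regression trees (stumps) on squared error, with no regularization. The features, ERA values, two-cluster setup, and all function names are hypothetical.

```python
import math

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    mean = sum(residuals) / len(residuals)
    best_sse, best = math.inf, (0, math.inf, mean, mean)  # fallback: no usable split
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (f, thr, lm, rm)
    return best

def fit_gbm(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        f, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((f, thr, lm, rm))
        preds = [p + learning_rate * (lm if row[f] <= thr else rm)
                 for row, p in zip(X, preds)]
    return base, learning_rate, stumps

def predict(model, row):
    """Predicted ERA: base value plus the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[f] <= thr else rm) for f, thr, lm, rm in stumps)

def training_sse(model, X, y):
    """Sum of squared errors of the model on its own training data."""
    return sum((predict(model, row) - yi) ** 2 for row, yi in zip(X, y))

# Hypothetical training data per cluster: rows are [fastball_pct, zone_pct], targets are ERA.
training = {
    0: ([[0.70, 0.45], [0.68, 0.50], [0.72, 0.55], [0.69, 0.60]], [4.8, 4.2, 3.6, 3.0]),
    1: ([[0.35, 0.45], [0.33, 0.50], [0.36, 0.55], [0.34, 0.60]], [3.9, 3.5, 3.1, 2.7]),
}
# One model per cluster.
models = {cluster: fit_gbm(X, y) for cluster, (X, y) in training.items()}

# Counterfactual step: score one pitcher under every cluster's model.
pitcher = [0.70, 0.50]
eras = {cluster: predict(model, pitcher) for cluster, model in models.items()}
best_cluster = min(eras, key=eras.get)
print(eras, best_cluster)
```

A pitcher whose lowest predicted ERA comes from a cluster other than his own would be a candidate for the kind of case study described above.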

This repository includes a 14-page final paper outlining the methodology and results, the R code used to build the models and run the analysis, and the data I considered. If you have any questions about the process, data, models, code, or analysis, please do not hesitate to reach out!
