https://github.com/cadedupont/mlb-data-analysis
Performing analysis on a dataset of active MLB players in R
baseball-analytics data-analysis data-science mlb-stats-api r
- Host: GitHub
- URL: https://github.com/cadedupont/mlb-data-analysis
- Owner: cadedupont
- Created: 2024-02-01T15:07:15.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-02T20:36:26.000Z (almost 2 years ago)
- Last Synced: 2025-01-03T22:23:10.519Z (about 1 year ago)
- Topics: baseball-analytics, data-analysis, data-science, mlb-stats-api, r
- Language: R
- Homepage:
- Size: 1.14 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MLB Data Analysis
A project intended to familiarize myself with data analysis in R. The data comes from the [Lahman database](https://cran.r-project.org/web/packages/Lahman/Lahman.pdf), which contains a wide variety of Major League Baseball (MLB) statistics.
## [`era_vs_age.R`](src/era_vs_age.R)
Creates a scatter plot of the earned run average (ERA) of MLB pitchers against their age in the 2022 season. The script left-joins the `Pitching` table with the `People` table in the database to get each pitcher's age.
To qualify for the plot, a pitcher must have thrown at least 100 innings and appeared in at least 20 games in the season. This ensures that the pitcher had a significant amount of playing time that season (e.g. it excludes position players who pitched, pitchers who were injured, etc.).
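The filter and join described above can be sketched roughly as follows. This is a minimal base-R illustration, not the contents of `src/era_vs_age.R`: the function name `qualified_pitchers` is hypothetical, stints are summed per pitcher, ERA is recomputed from `ER` and `IPouts`, and age is approximated as the season minus `birthYear`, all of which are assumptions. The column names come from the Lahman `Pitching` and `People` tables.

```r
# Hypothetical helper: returns one row per qualified pitcher with ERA and age.
# `pitching` and `people` are expected to have the columns of Lahman's
# Pitching and People tables.
qualified_pitchers <- function(pitching, people, season = 2022,
                               min_innings = 100, min_games = 20) {
  season_rows <- pitching[pitching$yearID == season, ]

  # Assumption: combine stints so a pitcher traded mid-season counts once.
  totals <- aggregate(cbind(IPouts, ER, G) ~ playerID,
                      data = season_rows, FUN = sum)

  totals$innings <- totals$IPouts / 3           # IPouts counts outs recorded
  totals$ERA <- 9 * totals$ER / totals$innings  # earned runs per nine innings

  qualified <- totals[totals$innings >= min_innings &
                        totals$G >= min_games, ]

  # Left join: keep every qualified pitcher, even without a People match.
  joined <- merge(qualified,
                  people[, c("playerID", "birthYear")],
                  by = "playerID", all.x = TRUE)

  # Assumption: age is approximated as season minus birth year.
  joined$age <- season - joined$birthYear
  joined
}

# Usage, only if the Lahman package is installed and includes the 2022 season.
if (requireNamespace("Lahman", quietly = TRUE) &&
    any(Lahman::Pitching$yearID == 2022)) {
  pitchers <- qualified_pitchers(Lahman::Pitching, Lahman::People)
  plot(pitchers$age, pitchers$ERA,
       xlab = "Age", ylab = "ERA",
       main = "ERA vs. age, qualified pitchers, 2022")
}
```

Filtering after aggregation means the thresholds apply to a pitcher's full season rather than to each stint separately; filtering per stint instead would be an equally plausible reading of the description.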
## [`win_vs_salary.R`](src/win_vs_salary.R)
Creates a scatter plot of the win percentage of MLB teams in 2016 against their total expenditure on player salaries for that season.
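One way to build the data behind such a plot is sketched below. It is an illustration under stated assumptions rather than the script itself: the function name `team_win_vs_salary` is hypothetical, win percentage is taken as `W / (W + L)`, and payroll is the sum of `salary` per `teamID`, using the columns of the Lahman `Teams` and `Salaries` tables.

```r
# Hypothetical helper: one row per team with win percentage and total payroll.
# `teams` and `salaries` are expected to have the columns of Lahman's
# Teams and Salaries tables.
team_win_vs_salary <- function(teams, salaries, season = 2016) {
  season_teams <- teams[teams$yearID == season, c("teamID", "W", "L")]

  # Assumption: win percentage is wins over decided games.
  season_teams$win_pct <- season_teams$W / (season_teams$W + season_teams$L)

  # Total salary paid by each team in the season.
  payroll <- aggregate(salary ~ teamID,
                       data = salaries[salaries$yearID == season, ],
                       FUN = sum)
  names(payroll)[names(payroll) == "salary"] <- "payroll"

  # Inner join: only teams present in both tables are kept.
  merge(season_teams, payroll, by = "teamID")
}

# Usage, only if the Lahman package is installed.
if (requireNamespace("Lahman", quietly = TRUE)) {
  team_data <- team_win_vs_salary(Lahman::Teams, Lahman::Salaries)
  plot(team_data$payroll / 1e6, team_data$win_pct,
       xlab = "Total player salaries (millions of USD)",
       ylab = "Win percentage",
       main = "Win percentage vs. payroll, 2016")
}
```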