https://github.com/mlund2k/project-1-baseball-performance-vs.-attendance
Project assets for my first exploratory data analysis: Baseball Performance vs. Attendance.
bigquery data-analysis data-cleaning data-visualization excel rstudio sql tableau tidyverse
- Host: GitHub
- URL: https://github.com/mlund2k/project-1-baseball-performance-vs.-attendance
- Owner: mlund2k
- Created: 2024-09-13T00:20:21.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-13T06:56:11.000Z (5 months ago)
- Last Synced: 2025-01-21T22:09:54.314Z (8 days ago)
- Topics: bigquery, data-analysis, data-cleaning, data-visualization, excel, rstudio, sql, tableau, tidyverse
- Language: HTML
- Homepage:
- Size: 1.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Baseball: Performance vs. Attendance
## Objective
**Goal:** Clean and analyze data to identify suitable teams for an advertising firm, and determine which key performance indicators, if any, lead to an increase in attendance for advertising purposes.
**Business Prompt:**
1. It's 2015 and the stakeholder wants to invest in advertisements in a baseball stadium, and would potentially like to strike a brand deal with a high-performing player on the team, ideally someone of MVP or All-Star status. Determine the most suitable team based on these metrics.
2. During the analysis, the stakeholder is curious as to which game statistics, if any, may affect audience turnout for a particular season. Additionally, a visual of audience trends may assist the stakeholder's foresight when making a decision.

## Process
First, the raw data found on [Kaggle](https://www.kaggle.com/datasets/seanlahman/the-history-of-baseball) was a bit messy, with missing values, mislabelled columns, and extraneous data.
I started by cleaning the files in Excel with filtering, altering data types, deleting extra columns, and implementing necessary formulas.
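
The Excel steps above (filtering rows, changing data types, deleting extra columns) could also be scripted. The following is a minimal Python sketch of that idea, not the workflow actually used in this project. The column names (`year`, `team_id`, `attendance`, `notes`), the team IDs, and every number in the sample are made up for illustration, and the 2010-2014 range is inferred from the cleaned file names.

```python
import csv
import io

# Hypothetical raw extract with made-up values; the real Kaggle columns may differ.
RAW_CSV = """year,team_id,attendance,notes
2009,AAA,1000,
2010,AAA,2000,ok
2011,BBB,,missing attendance
2012,BBB,4000,ok
"""

# Columns to keep; anything else (here, `notes`) is treated as extraneous.
KEEP_COLUMNS = ("year", "team_id", "attendance")


def clean_rows(raw_text, first_year=2010, last_year=2014):
    """Drop rows with missing values, filter to a year range, and cast types."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Skip rows where any kept column is blank.
        if any(not row[col].strip() for col in KEEP_COLUMNS):
            continue
        year = int(row["year"])
        # Keep only seasons inside the requested range.
        if not first_year <= year <= last_year:
            continue
        cleaned.append({
            "year": year,
            "team_id": row["team_id"].strip(),
            "attendance": int(row["attendance"]),
        })
    return cleaned


cleaned_rows = clean_rows(RAW_CSV)
for r in cleaned_rows:
    print(r)
```

With the sample above, the 2009 row is dropped by the year filter and the 2011 row is dropped because its attendance is blank.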
The cleaned files to be used can be found here:
- [Appearance Table](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/appearances_cleaned_2010-2014.csv)
- [Home Games Table](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/home_game_cleaned_2010-2014.csv)
- [Player Award Table](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/player_award_cleaned_2010-2014.csv)
- [Players Table](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/player_cleaned_2010-2014.csv)
- [Teams Table](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/team_cleaned_2010-2014.csv)

For the next step, I chose to join various tables in SQL and create ranked lists for each metric within BigQuery.
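
The join-and-rank step can be sketched as follows. This is an illustration only: it runs standard SQL through Python's built-in `sqlite3` module rather than BigQuery, and the table names, column names, and rows are made up for the example (the project's actual queries are in the notes linked in this section). It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# In-memory database with hypothetical tables and made-up values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (team_id TEXT, year INTEGER, wins INTEGER);
CREATE TABLE home_game (team_id TEXT, year INTEGER, attendance INTEGER);
INSERT INTO team VALUES ('AAA', 2014, 90), ('BBB', 2014, 75), ('CCC', 2014, 82);
INSERT INTO home_game VALUES ('AAA', 2014, 3000), ('BBB', 2014, 3500), ('CCC', 2014, 2000);
""")

# Join the two tables on team and season, then rank each metric separately.
RANK_QUERY = """
SELECT t.team_id,
       t.wins,
       h.attendance,
       RANK() OVER (ORDER BY t.wins DESC)       AS wins_rank,
       RANK() OVER (ORDER BY h.attendance DESC) AS attendance_rank
FROM team AS t
JOIN home_game AS h
  ON h.team_id = t.team_id AND h.year = t.year
WHERE t.year = 2014
ORDER BY wins_rank
"""

ranked = conn.execute(RANK_QUERY).fetchall()
for row in ranked:
    print(row)
conn.close()
```

BigQuery's standard SQL also supports `RANK() OVER (...)`, so the same query shape should carry over, though it has not been checked against the project's tables.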
Full step-by-step documentation, with SQL queries and explanations, can be found on GitHub or Pastebin:
- [GitHub](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/proj_notes.txt)
- [Pastebin](https://pastebin.com/7rwxUaxx)

Part 2 of the analysis uses R Markdown for more interactive documentation. It can be found at the following links:
- [R Markdown (Kaggle)](https://www.kaggle.com/code/mattlund2k/first-project-baseball-analysis) (Preferred)
- [GitHub](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/proj.pdf) (PDF format)
- [GitHub](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/proj.html) (raw HTML format)

## End Product
Part 1 uses Tableau to visualize the data. Find the visual online here:
- Interact with the [Tableau Dashboard](https://public.tableau.com/views/BaseballAnalysisfirstproject/Dashboard1?:language=en-US&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link)
- Or view a [PDF](https://github.com/mlund2k/Project-1-Baseball-Performance-vs.-Attendance/blob/main/Dashboard%201.pdf)

Part 2 uses ggplot2, part of the R tidyverse, to construct a visual manually with code. Find the associated visual as part of the R Markdown:
- [R Markdown (Kaggle)](https://www.kaggle.com/code/mattlund2k/first-project-baseball-analysis)