https://github.com/timothyf/baseball-data-lab
A Python application and library that generates comprehensive advanced stat summary sheets for MLB players, customizable by year, providing in-depth analysis and visualizations. It can also be used as a library module, enabling users to develop their own features and extend functionality for custom applications and data processing needs.
analytics baseball baseball-analytics baseball-data baseball-statistics mlb python sabermetrics statcast
- Host: GitHub
- URL: https://github.com/timothyf/baseball-data-lab
- Owner: timothyf
- License: mit
- Created: 2024-10-01T15:18:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-09-09T18:47:53.000Z (9 months ago)
- Last Synced: 2025-09-09T22:06:06.669Z (9 months ago)
- Topics: analytics, baseball, baseball-analytics, baseball-data, baseball-statistics, mlb, python, sabermetrics, statcast
- Language: Python
- Homepage:
- Size: 78.1 MB
- Stars: 10
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Baseball Data Lab
`baseball-data-lab` is a Python application and library for creating advanced stat
summary sheets for MLB players. It supports yearly customizations and provides
visualizations. The project can also be imported as a library so you can extend
its functionality for custom applications or data processing workflows. It uses
the [`pybaseball`](https://github.com/jldbc/pybaseball) and
[`MLB-StatsAPI`](https://github.com/toddrob99/MLB-StatsAPI) libraries along with
other Python packages to gather and format data for dashboards, reports and
other analytical tools.
The project retrieves data from MLB and FanGraphs to ensure accurate,
up‑to‑date statistics. Future releases will continue to expand the
application's capabilities so it can serve as both a standalone tool and a
reusable library.
## Sample Summary Sheets
Below are samples of the summary sheets that can be generated by this project. The first sample is a Batting Summary for Riley Greene for the 2024 season. The second sample is a Pitching Summary for Tarik Skubal for the 2024 season.
In addition to the baseball stats you would expect, the summary sheets also include the following "advanced" stats:
| Batters | Pitchers |
| --- | --- |
| BB%, K%, OBP, SLG, OPS, ISO, Spd, BABIP, UBR, wRC, wRAA, wOBA, wRC+, WAR, Splits | K/9, BB/9, K/BB, H/9, HR/9, K%, BB%, K-BB%, Opponent Avg, WHIP, BABIP, LOB%, ERA-, FIP-, FIP, RS/9, Swing %, Splits |
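For readers less familiar with these abbreviations, the sketch below computes a handful of the stats listed above from raw counting stats, using their standard public definitions. It is not taken from this project's code: the function names are illustrative assumptions and the sample numbers are made up.

```python
def iso(slg: float, avg: float) -> float:
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


def ops(obp: float, slg: float) -> float:
    """On-base plus slugging."""
    return obp + slg


def per_nine(count: float, innings: float) -> float:
    """Rate per nine innings (K/9, BB/9, H/9, HR/9).

    `innings` must be true decimal innings (6 1/3 innings is 6.333...),
    not the box-score notation 6.1.
    """
    return 9 * count / innings


def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings


def k_minus_bb_pct(strikeouts: int, walks: int, batters_faced: int) -> float:
    """K-BB%: strikeout rate minus walk rate, in percentage points."""
    return 100 * (strikeouts - walks) / batters_faced


# Made-up sample line: 200 K, 40 BB, 150 H over 180 innings and 720 batters faced.
print(f"ISO    {iso(0.500, 0.250):.3f}")
print(f"OPS    {ops(0.350, 0.500):.3f}")
print(f"K/9    {per_nine(200, 180):.2f}")
print(f"WHIP   {whip(40, 150, 180):.2f}")
print(f"K-BB%  {k_minus_bb_pct(200, 40, 720):.1f}")
```

Park- and league-adjusted stats such as wRC+, ERA- and FIP- depend on seasonal constants and are not reproduced here.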
## Project Structure
The project is organized as follows:
```text
baseball-data-lab/
├── README.md
├── setup.py
├── requirements.txt
├── baseball_data_lab/ # Source code
│ ├── apis/ # API clients for MLB and FanGraphs
│ ├── data_viz/ # Plotting utilities
│ ├── player/ # Player models and helpers
│ ├── summary_sheets/ # Classes that generate summary sheets
│ ├── team/ # Team utilities
│ └── ...
├── examples/ # Example scripts for data collection
└── tests/ # Unit tests
```
## Installation
To get started with the project, follow these steps:
1. Clone the repository:
```bash
git clone https://github.com/timothyf/baseball-data-lab.git
cd baseball-data-lab
```
2. Set up a Python virtual environment (optional but recommended):
```bash
python3 -m venv venv
source venv/bin/activate
```
3. Install the required dependencies:
```bash
pip install -r requirements.txt
```
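After installing, a quick way to confirm the environment is usable is to check that the two libraries named in this README can be found. The sketch below is a minimal check under that assumption; it does not read `requirements.txt`, and the helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Import names of the two libraries named in this README.
# The MLB-StatsAPI package is imported as `statsapi`.
REQUIRED_MODULES = ("pybaseball", "statsapi")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```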
## Usage
#### Generating Player Summary Sheets
The `examples` directory contains scripts for basic tasks. To generate player summary sheets, run:
```bash
python examples/generate_player_summary.py [options]
Options:
--players [1 or more player names]
--teams [1 or more team names]
--year [specify a 4-digit year]
```
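If you write your own script with the same interface, the options above map naturally onto `argparse`. The following is a minimal sketch under that assumption; it is not the parser used by `generate_player_summary.py`, and the defaults shown are guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented --players/--teams/--year options."""
    parser = argparse.ArgumentParser(description="Generate player summary sheets")
    # nargs="+" accepts one or more values, e.g. --players 'Riley Greene' 'Tarik Skubal'
    parser.add_argument("--players", nargs="+", default=[], help="one or more player names")
    parser.add_argument("--teams", nargs="+", default=[], help="one or more team names")
    parser.add_argument("--year", type=int, help="4-digit season year")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["--players", "Riley Greene", "--year", "2024"])
print(args.players, args.teams, args.year)  # ['Riley Greene'] [] 2024
```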
#### Saving Statcast Data
To save Statcast data, run the `save_statcast_data.py` script in the `examples` directory. It accepts the following options:
```bash
python examples/save_statcast_data.py [options]
--players [1 or more player names]
--teams [1 or more team names]
--year [specify a 4-digit year]
```
## Examples
#### Generate a player sheet for Riley Greene
```bash
python examples/generate_player_summary.py --players 'Riley Greene'
```
Output:
`output/2024/Tigers/batter_summary_riley_greene.png`

#### Generate player sheets for all of the 2024 Detroit Tigers
```bash
python examples/generate_player_summary.py --teams 'Detroit Tigers' --year 2024
```
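To run the same script for several seasons or teams, the command line can be assembled in Python. This is a sketch, assuming the script accepts the documented flags in any combination; the helper `build_command` is hypothetical, and the `subprocess.run` call is left commented out so nothing is executed by accident.

```python
import shlex

SCRIPT = "examples/generate_player_summary.py"


def build_command(players=(), teams=(), year=None):
    """Build the argument list for one run of the example script."""
    cmd = ["python", SCRIPT]
    if players:
        cmd += ["--players", *players]
    if teams:
        cmd += ["--teams", *teams]
    if year is not None:
        cmd += ["--year", str(year)]
    return cmd


for season in (2023, 2024):
    cmd = build_command(teams=["Detroit Tigers"], year=season)
    # shlex.join quotes arguments that contain spaces, such as team names.
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with names that contain spaces.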
## Database Setup
To set up the PostgreSQL database for Baseball Data Lab, follow these steps:
1. **Install PostgreSQL:**
Download and install PostgreSQL from [postgresql.org](https://www.postgresql.org/download/).
2. **Create the Database:**
Open your terminal and run:
```bash
createdb baseball_data_lab_db
```
3. **Initialize the Schema:**
Run the provided `setup_db.sql` file to create the tables:
```bash
psql -d baseball_data_lab_db -f setup_db.sql
```
4. **Verify the Setup:**
Connect to your database and list the tables:
```bash
psql -d baseball_data_lab_db
\dt
```
You should see tables such as `games`, `players`, `umpires` and `plate_appearances`.
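The verification step can also be scripted. The sketch below shells out to `psql` and compares the result with the four tables named above. It assumes the tables live in the `public` schema and that `psql` can connect without prompting; the helper names are hypothetical, and `setup_db.sql` may create more tables than the four checked here.

```python
import subprocess

# The four tables named in this README; the schema may define others as well.
EXPECTED_TABLES = {"games", "players", "umpires", "plate_appearances"}

# Assumption: the tables are created in the default `public` schema.
LIST_TABLES_SQL = "SELECT tablename FROM pg_tables WHERE schemaname = 'public';"


def parse_table_names(psql_output: str) -> set:
    """Turn unaligned, tuples-only psql output (one name per line) into a set."""
    return {line.strip() for line in psql_output.splitlines() if line.strip()}


def missing_tables(found, expected=EXPECTED_TABLES) -> list:
    """Return the expected table names that were not found, sorted."""
    return sorted(set(expected) - set(found))


def fetch_table_names(database: str = "baseball_data_lab_db") -> set:
    """Run psql and return the table names it reports (-A unaligned, -t tuples only)."""
    result = subprocess.run(
        ["psql", "-d", database, "-At", "-c", LIST_TABLES_SQL],
        capture_output=True, text=True, check=True,
    )
    return parse_table_names(result.stdout)
```

With the database in place, `missing_tables(fetch_table_names())` returns an empty list when all four tables exist.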
## Inspiration
This project was inspired by my time working in the R&D department of the Washington Nationals, and the pitching summary project from Thomas Nestico. Here is a link to an article describing his project:
https://medium.com/@thomasjamesnestico/creating-the-perfect-pitching-summary-7b8a981ef0c5
## Copyright Notice
This package and its author are not affiliated with MLB or any MLB team. This API wrapper interfaces with MLB's Stats API. Use of MLB data is subject to the notice posted at http://gdx.mlb.com/components/copyright.txt.