{"id":17311392,"url":"https://github.com/schorrm/pybaseball","last_synced_at":"2025-04-14T14:43:38.966Z","repository":{"id":54447223,"uuid":"142289924","full_name":"schorrm/pybaseball","owner":"schorrm","description":"I'm maintaining the original repo now. please go to github.com/jldbc/pybaseball","archived":false,"fork":false,"pushed_at":"2021-02-17T10:02:21.000Z","size":1581,"stargazers_count":23,"open_issues_count":3,"forks_count":8,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-28T03:41:24.845Z","etag":null,"topics":["baseball","baseball-reference","baseball-savant","fangraphs","python","python3","sabermetrics","statcast-data","statistics","stats"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/schorrm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-25T11:19:16.000Z","updated_at":"2024-09-23T10:41:47.000Z","dependencies_parsed_at":"2022-08-13T16:00:26.993Z","dependency_job_id":null,"html_url":"https://github.com/schorrm/pybaseball","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schorrm%2Fpybaseball","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schorrm%2Fpybaseball/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schorrm%2Fpybaseball/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schorrm%2Fpybaseball/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/schorrm","downloa
d_url":"https://codeload.github.com/schorrm/pybaseball/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248898744,"owners_count":21179832,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["baseball","baseball-reference","baseball-savant","fangraphs","python","python3","sabermetrics","statcast-data","statistics","stats"],"created_at":"2024-10-15T12:40:24.981Z","updated_at":"2025-04-14T14:43:38.946Z","avatar_url":"https://github.com/schorrm.png","language":"Python","readme":"# pybaseball\n\n**2.0.0 Release: 18 August, 2020**\n\n## Recent Updates\n- `pybaseball` is now listed on PyPI as `pybaseball2`. The package is still imported as `import pybaseball`, but install it with `pip install pybaseball2`.\n- New functionality:\n   - Plot spray charts over a stadium diagram (#9, thanks to @andersonfrailey)\n   - Baseball Reference game logs (#4, thanks to @reddigari)\n   - More functions for Chadwick Bureau data (#8, thanks to @valdezt)\n   - Exposes Chadwick Bureau lookup table (#7)\n   - Top Prospects (#5, thanks to @TylerLiu42)\n   - Full-season Statcast data (#2, thanks to @TylerLiu42)\n   - Amateur Draft results (#11, thanks to @TylerLiu42)\n- Bugfixes, with thanks to @bgunn34 and @TAThor\n\n`pybaseball` is a Python package for baseball data analysis. This package scrapes Baseball Reference, Baseball Savant, and FanGraphs so you don't have to. The package retrieves Statcast data, pitching stats, batting stats, division standings/team records, awards data, and more. 
Data is available at the individual pitch level, as well as aggregated at the season level and over custom time periods. See the [docs](https://github.com/schorrm/pybaseball/tree/master/docs) for a comprehensive list of data acquisition functions.\n\n## Installation\n\n`pybaseball` can be installed via pip:\n\n```bash\npip install pybaseball2\n```\n\nor from the repo (which may at times be more up to date):\n\n```bash\ngit clone https://github.com/schorrm/pybaseball\ncd pybaseball\npip install -e .\n```\n\n## Statcast: Pull advanced metrics from Major League Baseball's Statcast system\n\nStatcast data include pitch-level features such as Perceived Velocity (PV), Spin Rate (SR), Exit Velocity (EV), pitch X, Y, and Z coordinates, and more. The function `statcast(start_dt, end_dt)` pulls this data from Baseball Savant (baseballsavant.mlb.com).\n\n```python\n\u003e\u003e\u003e from pybaseball import statcast\n\u003e\u003e\u003e data = statcast(start_dt='2017-06-24', end_dt='2017-06-27')\n\u003e\u003e\u003e data.head(2)\n\n   index pitch_type  game_date  release_speed  release_pos_x  release_pos_z  \n0    314         CU 2017-06-27           79.7        -1.3441         5.4075\n1    332         FF 2017-06-27           98.1        -1.3547         5.4196\n\n  player_name    batter   pitcher     events     ...      release_pos_y  \n0   Matt Bush  608070.0  456713.0  field_out     ...            54.8585\n1   Matt Bush  429665.0  456713.0  field_out     ...            
54.3470\n\n   estimated_ba_using_speedangle  estimated_woba_using_speedangle  woba_value  \n0                          0.100                            0.137         0.0\n1                          0.269                            0.258         0.0\n\n   woba_denom babip_value iso_value launch_speed_angle at_bat_number pitch_number  \n0         1.0         0.0       0.0                3.0          64.0          1.0\n1         1.0         0.0       0.0                3.0          63.0          3.0  \n[2 rows x 79 columns]\n```\n\nIf `start_dt` and `end_dt` are supplied, `statcast` returns all Statcast data between those two dates. If not, it returns yesterday's data. The optional `verbose` argument controls whether the library reports its progress while it pulls the data.\n\nFor a player-specific Statcast query, pull pitching or batting data with the `statcast_pitcher` and `statcast_batter` functions. These take the same `start_dt` and `end_dt` arguments as `statcast`, plus a `player_id` argument. This ID comes from MLB Advanced Media (MLBAM) and can be looked up with the `playerid_lookup` function. A complete example:\n\n```python\n\u003e\u003e\u003e # Find Clayton Kershaw's player ID\n\u003e\u003e\u003e from pybaseball import playerid_lookup\n\u003e\u003e\u003e from pybaseball import statcast_pitcher\n\u003e\u003e\u003e playerid_lookup('kershaw', 'clayton')\nGathering player lookup table. 
This may take a moment.\n\n  name_last name_first  key_mlbam key_retro  key_bbref  key_fangraphs  \n0   kershaw    clayton     477132  kersc001  kershcl01           2036\n\n   mlb_played_first  mlb_played_last\n0            2008.0           2017.0\n\n\u003e\u003e\u003e # His MLBAM ID is 477132, so we feed that as the player_id argument to the following function \n\u003e\u003e\u003e kershaw_stats = statcast_pitcher('2017-06-01', '2017-07-01', 477132)\n\u003e\u003e\u003e kershaw_stats.head(2)\n  pitch_type   game_date release_speed release_pos_x release_pos_z  \n0         SL  2017-06-29          87.2        1.0865        6.4034\n1         SL  2017-06-29          86.9        1.0195        6.4324\n\n       player_name  batter  pitcher     events              description  \n0  Clayton Kershaw  458913   477132  strikeout  swinging_strike_blocked\n1  Clayton Kershaw  458913   477132       null                     ball\n\n      ...       release_pos_y  estimated_ba_using_speedangle  \n0     ...             54.5463                            0.0\n1     ...             54.7625                            0.0\n\n   estimated_woba_using_speedangle  woba_value woba_denom babip_value  \n0                              0.0        0.00          1           0\n1                              0.0        null       null        null\n\n  iso_value launch_speed_angle at_bat_number pitch_number\n0         0               null            57            6\n1      null               null            57            5\n\n[2 rows x 78 columns]\n```\n\n## Pitching Stats: pitching stats for players across multiple seasons, single seasons, or during a specified time period\n\nThis library contains two main functions for obtaining pitching data. For league-wide season-level pitching data, use the function `pitching_stats(start_season, end_season)`. 
It returns one row per player per season, with all the metrics FanGraphs makes available.\n\nThe second is `pitching_stats_range(start_dt, end_dt)`, which returns pitching data over a specific date range. This is more granular than the season-level FanGraphs function (for example, you can see which pitcher had the strongest month of May). This query pulls data from Baseball Reference. Note that all dates should be in `YYYY-MM-DD` format.\n\nIf you prefer Baseball Reference to FanGraphs, there is a third option, `pitching_stats_bref(season)`. It works like `pitching_stats` but retrieves its data from Baseball Reference instead. It is typically not recommended, however, because the Baseball Reference query can currently retrieve only one season of data per request.\n\n```python\n\u003e\u003e\u003e from pybaseball import pitching_stats\n\u003e\u003e\u003e data = pitching_stats(2012, 2016)\n\u003e\u003e\u003e data.head()\n     Season             Name     Team   Age     W    L   ERA  WAR     G    GS  \n336  2015.0  Clayton Kershaw  Dodgers  27.0  16.0  7.0  2.13  8.6  33.0  33.0\n236  2014.0  Clayton Kershaw  Dodgers  26.0  21.0  3.0  1.77  7.6  27.0  27.0\n472  2014.0     Corey Kluber  Indians  28.0  18.0  9.0  2.44  7.4  34.0  34.0\n235  2015.0     Jake Arrieta     Cubs  29.0  22.0  6.0  1.77  7.3  33.0  33.0\n256  2013.0  Clayton Kershaw  Dodgers  25.0  16.0  9.0  1.83  7.1  33.0  33.0\n\n       ...      wSL/C (pi)  wXX/C (pi)  O-Swing% (pi)  Z-Swing% (pi)  \n336    ...            1.76       22.85          0.364          0.665\n236    ...            2.62         NaN          0.371          0.670\n472    ...            3.92         NaN          0.336          0.598\n235    ...            2.42         NaN          0.329          0.618\n256    ...            
0.74         NaN          0.339          0.635\n\n     Swing% (pi)  O-Contact% (pi)  Z-Contact% (pi)  Contact% (pi)  Zone% (pi)  \n336        0.511            0.478            0.811          0.689       0.487\n236        0.525            0.536            0.831          0.730       0.515\n472        0.468            0.485            0.886          0.744       0.505\n235        0.468            0.595            0.856          0.762       0.483\n256        0.484            0.563            0.873          0.763       0.492\n\n     Pace (pi)\n336       23.4\n236       23.7\n472       24.6\n235       23.3\n256       23.4\n\n[5 rows x 299 columns]\n```\n\n\n## Batting Stats: hitting stats for players within seasons or during a specified time period\n\nBatting stats are obtained much like pitching stats. The season-level function is `batting_stats(start_season, end_season)`, and the date-range function is `batting_stats_range(start_dt, end_dt)`. The Baseball Reference equivalent for season-level data is `batting_stats_bref(season)`.\n\n```python\n\u003e\u003e\u003e from pybaseball import batting_stats_range\n\u003e\u003e\u003e data = batting_stats_range('2017-05-01', '2017-05-08')\n\u003e\u003e\u003e data.head()\n          Name  Age  #days     Lev          Tm  G  PA  AB  R  H  ...    HBP  \n1   Jose Abreu   30     69  MLB-AL     Chicago  7  31  30  5  9  ...      0\n2   Lane Adams   27     69  MLB-NL     Atlanta  6   6   6  0  2  ...      0\n3   Matt Adams   28     68  MLB-NL   St. Louis  6   9   9  2  4  ...      0\n4   Jim Adduci   32     69  MLB-AL     Detroit  6  24  21  3  5  ...      0\n5  Tim Adleman   29     72  MLB-NL  Cincinnati  1   2   2  0  0  ...      
0\n\n   SH  SF  GDP  SB  CS     BA    OBP    SLG    OPS\n1   0   0    1   0   0  0.300  0.323  0.667  0.989\n2   0   0    1   1   0  0.333  0.333  0.333  0.667\n3   0   0    0   0   0  0.444  0.444  0.778  1.222\n4   0   0    0   0   0  0.238  0.333  0.381  0.714\n5   0   0    0   0   0  0.000  0.000  0.000  0.000\n\n[5 rows x 27 columns]\n```\n\n## Game-by-Game Results and Schedule\n\nThe `schedule_and_record` function returns a team's game-by-game results for a given season, including game date, home and away teams, end result (W/L/Tie), score, winning/losing/saving pitchers, attendance, and division standing at that date. The function's only two arguments are `season` and `team`, where `team` is the team's abbreviation (e.g. NYY for the New York Yankees, SEA for the Seattle Mariners). If `season` is the current season, the query returns results for games already played and the schedule for games that have not yet been played.\n\n```python\n\u003e\u003e\u003e # Example: the individual-game results of the 1927 Yankees\n\u003e\u003e\u003e from pybaseball import schedule_and_record\n\u003e\u003e\u003e data = schedule_and_record(1927, 'NYY')\n\u003e\u003e\u003e data.head()\n                Date   Tm Home_Away  Opp W/L     R   RA   Inn  W-L  Rank  \\\n1    Tuesday, Apr 12  NYY      Home  PHA   W   8.0  3.0   9.0  1-0   1.0\n2  Wednesday, Apr 13  NYY      Home  PHA   W  10.0  4.0   9.0  2-0   1.0\n3   Thursday, Apr 14  NYY      Home  PHA   T   9.0  9.0  10.0  2-0   1.0\n4     Friday, Apr 15  NYY      Home  PHA   W   6.0  3.0   9.0  3-0   1.0\n5   Saturday, Apr 16  NYY      Home  BOS   W   5.0  2.0   9.0  4-0   1.0\n\n       GB      Win     Loss  Save  Time D/N  Attendance  Streak\n1    Tied     Hoyt    Grove  None  2:05   D     72000.0       1\n2  up 0.5  Ruether     Gray  None  2:15   D      8000.0       2\n3    Tied     None     None  None  2:50   D      9000.0       2\n4    Tied  Pennock    Ehmke  None  2:27   D     16000.0       3\n5  up 1.0  Shocker  Ruffing  None  
2:05   D     25000.0       4\n```\n\n\n## Standings: up-to-date or historical division standings, W/L records\n\nThe `standings(season)` function gives division standings for a given season. If the current season is chosen, it returns the most current standings. Otherwise, it returns the end-of-season standings for each division in the chosen season.\n\nThis function returns a list of dataframes, each holding the standings for one of MLB's six divisions. In the 2016 example below, index 4 holds the NL Central.\n\n```python\n\u003e\u003e\u003e from pybaseball import standings\n\u003e\u003e\u003e data = standings(2016)[4]\n\u003e\u003e\u003e print(data)\n                    Tm    W   L  W-L%    GB\n1         Chicago Cubs  103  58  .640    --\n2  St. Louis Cardinals   86  76  .531  17.5\n3   Pittsburgh Pirates   78  83  .484  25.0\n4    Milwaukee Brewers   73  89  .451  30.5\n5      Cincinnati Reds   68  94  .420  35.5\n```\n\n# Complete Documentation\n\nThis README gives only a basic overview of what the package can do and how to use it. For full documentation on available functions and their arguments, see the [docs](https://github.com/schorrm/pybaseball/tree/master/docs) folder.\n\n# So what can I do with this?\n\nNeed some inspiration? See [examples](https://github.com/schorrm/pybaseball/tree/master/EXAMPLES) of classic baseball studies replicated using this package.\n\n------\n\n## Credit\n\nThis package was developed by James LeDoux and forked from his original repo.\n\nThis package was inspired by Bill Petti's excellent R package [baseballr](https://github.com/billpetti/baseballr), which at the time of this package's development had no Python equivalent. 
Our hope is to fill that void with this package.\n\nThe Lahman data comes from [Sean Lahman's baseball database](http://www.seanlahman.com/baseball-archive/statistics/).\n\nAll other data comes from FanGraphs, Baseball Reference, the Chadwick Bureau, Retrosheet, and Baseball Savant.\n\n## Work in Progress:\n\nMoving forward, we intend to:\n\n* Implement custom metrics such as Statcast edge percentages, historical Elo ratings, wOBA, etc.\n* Retrieve data from other useful sources\n* Identify edge cases where these queries fail (please open up an issue if you find one!)\n* Add more examples\n\nInterested in contributing? There are some ideas in [contributing.md](https://github.com/schorrm/pybaseball/tree/master/contributing.md).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschorrm%2Fpybaseball","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fschorrm%2Fpybaseball","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschorrm%2Fpybaseball/lists"}