{"id":29785783,"url":"https://github.com/benjamincrom/baseball","last_synced_at":"2025-07-27T17:14:15.361Z","repository":{"id":44546030,"uuid":"114305274","full_name":"benjamincrom/baseball","owner":"benjamincrom","description":"Library to download, analyze, and visualize events in Major League Baseball games","archived":false,"fork":false,"pushed_at":"2025-05-15T13:48:23.000Z","size":11720,"stargazers_count":88,"open_issues_count":16,"forks_count":15,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-05-15T14:45:03.510Z","etag":null,"topics":["baseball","baseball-analysis-packages","baseball-statistics","baseball-stats"],"latest_commit_sha":null,"homepage":"http://livebaseballscorecards.com","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benjamincrom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-12-14T23:15:40.000Z","updated_at":"2025-05-15T13:49:12.000Z","dependencies_parsed_at":"2024-04-10T03:28:59.352Z","dependency_job_id":"8ad7b960-6145-4693-a7c8-ea100ee9dce4","html_url":"https://github.com/benjamincrom/baseball","commit_stats":{"total_commits":354,"total_committers":2,"mean_commits":177.0,"dds":0.4915254237288136,"last_synced_commit":"62abe206975a1a355602d423cb545a7c6f65c27e"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/benjamincrom/baseball","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benjamincrom%2Fbaseball","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ben
jamincrom%2Fbaseball/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benjamincrom%2Fbaseball/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benjamincrom%2Fbaseball/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benjamincrom","download_url":"https://codeload.github.com/benjamincrom/baseball/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benjamincrom%2Fbaseball/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267392562,"owners_count":24079919,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-27T02:00:11.917Z","response_time":82,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["baseball","baseball-analysis-packages","baseball-statistics","baseball-stats"],"created_at":"2025-07-27T17:14:06.494Z","updated_at":"2025-07-27T17:14:15.354Z","avatar_url":"https://github.com/benjamincrom.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"**Table of Contents**\n\n- [Baseball](#baseball)\n    - [Installing from pypi](#installing-from-pypi)\n    - [Installing from source](#installing-from-source)\n    - [Fetch individual MLB game](#fetch-individual-mlb-game)\n    - [Game Class Structure](#game-class-structure)\n        - [Game](#game)\n        - [Team](#team)\n        - 
[Inning](#inning)\n        - [PlateAppearance](#plateappearance)\n        - [Player](#player)\n        - [PlayerAppearance](#playerappearance)\n        - [Pitch](#pitch)\n        - [Pickoff](#pickoff)\n        - [RunnerAdvance](#runneradvance)\n        - [Substitution](#substitution)\n        - [Switch](#switch)\n    - [Analyze a game: 2017 World Series - Game 7](#analyze-a-game-2017-world-series---game-7)\n    - [Analyze a player's season: R.A. Dickey - 2017](#analyze-a-players-season-ra-dickey---2017)\n    - [Analyze a lineup of pitchers: Atlanta Braves - 2017 Regular Season](#analyze-a-lineup-of-pitchers-atlanta-braves---2017-regular-season)\n\n# Baseball\nThis package fetches and parses event data for Major League Baseball games.  [Game](#game) objects generated via the **\\_from\\_url** methods pull data from MLB endpoints where events are published within about 30 seconds of occurring.  This [XML/JSON source data zip file](https://spaces-host.nyc3.digitaloceanspaces.com/livebaseballscorecards-artifacts/baseball_1974_2021.zip) contains event data from MLB games 1974 - 2020.\n\n## Installing from pypi\n```\npip3 install baseball\n```\n## Installing from source\n```\ngit clone git@github.com:benjamincrom/baseball.git\ncd baseball/\npython3 setup.py install\n```\n\n## Fetch individual MLB game\n* __get_game_from_url(__*date_str, away_code, home_code, game_number*__)__\n\nFetch an object which contains metadata and events for a single MLB game.\n```python\nimport baseball\ngame_id, game = baseball.get_game_from_url('2017-11-1', 'HOU', 'LAD', 1)\ngame_dict = game._asdict()\ngame_json_str = game.json()\n```\nWrite scorecard as SVG image:\n```python\nwith open(game_id + '.svg', 'w') as fh:\n    fh.write(game.get_svg_str())\n```\n2017-11-01-HOU-LAD-1.svg\n![svg](README_images/2017-11-01-HOU-LAD-1.svg)\n\n## Game Class Structure\n#### Game\n- away_batter_box_score_dict\n- away_pitcher_box_score_dict\n- away_team ([Team](#team))\n- away_team_stats\n- start_datetime\n- 
expected_start_datetime\n- game_date_str\n- home_batter_box_score_dict\n- home_pitcher_box_score_dict\n- home_team ([Team](#team))\n- home_team_stats\n- inning_list ([Inning](#inning) list)\n- end_datetime\n- location\n- attendance\n- weather\n- temp\n- timezone_str\n- is_postponed\n- is_suspended\n- is_doubleheader\n- is_today\n- get_svg_str()\n- json()\n- \\_asdict()\n\n#### Team\n- abbreviation\n- batting_order_list_list (list of nine [PlayerAppearance](#playerappearance) lists)\n- name\n- pitcher_list ([PlayerAppearance](#playerappearance) list)\n- player_id_dict\n- player_last_name_dict\n- player_name_dict\n- \\_asdict()\n\n#### Inning\n- bottom_half_appearance_list ([PlateAppearance](#plateappearance) list)\n- bottom_half_inning_stats\n- top_half_appearance_list ([PlateAppearance](#plateappearance) list)\n- top_half_inning_stats\n- \\_asdict()\n\n#### PlateAppearance\n- start_datetime\n- end_datetime\n- batter ([Player](#player))\n- batting_team ([Team](#team))\n- error_str\n- event_list (list of [Pitch](#pitch), [Pickoff](#pickoff), [RunnerAdvance](#runneradvance), [Substitution](#substitution), [Switch](#switch) objects)\n- got_on_base\n- hit_location\n- inning_outs\n- out_runners_list ([Player](#player) list)\n- pitcher ([Player](#player))\n- plate_appearance_description\n- plate_appearance_summary\n- runners_batted_in_list ([Player](#player) list)\n- scorecard_summary\n- scoring_runners_list ([Player](#player) list)\n- \\_asdict()\n\n#### Player\n- era\n- first_name\n- last_name\n- mlb_id\n- number\n- obp\n- slg\n- \\_asdict()\n\n#### PlayerAppearance\n- start_inning_batter_num\n- start_inning_half\n- start_inning_num\n- end_inning_batter_num\n- end_inning_half\n- end_inning_num\n- pitcher_credit_code\n- player_obj ([Player](#player))\n- position\n- \\_asdict()\n\n#### Pitch\n- pitch_datetime\n- pitch_description\n- pitch_position\n- pitch_speed\n- pitch_type\n- \\_asdict()\n\n#### Pickoff\n- pickoff_description\n- pickoff_base\n- 
pickoff_was_successful\n- \\_asdict()\n\n#### RunnerAdvance\n- runner_advance_datetime\n- run_description\n- runner ([Player](#player))\n- start_base\n- end_base\n- runner_scored\n- run_earned\n- is_rbi\n- \\_asdict()\n\n#### Substitution\n- substitution_datetime\n- incoming_player ([Player](#player))\n- outgoing_player ([Player](#player))\n- batting_order\n- position\n- \\_asdict()\n\n#### Switch\n- switch_datetime\n- player ([Player](#player))\n- old_position_num\n- new_position_num\n- new_batting_order\n- \\_asdict()\n\n## Analyze a game: 2017 World Series - Game 7\n\n\n```python\nimport matplotlib\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\nimport baseball\n\n%matplotlib inline\n\ngame_id, game = baseball.get_game_from_url('11-1-2017', 'HOU', 'LAD', 1)\n\npitch_tuple_list = []\nfor inning in game.inning_list:\n    for appearance in inning.top_half_appearance_list:\n        for event in appearance.event_list:\n            if isinstance(event, baseball.Pitch):\n                pitch_tuple_list.append(\n                    (str(appearance.pitcher), \n                     event.pitch_description,\n                     event.pitch_position,\n                     event.pitch_speed,\n                     event.pitch_type)\n                )\n\ndata = pd.DataFrame(data=pitch_tuple_list, columns=['Pitcher', 'Pitch Description', 'Pitch Coordinate', 'Pitch Speed', 'Pitch Type'])\ndata.head()\n```\n\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003ePitcher\u003c/th\u003e\n      \u003cth\u003ePitch Description\u003c/th\u003e\n      \u003cth\u003ePitch Coordinate\u003c/th\u003e\n      \u003cth\u003ePitch Speed\u003c/th\u003e\n      \u003cth\u003ePitch Type\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003e0\u003c/th\u003e\n      \u003ctd\u003e21 Yu 
Darvish\u003c/td\u003e\n      \u003ctd\u003eBall\u003c/td\u003e\n      \u003ctd\u003e(155.47, 160.83)\u003c/td\u003e\n      \u003ctd\u003e96.0\u003c/td\u003e\n      \u003ctd\u003eFF\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e1\u003c/th\u003e\n      \u003ctd\u003e21 Yu Darvish\u003c/td\u003e\n      \u003ctd\u003eCalled Strike\u003c/td\u003e\n      \u003ctd\u003e(107.0, 171.09)\u003c/td\u003e\n      \u003ctd\u003e83.9\u003c/td\u003e\n      \u003ctd\u003eFC\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e2\u003c/th\u003e\n      \u003ctd\u003e21 Yu Darvish\u003c/td\u003e\n      \u003ctd\u003eIn play, no out\u003c/td\u003e\n      \u003ctd\u003e(115.36, 183.1)\u003c/td\u003e\n      \u003ctd\u003e83.9\u003c/td\u003e\n      \u003ctd\u003eSL\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e3\u003c/th\u003e\n      \u003ctd\u003e21 Yu Darvish\u003c/td\u003e\n      \u003ctd\u003eIn play, run(s)\u003c/td\u003e\n      \u003ctd\u003e(80.06, 168.03)\u003c/td\u003e\n      \u003ctd\u003e96.6\u003c/td\u003e\n      \u003ctd\u003eFF\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e4\u003c/th\u003e\n      \u003ctd\u003e21 Yu Darvish\u003c/td\u003e\n      \u003ctd\u003eBall\u003c/td\u003e\n      \u003ctd\u003e(54.1, 216.52)\u003c/td\u003e\n      \u003ctd\u003e84.6\u003c/td\u003e\n      \u003ctd\u003eSL\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n```python\ndata['Pitcher'].value_counts().plot.bar()\n```\n\n![png](README_images/baseball_stats_2_1.png)\n\n```python\nfor pitcher in data['Pitcher'].unique():\n    plt.ylim(0, 125)\n    plt.xlim(0, 250)\n    bx = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if 'Ball' in x[1]]\n    by = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if 'Ball' in x[1]]\n    cx = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if 'Called Strike' in 
x[1]]\n    cy = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if 'Called Strike' in x[1]]\n    ox = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if ('Ball' not in x[1] and 'Called Strike' not in x[1])]\n    oy = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if ('Ball' not in x[1] and 'Called Strike' not in x[1])]\n    b = plt.scatter(bx, by, c='b')\n    c = plt.scatter(cx, cy, c='r')\n    o = plt.scatter(ox, oy, c='g')\n\n    plt.legend((b, c, o),\n               ('Ball', 'Called Strike', 'Other'),\n               scatterpoints=1,\n               loc='upper right',\n               ncol=1,\n               fontsize=8)\n\n    plt.title(pitcher)\n    plt.show()\n```\n\n\n![png](README_images/baseball_stats_3_0.png)\n\n\n\n![png](README_images/baseball_stats_3_1.png)\n\n\n\n![png](README_images/baseball_stats_3_2.png)\n\n\n\n![png](README_images/baseball_stats_3_3.png)\n\n\n\n![png](README_images/baseball_stats_3_4.png)\n\n\n\n```python\nplt.axis('equal')\ndata['Pitch Description'].value_counts().plot(kind='pie', radius=1.5, autopct='%1.0f%%', pctdistance=1.1, labeldistance=1.2)\n```\n\n\n\n![png](README_images/baseball_stats_5_1.png)\n\n\n\n```python\ndata.plot.kde()\n```\n\n\n\n\n![png](README_images/baseball_stats_6_1.png)\n\n\n\n```python\nfig, ax = plt.subplots()\nax.set_xlim(50, 120)\nfor pitcher in data['Pitcher'].unique():\n    s = data[data['Pitcher'] == pitcher]['Pitch Speed']\n    s.plot.kde(ax=ax, label=pitcher)\n\nax.legend()\n```\n\n\n\n\n\n\n![png](README_images/baseball_stats_7_1.png)\n\n\n\n```python\nfig, ax = plt.subplots()\nax.set_xlim(50, 120)\nfor desc in data['Pitch Type'].unique():\n    s = data[data['Pitch Type'] == desc]['Pitch Speed']\n    s.plot.kde(ax=ax, label=desc)\n\nax.legend()\n```\n\n\n\n\n\n\n![png](README_images/baseball_stats_8_1.png)\n\n\n\n```python\nfig, ax = plt.subplots(figsize=(15,7))\ndata.groupby(['Pitcher', 'Pitch 
Description']).size().unstack().plot.bar(ax=ax)\n```\n\n\n\n\n![png](README_images/baseball_stats_9_1.png)\n\n\n\n## Analyze a player's season: R.A. Dickey - 2017\n\n\n```python\ngame_list_2017 = baseball.get_game_list_from_file_range('1-1-2017', '12-31-2017', '/Users/benjamincrom/repos/livebaseballscorecards-artifacts/baseball_files')\n\npitch_tuple_list_2 = []\nfor game_id, game in game_list_2017:\n    if game.home_team.name == 'Atlanta Braves' or game.away_team.name == 'Atlanta Braves':\n        for inning in game.inning_list:\n            for appearance in (inning.top_half_appearance_list +\n                               (inning.bottom_half_appearance_list or [])):\n                if 'Dickey' in str(appearance.pitcher):\n                    for event in appearance.event_list:\n                        if isinstance(event, baseball.Pitch):\n                            pitch_tuple_list_2.append(\n                                (str(appearance.pitcher), \n                                 event.pitch_description,\n                                 event.pitch_position,\n                                 event.pitch_speed,\n                                 event.pitch_type)\n                            )\n\ndf = pd.DataFrame(data=pitch_tuple_list_2, columns=['Pitcher', 'Pitch Description', 'Pitch Coordinate', 'Pitch Speed', 'Pitch Type'])\ndf['Pitch Type'].value_counts().plot.bar()\n```\n\n\n\n![png](README_images/baseball_stats_14_1.png)\n\n\n\n```python\nplt.axis('equal')\ndf['Pitch Description'].value_counts().plot(kind='pie', radius=2, autopct='%1.0f%%', pctdistance=1.1, labeldistance=1.2)\nplt.ylabel('')\nplt.show()\n```\n\n\n![png](README_images/baseball_stats_15_0.png)\n\n\n\n```python\ndf.dropna(inplace=True)\nfig, ax = plt.subplots()\nax.set_xlim(50, 100)\ndf.plot.kde(ax=ax)\nax.legend()\n```\n\n\n\n\n\n![png](README_images/baseball_stats_16_1.png)\n\n\n\n```python\nfig, ax = plt.subplots()\nax.set_xlim(50, 100)\nfor desc in df['Pitch Type'].unique():\n    if desc != 'PO':\n        s = 
df[df['Pitch Type'] == desc]['Pitch Speed']\n        s.plot.kde(ax=ax, label=desc)\n\nax.legend()\n```\n\n\n\n\n\n\n![png](README_images/baseball_stats_17_1.png)\n\n\n## Analyze a lineup of pitchers: Atlanta Braves - 2017 Regular Season\n\n\n```python\nimport datetime\nimport dateutil.parser\nimport pytz\npitch_tuple_list_3 = []\nfor game_id, game in game_list_2017:\n    if game.home_team.name == 'Atlanta Braves' and dateutil.parser.parse(game.game_date_str) \u003e datetime.datetime(2017, 3, 31):\n        for inning in game.inning_list:\n            for appearance in inning.top_half_appearance_list:\n                pitch_tuple_list_3.append(\n                    (str(appearance.pitcher),\n                     str(appearance.batter),\n                     len(appearance.out_runners_list),\n                     len(appearance.scoring_runners_list),\n                     len(appearance.runners_batted_in_list),\n                     appearance.scorecard_summary,\n                     appearance.got_on_base,\n                     appearance.plate_appearance_summary,\n                     appearance.plate_appearance_description,\n                     appearance.error_str,\n                     appearance.inning_outs)\n                )\n    if game.away_team.name == 'Atlanta Braves' and dateutil.parser.parse(game.game_date_str) \u003e datetime.datetime(2017, 3, 31):\n        for inning in game.inning_list:\n            if inning.bottom_half_appearance_list:\n                for appearance in inning.bottom_half_appearance_list:\n                    pitch_tuple_list_3.append(\n                        (str(appearance.pitcher),\n                         str(appearance.batter),\n                         len(appearance.out_runners_list),\n                         len(appearance.scoring_runners_list),\n                         len(appearance.runners_batted_in_list),\n                         appearance.scorecard_summary,\n                         appearance.got_on_base,\n      
                   appearance.plate_appearance_summary,\n                         appearance.plate_appearance_description,\n                         appearance.error_str,\n                         appearance.inning_outs)\n                    )\n\ndf3 = pd.DataFrame(data=pitch_tuple_list_3, columns=['Pitcher',\n                                                     'Batter',\n                                                     'Out Runners',\n                                                     'Scoring Runners',\n                                                     'RBIs',\n                                                     'Scorecard',\n                                                     'On-base?',\n                                                     'Plate Summary',\n                                                     'Plate Description',\n                                                     'Error',\n                                                     'Inning Outs'])\n\nfor pitcher in df3['Pitcher'].unique():\n    summary = df3[df3['Pitcher'] == pitcher]['Plate Summary']\n    s = summary.value_counts(sort=False)\n    if len(summary) \u003e 400:\n        fig, ax = plt.subplots()\n        ax.set_ylim(0, 250)\n        s.plot.bar()\n        plt.title(pitcher)\n        plt.show()\n\n```\n\n\n![png](README_images/baseball_stats_20_0.png)\n\n\n\n![png](README_images/baseball_stats_20_1.png)\n\n\n\n![png](README_images/baseball_stats_20_2.png)\n\n\n\n![png](README_images/baseball_stats_20_3.png)\n\n\n\n![png](README_images/baseball_stats_20_4.png)\n\n\n\n```python\nx = []\nfor pitcher in df3['Pitcher'].unique():\n    #f = df3[df3['Pitcher'] == pitcher]['On-base?'].value_counts()[0]\n    s = df3[df3['Pitcher'] == pitcher]['On-base?'].value_counts()\n    if len(s) == 2:\n        f = s[0]\n        t = s[1]\n        x.append((str(pitcher), f, t))\n\ndf4 = pd.DataFrame(data=x, columns=['Pitcher',\n                                    'Did not get on base',\n             
                       'Got on base'])\n\ndf4.index = df4['Pitcher']\ndf4.sort_values(by=['Got on base']).nlargest(10, 'Did not get on base').plot.bar()\n```\n\n\n\n![png](README_images/baseball_stats_22_1.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenjamincrom%2Fbaseball","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenjamincrom%2Fbaseball","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenjamincrom%2Fbaseball/lists"}