https://github.com/bdilday/pychadwick
Python package to interface with chadwick library
- Host: GitHub
- URL: https://github.com/bdilday/pychadwick
- Owner: bdilday
- License: gpl-2.0
- Created: 2019-08-27T16:37:54.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-07-06T12:40:03.000Z (over 1 year ago)
- Last Synced: 2024-11-06T16:18:14.853Z (about 2 months ago)
- Topics: baseball, baseball-data, c, python, retrosheet, retrosheet-tools
- Language: C
- Size: 341 KB
- Stars: 9
- Watchers: 5
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# pychadwick
A Python package to interface with the `Chadwick` library.
`Chadwick` is a set of tools for parsing Retrosheet data. It is available at
http://chadwick.sourceforge.net/doc/index.html and
https://github.com/chadwickbureau/chadwick.
## Features
Currently, this package supports Retrosheet event data only.
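Retrosheet event files are the play-by-play files named after the season and the home team, with an extension that marks the league, for example `1982OAK.EVA` (American League) or a `.EVN` file (National League). As a minimal sketch, assuming a local directory laid out like the `--data-root` used in the `pycwevent` example, this is one way to pick out the regular-season event files. The helper `find_event_files` is hypothetical and not part of `pychadwick`, and `pycwevent`'s own file discovery may use different rules.

```python
from pathlib import Path

# Extensions of Retrosheet regular-season event files (AL and NL parks).
# Other Retrosheet files, such as rosters (.ROS) or team lists, are skipped.
EVENT_SUFFIXES = {".EVA", ".EVN"}


def find_event_files(data_root):
    """Return sorted paths of Retrosheet event files directly under data_root.

    Hypothetical helper for illustration; it is not part of pychadwick.
    The suffix check is case-insensitive, so `1982oak.eva` also matches.
    """
    root = Path(data_root)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.upper() in EVENT_SUFFIXES
    )
```

For example, `len(find_event_files("/tmp/retrosheet-master/event/regular/"))` counts the event files in a local checkout; subdirectories are not searched.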
## Installation
```bash
$ pip install pychadwick
```

## Example use
### Python replacement for cwevent

Installing `pychadwick` also installs `pycwevent`, a Python executable that mimics
the `cwevent` executable from the Chadwick project. It reads a set of event files
and prints them in CSV format to `stdout`.

The following command downloads a fresh copy of the Retrosheet event files
and parses them using 7 CPUs:

```bash
$ time pycwevent -n 7 > /tmp/events1.csv
stderr: data_root not given as argument, downloading fresh copy of retrosheet events...
stderr: found 2254 files
Warning: Invalid integer value 'b'

real    3m14.517s
user    12m18.104s
sys     0m25.264s

$ wc -l /tmp/events1.csv
13976191 /tmp/events1.csv
```

This uses a pre-downloaded copy of the Retrosheet event files, again parsed with 7 CPUs:

```bash
$ time pycwevent -n 7 --data-root /tmp/retrosheet-master/event/regular/ > /tmp/events2.csv
stderr: found 2254 files
Warning: Invalid integer value 'b'

real    1m57.499s
user    9m52.236s
sys     0m17.672s

$ wc -l /tmp/events2.csv
13976184 /tmp/events2.csv
```

### Python interface to cwevent
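Every row of the DataFrames below carries a Retrosheet `GAME_ID` such as `OAK198204060`. Retrosheet builds these IDs from the three-character home team code, the date as `YYYYMMDD`, and a final digit that is typically `0` for a single game and `1` or `2` for the games of a doubleheader. A minimal sketch of decoding one; `parse_game_id` and `GameId` are hypothetical names, not part of `pychadwick`:

```python
from datetime import date
from typing import NamedTuple


class GameId(NamedTuple):
    home_team: str
    game_date: date
    game_number: int  # 0 = single game, 1/2 = first/second game of a doubleheader


def parse_game_id(game_id):
    """Split a Retrosheet GAME_ID like 'OAK198204060' into its parts.

    Hypothetical helper; raises ValueError if the ID is not 12 characters
    made of a 3-character team code followed by 9 digits.
    """
    if len(game_id) != 12 or not game_id[3:].isdigit():
        raise ValueError(f"unexpected GAME_ID format: {game_id!r}")
    return GameId(
        home_team=game_id[:3],
        game_date=date(int(game_id[3:7]), int(game_id[7:9]), int(game_id[9:11])),
        game_number=int(game_id[11]),
    )
```

For the game used below, `parse_game_id("OAK198204060")` gives home team `OAK`, date 1982-04-06, and game number `0`.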
#### Load events
Load the events for the first game in a file stored on the web:
```python
>>> from pychadwick.chadwick import Chadwick
>>> chadwick = Chadwick()
>>> file_path = "https://raw.githubusercontent.com/chadwickbureau/retrosheet/master/event/regular/1982OAK.EVA"
>>> games = chadwick.games(file_path)
>>> game = next(games)
>>> df = chadwick.game_to_dataframe(game)
>>> df
         GAME_ID  AWAY_TEAM_ID  INN_CT  BAT_HOME_ID  ...  ASS9_FLD_CD  ASS10_FLD_CD  UNKNOWN_OUT_EXC_FL  UNCERTAIN_PLAY_EXC_FL
0   OAK198204060           CAL       1            0  ...            0             0                   F                      F
1   OAK198204060           CAL       1            0  ...            0             0                   F                      F
2   OAK198204060           CAL       1            0  ...            0             0                   F                      F
3   OAK198204060           CAL       1            1  ...            0             0                   F                      F
4   OAK198204060           CAL       1            1  ...            0             0                   F                      F
..           ...           ...     ...          ...  ...          ...           ...                 ...                    ...
81  OAK198204060           CAL      11            1  ...            0             0                   F                      F
82  OAK198204060           CAL      11            1  ...            0             0                   F                      F
83  OAK198204060           CAL      11            1  ...            0             0                   F                      F
84  OAK198204060           CAL      11            1  ...            0             0                   F                      F
85  OAK198204060           CAL      11            1  ...            0             0                   F                      F

[86 rows x 159 columns]
```

Load the events for the first game in a local file:
```python
>>> file_path = "/tmp/retrosheet-master/event/regular/1982OAK.EVA"
>>> games = chadwick.games(file_path)
>>> game = next(games)
>>> df = chadwick.game_to_dataframe(game)
>>> df
         GAME_ID  AWAY_TEAM_ID  INN_CT  BAT_HOME_ID  ...  ASS9_FLD_CD  ASS10_FLD_CD  UNKNOWN_OUT_EXC_FL  UNCERTAIN_PLAY_EXC_FL
0   OAK198204060           CAL       1            0  ...            0             0                   F                      F
1   OAK198204060           CAL       1            0  ...            0             0                   F                      F
2   OAK198204060           CAL       1            0  ...            0             0                   F                      F
3   OAK198204060           CAL       1            1  ...            0             0                   F                      F
4   OAK198204060           CAL       1            1  ...            0             0                   F                      F
..           ...           ...     ...          ...  ...          ...           ...                 ...                    ...
81  OAK198204060           CAL      11            1  ...            0             0                   F                      F
82  OAK198204060           CAL      11            1  ...            0             0                   F                      F
83  OAK198204060           CAL      11            1  ...            0             0                   F                      F
84  OAK198204060           CAL      11            1  ...            0             0                   F                      F
85  OAK198204060           CAL      11            1  ...            0             0                   F                      F

[86 rows x 159 columns]
```

List all the columns that are defined:
```python
>>> chadwick.all_headers
```

List the columns that are currently enabled:
```python
>>> chadwick.active_headers
```

Disable all columns, then enable only `GAME_ID` and `BAT_ID`:
```python
>>> _ = [chadwick.unset_event_field(e) for e in chadwick.all_headers]
>>> chadwick.active_headers
[]
>>> chadwick.set_event_field("GAME_ID")
>>> chadwick.set_event_field("BAT_ID")
>>> games = chadwick.games(file_path)
>>> game = next(games)
>>> df = chadwick.game_to_dataframe(game)
>>> df
         GAME_ID    BAT_ID
0   OAK198204060  burlr001
1   OAK198204060  lynnf001
2   OAK198204060  carer001
3   OAK198204060  hendr001
4   OAK198204060  murpd002
..           ...       ...
81  OAK198204060  meyed001
82  OAK198204060  armat001
83  OAK198204060  grosw001
84  OAK198204060  spenj101
85  OAK198204060  loped001

[86 rows x 2 columns]
```

Re-enable all the columns:
```python
>>> chadwick.set_all_headers()
```
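If you switch column sets often, the calls above can be wrapped in a context manager. This is a sketch built only on the `all_headers`, `unset_event_field`, `set_event_field`, and `set_all_headers` members shown in this README; the name `only_columns` is hypothetical. On exit it re-enables every column via `set_all_headers()`, rather than restoring whatever subset was active before.

```python
from contextlib import contextmanager


@contextmanager
def only_columns(chadwick, columns):
    """Temporarily enable only `columns` on a Chadwick instance.

    Hypothetical helper built on the Chadwick members used in this README.
    On exit it calls set_all_headers(), so every column ends up enabled,
    not necessarily the set that was active before.
    """
    try:
        # Disable every defined column, then enable just the requested ones.
        for header in chadwick.all_headers:
            chadwick.unset_event_field(header)
        for header in columns:
            chadwick.set_event_field(header)
        yield chadwick
    finally:
        # Re-enable all columns, even if the body of the `with` block raised.
        chadwick.set_all_headers()
```

For instance, inside `with only_columns(chadwick, ["GAME_ID", "BAT_ID"]):` a call to `chadwick.game_to_dataframe(...)` should return just those two columns, assuming `game_to_dataframe` reads the active columns at call time, as the example above suggests.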