https://github.com/remix/partridge
A fast, forgiving GTFS reader built on pandas DataFrames
- Host: GitHub
- URL: https://github.com/remix/partridge
- Owner: remix
- License: mit
- Created: 2017-08-31T17:51:10.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-12-03T23:03:46.000Z (11 months ago)
- Last Synced: 2024-07-15T06:04:06.798Z (4 months ago)
- Topics: gtfs, pandas, python
- Language: Python
- Homepage: https://partridge.readthedocs.io
- Size: 2.96 MB
- Stars: 149
- Watchers: 10
- Forks: 22
- Open Issues: 6
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
=========
Partridge
=========

.. image:: https://img.shields.io/pypi/v/partridge.svg
        :target: https://pypi.python.org/pypi/partridge

.. image:: https://img.shields.io/travis/remix/partridge.svg
        :target: https://travis-ci.org/remix/partridge

Partridge is a Python 3.6+ library for working with GTFS feeds using pandas DataFrames.

Partridge is heavily influenced by our experience at Remix analyzing and debugging every GTFS feed we could find.

At the core of Partridge is a dependency graph rooted at ``trips.txt``. Disconnected data is pruned away according to this graph when reading the contents of a feed.

Feeds can also be filtered to create a view specific to your needs. It's most common to filter a feed down to specific dates (``service_id``) or routes (``route_id``), but any field can be filtered.

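
To make the pruning idea concrete, here is a minimal sketch in plain Python. It is not Partridge's implementation: the tables are toy lists of dicts, the ``EDGES`` list is a simplified, hypothetical subset of the dependency graph, and ``apply_view`` is a made-up helper name. It filters ``trips.txt`` by ``service_id`` and then drops rows in other files that no surviving trip refers to.

```python
# Illustrative only: plain-Python stand-ins for a few GTFS tables.
feed = {
    "trips.txt": [
        {"trip_id": "t1", "route_id": "r1", "service_id": "weekday"},
        {"trip_id": "t2", "route_id": "r2", "service_id": "saturday"},
    ],
    "routes.txt": [{"route_id": "r1"}, {"route_id": "r2"}],
    "stop_times.txt": [
        {"trip_id": "t1", "stop_id": "s1"},
        {"trip_id": "t1", "stop_id": "s2"},
        {"trip_id": "t2", "stop_id": "s3"},
    ],
    "stops.txt": [{"stop_id": "s1"}, {"stop_id": "s2"}, {"stop_id": "s3"}],
}

# Assumption: a simplified subset of the graph, as
# (parent file, child file, shared column), ordered from the root outward.
EDGES = [
    ("trips.txt", "routes.txt", "route_id"),
    ("trips.txt", "stop_times.txt", "trip_id"),
    ("stop_times.txt", "stops.txt", "stop_id"),
]


def apply_view(feed, view):
    """Filter rows by the view, then prune rows unreachable from trips.txt."""
    tables = {name: list(rows) for name, rows in feed.items()}

    # Step 1: keep only rows whose column value is in the allowed set.
    for filename, filters in view.items():
        for column, allowed in filters.items():
            tables[filename] = [
                row for row in tables[filename] if row[column] in allowed
            ]

    # Step 2: walk the edges and drop child rows with no surviving parent.
    for parent, child, column in EDGES:
        keep = {row[column] for row in tables[parent]}
        tables[child] = [row for row in tables[child] if row[column] in keep]

    return tables


pruned = apply_view(feed, {"trips.txt": {"service_id": {"weekday"}}})
print([row["stop_id"] for row in pruned["stops.txt"]])
```

The sketch only propagates outward from ``trips.txt``. A filter on a file further from the root would also have to propagate back toward the root, which is omitted here to keep the example short.
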
.. figure:: dependency-graph.png
   :alt: dependency graph

Philosophy
----------

The design of Partridge is guided by the following principles:

**As much as possible**

- Favor speed
- Allow for extension
- Succeed lazily on expensive paths
- Fail eagerly on inexpensive paths

**As little as possible**

- Do anything other than efficiently read GTFS files into DataFrames
- Take an opinion on the GTFS spec

Installation
------------

.. code:: console

    pip install partridge

**GeoPandas support**

.. code:: console

    pip install partridge[full]

Usage
-----

**Setup**

.. code:: python

    import partridge as ptg

    inpath = 'path/to/caltrain-2017-07-24/'

Examples
--------

The following is a collection of gists containing Jupyter notebooks with transformations to GTFS feeds that may be useful for intake into software applications.

* Find the busiest week in a feed and reduce its file size
* Combine routes by route_short_name
* Merge GTFS with shapefile geometries
* Merge multiple agencies into one
* Rewrite a feed to clean up formatting issues
* If a feed has stop_code, replace the contents of stop_id with the contents of stop_code
* Diff the number of service hours in two feeds
* Investigate the distance in meters of each stop to the closest point on a shape
* Convert frequencies.txt to an equivalent trips.txt
* Calculate headway for a stop

Inspecting the calendar
~~~~~~~~~~~~~~~~~~~~~~~

**The date with the most trips**

.. code:: python

    date, service_ids = ptg.read_busiest_date(inpath)
    #  datetime.date(2017, 7, 17), frozenset({'CT-17JUL-Combo-Weekday-01'})

**The week with the most trips**

.. code:: python

    service_ids_by_date = ptg.read_busiest_week(inpath)
    #  {datetime.date(2017, 7, 17): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 18): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 19): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 20): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 21): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 22): frozenset({'CT-17JUL-Caltrain-Saturday-03'}),
    #   datetime.date(2017, 7, 23): frozenset({'CT-17JUL-Caltrain-Sunday-01'})}

**Dates with active service**

.. code:: python

    service_ids_by_date = ptg.read_service_ids_by_date(inpath)

    # First date with service
    date, service_ids = min(service_ids_by_date.items())
    #  datetime.date(2017, 7, 15), frozenset({'CT-17JUL-Caltrain-Saturday-03'})

    # Last date with service
    date, service_ids = max(service_ids_by_date.items())
    #  datetime.date(2019, 7, 20), frozenset({'CT-17JUL-Caltrain-Saturday-03'})

**Dates with identical service**

.. code:: python

    dates_by_service_ids = ptg.read_dates_by_service_ids(inpath)

    busiest_date, busiest_service = ptg.read_busiest_date(inpath)
    dates = dates_by_service_ids[busiest_service]

    min(dates), max(dates)
    #  datetime.date(2017, 7, 17), datetime.date(2019, 7, 19)

Reading a feed
~~~~~~~~~~~~~~

.. code:: python

    _date, service_ids = ptg.read_busiest_date(inpath)

    # Keep trips running on the busiest date that serve the named stop
    view = {
        'trips.txt': {'service_id': service_ids},
        'stops.txt': {'stop_name': 'Gilroy Caltrain'},
    }

    feed = ptg.load_feed(inpath, view)

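
The views above are keyed on the service IDs of the busiest date. As a rough mental model of how such a date could be picked, the sketch below works on plain dictionaries. It is an illustration under stated assumptions, not Partridge's code: the trip counts are invented, and ``busiest_date`` and ``group_dates_by_service_ids`` are hypothetical helper names.

```python
import datetime
from collections import defaultdict

# Hypothetical inputs: the service_ids active on each date, and the number
# of trips each service_id contributes (invented numbers).
service_ids_by_date = {
    datetime.date(2017, 7, 17): frozenset({"weekday"}),
    datetime.date(2017, 7, 18): frozenset({"weekday"}),
    datetime.date(2017, 7, 22): frozenset({"saturday"}),
    datetime.date(2017, 7, 23): frozenset({"sunday", "special"}),
}
trip_counts = {"weekday": 92, "saturday": 36, "sunday": 32, "special": 4}


def busiest_date(service_ids_by_date, trip_counts):
    """Return (date, service_ids) for the date with the most trips."""

    def sort_key(item):
        day, service_ids = item
        total = sum(trip_counts[sid] for sid in service_ids)
        # Negating the ordinal makes max() prefer the earliest date on ties.
        return (total, -day.toordinal())

    return max(service_ids_by_date.items(), key=sort_key)


def group_dates_by_service_ids(service_ids_by_date):
    """Invert the mapping: each set of service_ids -> the dates it runs on."""
    grouped = defaultdict(set)
    for day, service_ids in service_ids_by_date.items():
        grouped[service_ids].add(day)
    return dict(grouped)


day, service_ids = busiest_date(service_ids_by_date, trip_counts)
dates = group_dates_by_service_ids(service_ids_by_date)[service_ids]

# The resulting frozenset can be dropped straight into a view.
view = {"trips.txt": {"service_id": service_ids}}
print(day, min(dates), max(dates))
```

Using a ``frozenset`` of service IDs as the dictionary key is what makes "dates with identical service" a simple lookup: two dates share a key exactly when the same services run on both.
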
**Read shapes and stops as GeoDataFrames**

.. code:: python

    service_ids = ptg.read_busiest_date(inpath)[1]
    view = {'trips.txt': {'service_id': service_ids}}

    feed = ptg.load_geo_feed(inpath, view)

    feed.shapes.head()
    #       shape_id                                           geometry
    #  0  cal_gil_sf  LINESTRING (-121.5661454200744 37.003512297983...
    #  1  cal_sf_gil  LINESTRING (-122.3944115638733 37.776439059278...
    #  2   cal_sf_sj  LINESTRING (-122.3944115638733 37.776439059278...
    #  3  cal_sf_tam  LINESTRING (-122.3944115638733 37.776439059278...
    #  4   cal_sj_sf  LINESTRING (-121.9031703472137 37.330157067882...

    minlon, minlat, maxlon, maxlat = feed.stops.total_bounds
    #  -122.412076, 37.003485, -121.566088, 37.77639

Extracting a new feed
~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    outpath = 'gtfs-slim.zip'

    service_ids = ptg.read_busiest_date(inpath)[1]
    view = {'trips.txt': {'service_id': service_ids}}

    # Write the filtered feed to disk, then read it back to check the result
    ptg.extract_feed(inpath, outpath, view)
    feed = ptg.load_feed(outpath)

    assert service_ids == set(feed.trips.service_id)

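
To see how much an extraction shrank a feed, one option is to compare row counts per file using only the standard library. The sketch below is a hedged illustration rather than part of Partridge: ``row_counts`` and ``make_zip`` are hypothetical helpers, the two feeds are tiny in-memory stand-ins, and only zip archives with ``.txt`` files at the top level are handled.

```python
import csv
import io
import zipfile


def row_counts(feed_zip):
    """Map each .txt member of a GTFS zip to its number of data rows."""
    counts = {}
    with zipfile.ZipFile(feed_zip) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            with archive.open(name) as raw, io.TextIOWrapper(
                raw, encoding="utf-8-sig", newline=""
            ) as text:
                reader = csv.reader(text)
                next(reader, None)  # skip the header row
                counts[name] = sum(1 for _ in reader)
    return counts


def make_zip(files):
    """Build an in-memory zip from {filename: csv text} for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return buffer


# Stand-ins for an original and an extracted feed. With real data you would
# pass paths to zip archives instead, such as the extracted 'gtfs-slim.zip'.
full = make_zip({"trips.txt": "trip_id,service_id\nt1,weekday\nt2,saturday\n"})
slim = make_zip({"trips.txt": "trip_id,service_id\nt1,weekday\n"})

before, after = row_counts(full), row_counts(slim)
for name in sorted(before):
    print(name, before[name], "->", after.get(name, 0))
```
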
Features
--------

- Surprisingly fast :)
- Load only what you need into memory
- Built-in support for resolving service dates
- Easily extended to support fields and files outside the official spec
  (TODO: document this)
- Handle nested folders and bad data in zips
- Predictable type conversions

Thank You
---------

I hope you find this library useful. If you have suggestions for
improving Partridge, please open an issue on GitHub.