https://github.com/derhuerst/match-gtfs-rt-to-gtfs

Match HAFAS realtime data with GTFS Static data.
- Host: GitHub
- URL: https://github.com/derhuerst/match-gtfs-rt-to-gtfs
- Owner: derhuerst
- License: isc
- Created: 2020-09-24T15:04:20.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-06-25T13:46:59.000Z (about 1 year ago)
- Last Synced: 2025-04-15T05:36:52.225Z (3 months ago)
- Topics: gtfs, gtfs-realtime, gtfs-rt, public-transport, transit
- Language: JavaScript
- Size: 6.43 MB
- Stars: 5
- Watchers: 3
- Forks: 1
- Open Issues: 5
- Metadata Files:
  - Readme: readme.md
  - License: license.md

# match-gtfs-rt-to-gtfs

Try to **match realtime transit data (e.g. from [GTFS Realtime (GTFS-RT)](https://gtfs.org/reference/realtime/v2/)) with [GTFS Static](https://gtfs.org/reference/static) data**, even if they don't share an ID.

[![npm version](https://img.shields.io/npm/v/match-gtfs-rt-to-gtfs.svg)](https://www.npmjs.com/package/match-gtfs-rt-to-gtfs)

![ISC-licensed](https://img.shields.io/github/license/derhuerst/match-gtfs-rt-to-gtfs.svg)

![minimum Node.js version](https://img.shields.io/node/v/match-gtfs-rt-to-gtfs.svg)

[![support me via GitHub Sponsors](https://img.shields.io/badge/support%20me-donate-fa7664.svg)](https://github.com/sponsors/derhuerst)

[![chat with me on Twitter](https://img.shields.io/badge/chat%20with%20me-on%20Twitter-1da1f2.svg)](https://twitter.com/derhuerst)

This repo uses [`@derhuerst/stable-public-transport-ids`](https://github.com/derhuerst/stable-public-transport-ids) to compute IDs from transit data itself:

1. [`gtfs-via-postgres`](https://github.com/derhuerst/gtfs-via-postgres) is used to import the GTFS Static data into the DB.

2. `match-gtfs-rt-to-gtfs` computes these "stable IDs" for all relevant items in the GTFS Static data and stores them in the DB.

3. When given a piece of realtime data (e.g. from a GTFS Realtime feed), it computes the data's "stable IDs" and checks whether they match those stored in the DB.
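The three steps above can be illustrated with a small in-memory sketch. It is hypothetical and only conveys the idea: `stableStopIds`, `buildIndex` and `matchStop` are made-up names, deriving IDs from a normalized name plus rounded coordinates is an assumption, and neither the real `@derhuerst/stable-public-transport-ids` algorithm nor the PostgreSQL storage is reproduced here.

```js
// Hypothetical sketch of stable-ID matching; NOT the real
// @derhuerst/stable-public-transport-ids algorithm.

// Assumed normalization: lowercase, drop everything except letters & digits.
const normalize = (name) => name.toLowerCase().replace(/[^a-z0-9]+/g, '')

// Derive several [stableId, specificity] pairs per stop,
// from coarse (low specificity) to fine (high specificity).
const stableStopIds = (stop) => {
	const n = normalize(stop.name)
	const {latitude: lat, longitude: lon} = stop.location
	return [
		[n, 10],
		[`${n}:${lat.toFixed(2)}:${lon.toFixed(2)}`, 20],
		[`${n}:${lat.toFixed(3)}:${lon.toFixed(3)}`, 30],
	]
}

// Step 2: index the GTFS Static stops by their stable IDs.
const buildIndex = (gtfsStops) => {
	const index = new Map() // "stableId|specificity" -> Set of GTFS stop IDs
	for (const stop of gtfsStops) {
		for (const [id, specificity] of stableStopIds(stop)) {
			const key = `${id}|${specificity}`
			if (!index.has(key)) index.set(key, new Set())
			index.get(key).add(stop.id)
		}
	}
	return index
}

// Step 3: compute the realtime stop's stable IDs and look them up,
// most specific first, accepting only unambiguous hits.
const matchStop = (index, rtStop) => {
	const candidates = stableStopIds(rtStop).sort((a, b) => b[1] - a[1])
	for (const [id, specificity] of candidates) {
		const hits = index.get(`${id}|${specificity}`)
		if (hits && hits.size === 1) return [...hits][0]
	}
	return null
}

// The realtime & static stops share no ID, but their stable IDs match.
const index = buildIndex([
	{id: '070101002285', name: 'U Moritzplatz', location: {latitude: 52.503737, longitude: 13.410944}},
])
const rtStop = {id: '900000013101', name: 'U Moritzplatz', location: {latitude: 52.503737, longitude: 13.410944}}
console.log(matchStop(index, rtStop)) // 070101002285
```

Trying the most specific IDs first means a coarse ID (e.g. the name alone) is only relied upon when nothing finer matches.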

## Installation

```shell
npm install match-gtfs-rt-to-gtfs
```

*Note:* `match-gtfs-rt-to-gtfs` **needs PostgreSQL >=14** to work, as its dependency [`gtfs-via-postgres`](https://github.com/derhuerst/gtfs-via-postgres) needs that version. You can check your PostgreSQL server's version with `psql -t -c 'SELECT version()'`.

## Usage

### building the database

Let's use the `gtfs-to-sql` CLI from [`gtfs-via-postgres`](https://github.com/derhuerst/gtfs-via-postgres) to import our GTFS data into [PostgreSQL](https://www.postgresql.org):

```shell
gtfs-to-sql path/to/gtfs/*.txt | psql -b
```

To some extent, `match-gtfs-rt-to-gtfs` fuzzily matches stop/station & route/line names (more on that below). For that to work, we need to tell it how to "normalize" these names. As an example, we're going to do this for the [VBB](https://en.wikipedia.org/wiki/Verkehrsverbund_Berlin-Brandenburg) data:

```js
// normalize.js
import normalizeStopName from 'normalize-vbb-station-name-for-search'
import slugg from 'slugg'

const normalizeLineName = (name) => {
	// Glue a letter prefix to its number ("M 29" -> "M29"), then slugify.
	return slugg(name.replace(/([a-zA-Z]+)\s+(\d+)/g, '$1$2'))
}

export {
	normalizeStopName,
	normalizeLineName,
	// With VBB vehicles, the headsign is almost always the last stop.
	normalizeStopName as normalizeTripHeadsign,
}
```
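To get a feel for what `normalizeLineName` produces, here is a dependency-free approximation. The `slugify` helper is a hypothetical stand-in for `slugg`, assumed to lowercase its input and join words with `-`; the real package may treat edge cases (e.g. non-ASCII characters) differently.

```js
// Hypothetical stand-in for `slugg`: lowercase, replace runs of other
// characters with '-', trim leading/trailing '-'.
const slugify = (s) => s
	.toLowerCase()
	.replace(/[^a-z0-9]+/g, '-')
	.replace(/^-+|-+$/g, '')

// Same regex as in normalize.js: glue a letter prefix to the following
// number, so that "M 29" and "M29" end up identical.
const normalizeLineName = (name) => {
	return slugify(name.replace(/([a-zA-Z]+)\s+(\d+)/g, '$1$2'))
}

console.log(normalizeLineName('M29'))     // m29
console.log(normalizeLineName('M 29'))    // m29
console.log(normalizeLineName('Bus 100')) // bus100
```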

We're going to create two files that specify how to handle the GTFS-RT & GTFS (Static) data, respectively:

```js
// gtfs-rt-info.js
import {
	normalizeStopName,
	normalizeLineName,
	normalizeTripHeadsign,
} from './normalize.js'

const idNamespace = 'vbb'
const endpointName = 'vbb-hafas'

export {
	idNamespace,
	endpointName,
	normalizeStopName,
	normalizeLineName,
	normalizeTripHeadsign,
}
```

```js
// gtfs-info.js
import {
	normalizeStopName,
	normalizeLineName,
	normalizeTripHeadsign,
} from './normalize.js'

const idNamespace = 'vbb'
const endpointName = 'vbb-gtfs'

export {
	idNamespace,
	endpointName,
	normalizeStopName,
	normalizeLineName,
	normalizeTripHeadsign,
}
```

*Note:* To keep things easy, we're using the same normalization functions here. In practice, if your two data sources use different stop/line/headsign notations, you will need to use data-source-specific implementations.

Now, we're going to use `match-gtfs-rt-to-gtfs/build-index.js` to import the additional data needed for matching into the database:

```bash
set -o pipefail
./build-index.js gtfs-rt-info.js gtfs-info.js | psql -b
```

### matching data

`match-gtfs-rt-to-gtfs` does its job using fuzzy matching. For example, it **treats a departure from GTFS-RT and a departure from GTFS as equivalent if they happen at the same time, at the same stop/station and with the same route/line name**.
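The equivalence rule above can be written down as a plain predicate. This is a hypothetical illustration: `sameDeparture`, the simplified normalizer and the one-minute tolerance are assumptions, not library API. The library itself narrows down candidates through stable IDs and a time window in PostgreSQL, as described under "How it works".

```js
// Hypothetical equivalence check, for illustration only.
const sameDeparture = (rtDep, gtfsDep, info, toleranceMs = 60 * 1000) => {
	// "same time": planned departure times differ by at most toleranceMs
	const dt = Math.abs(Date.parse(rtDep.plannedWhen) - Date.parse(gtfsDep.plannedWhen))
	if (!(dt <= toleranceMs)) return false // also rejects unparsable dates (NaN)
	// "same stop/station" and "same route/line name", after normalization
	return info.normalizeStopName(rtDep.stop.name) === info.normalizeStopName(gtfsDep.stop.name)
		&& info.normalizeLineName(rtDep.line.name) === info.normalizeLineName(gtfsDep.line.name)
}

// Simplified normalizer, standing in for the ones from normalize.js.
const norm = (s) => s.toLowerCase().replace(/[^a-z0-9]+/g, '')
const info = {normalizeStopName: norm, normalizeLineName: norm}

const rtDep = {
	plannedWhen: '2020-11-07T14:54:00+01:00',
	stop: {name: 'U Moritzplatz'},
	line: {name: 'M29'},
}
const gtfsDep = {
	plannedWhen: '2020-11-07T13:54:00Z', // same instant, different UTC offset
	stop: {name: 'U Moritzplatz'},
	line: {name: 'M 29'},
}
console.log(sameDeparture(rtDep, gtfsDep, info)) // true
```

Comparing parsed timestamps rather than strings matters because the two data sources may use different UTC offsets for the same instant.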

Now, let's match a departure against GTFS:

```js
import {createMatch} from 'match-gtfs-rt-to-gtfs'
// The info files only have named exports, so import their module namespaces.
import * as gtfsRtInfo from './gtfs-rt-info.js' // see above
import * as gtfsInfo from './gtfs-info.js' // see above

const gtfsRtDep = {
	tripId: '1|12308|1|86|7112020',
	direction: 'Grunewald, Roseneck',
	line: {
		type: 'line',
		id: 'm29',
		fahrtNr: '22569',
		name: 'M29',
		public: true,
		adminCode: 'BVB',
		mode: 'bus',
		product: 'bus',
		operator: {
			type: 'operator',
			id: 'berliner-verkehrsbetriebe',
			name: 'Berliner Verkehrsbetriebe'
		},
	},
	stop: {
		type: 'stop',
		id: '900000013101',
		name: 'U Moritzplatz',
		location: {latitude: 52.503737, longitude: 13.410944},
	},
	when: '2020-11-07T14:55:00+01:00',
	plannedWhen: '2020-11-07T14:54:00+01:00',
	delay: 60,
	platform: null,
	plannedPlatform: null,
}

const {matchDeparture} = createMatch(gtfsRtInfo, gtfsInfo)
console.log(await matchDeparture(gtfsRtDep))
```

This logs the departure with its IDs replaced by the matching GTFS Static ones; the IDs from both sources are kept in `tripIds`, `line.fahrtNrs` and `stop.ids`, keyed by `endpointName`:

```js
{
	tripId: '145341691',
	tripIds: {
		'vbb-hafas': '1|12308|1|86|7112020',
		'vbb-gtfs': '145341691',
	},
	routeId: '17449_700',
	direction: 'Grunewald, Roseneck',
	line: {
		type: 'line',
		id: null,
		fahrtNr: '22569',
		fahrtNrs: {'vbb-hafas': '22569'},
		name: 'M29',
		public: true,
		adminCode: 'BVB',
		mode: 'bus',
		product: 'bus',
		operator: {
			type: 'operator',
			id: 'berliner-verkehrsbetriebe',
			name: 'Berliner Verkehrsbetriebe'
		},
	},
	stop: {
		type: 'stop',
		id: '070101002285',
		ids: {
			'vbb-hafas': '900000013101',
			'vbb-gtfs': '070101002285',
		},
		name: 'U Moritzplatz',
		location: {latitude: 52.503737, longitude: 13.410944},
	},
	when: '2020-11-07T14:55:00+01:00',
	plannedWhen: '2020-11-07T14:54:00+01:00',
	delay: 60,
	platform: null,
	plannedPlatform: null,
}
```

### finding the shape of a trip

```js
import {findShape} from 'match-gtfs-rt-to-gtfs/find-shape.js'

const someTripId = '24582338' // some U3 trip from the HVV dataset
await findShape(someTripId)
```

`findShape` resolves with a [GeoJSON `LineString`](https://tools.ietf.org/html/rfc7946#section-3.1.4):

```js
{
	type: 'LineString',
	coordinates: [
		[10.044385, 53.5872],
		// …
		[10.074888, 53.592473]
	],
}
```
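As an example of consuming the result, the sketch below sums the great-circle (haversine) lengths of a `LineString`'s segments. Note that GeoJSON orders positions as `[longitude, latitude]`. `lineStringLength` is a hypothetical helper, not part of this package, and the mean Earth radius is an approximation, so results are estimates.

```js
// Mean Earth radius in meters (approximation; Earth is not a perfect sphere).
const EARTH_RADIUS_M = 6371008.8

const toRad = (deg) => deg * Math.PI / 180

// Haversine distance in meters between two GeoJSON positions [lon, lat].
const haversine = ([lon1, lat1], [lon2, lat2]) => {
	const dLat = toRad(lat2 - lat1)
	const dLon = toRad(lon2 - lon1)
	const a = Math.sin(dLat / 2) ** 2
		+ Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2
	return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
}

// Sum the lengths of all consecutive segments of a LineString.
const lineStringLength = (lineString) => {
	const coords = lineString.coordinates
	let total = 0
	for (let i = 1; i < coords.length; i++) {
		total += haversine(coords[i - 1], coords[i])
	}
	return total
}

// Only the first & last positions from the example above are known (the rest
// is elided), so this is the straight-line distance, not the full trip length.
const shape = {
	type: 'LineString',
	coordinates: [[10.044385, 53.5872], [10.074888, 53.592473]],
}
console.log(Math.round(lineStringLength(shape)), 'm')
```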

## How it works

`gtfs-via-postgres` adds a [view](https://www.postgresql.org/docs/12/sql-createview.html) `arrivals_departures`, which contains every arrival/departure of every trip in the GTFS static dataset. This repo adds another view `arrivals_departures_with_stable_ids`, which combines the data with the "stable IDs" stored in separate tables. It is then used for the matching process, which works like this conceptually:

```sql
WITH
	query_stop_or_station_stable_ids AS (
		SELECT *
		FROM unnest(
			ARRAY['stop_stable_id', 'stop_stable_id'],
			ARRAY['stop-id1', 'stop-id2'],
			ARRAY[20, 30]
		)
		AS t(kind, stable_id, specificity)
	),
	query_route_stable_ids AS (
		SELECT *
		FROM unnest(
			ARRAY['route-id1', 'route-id2'],
			ARRAY[21, 33]
		)
		AS t(stable_id, specificity)
	)
SELECT *
FROM arrivals_departures_with_stable_ids ad
JOIN query_route_stable_ids route_stable ON (
	ad.route_stable_id = route_stable.stable_id
	AND ad.route_stable_id_specificity = route_stable.specificity
)
JOIN query_stop_or_station_stable_ids stop_stable ON (
	(
		stop_stable.kind = 'stop_stable_id'
		AND ad.stop_stable_id = stop_stable.stable_id
		AND ad.stop_stable_id_specificity = stop_stable.specificity
	) OR (
		stop_stable.kind = 'station_stable_id'
		AND ad.station_stable_id = stop_stable.stable_id
		AND ad.station_stable_id_specificity = stop_stable.specificity
	)
)
WHERE ad.t_departure > '2020-10-16T22:20:48+02:00'
AND ad.t_departure < '2020-10-16T22:22:48+02:00'
```

Because PostgreSQL executes this query quite efficiently, we don't need to store a pre-computed index of *all* arrivals/departures, only an index of their stable stop/station/route IDs.
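In practice, the literal arrays and timestamps in the conceptual query above would be derived from the realtime departure. The hypothetical helper below (not the library's code) turns lists of stable IDs into the column arrays fed to `unnest()`, and computes a time window of ±1 minute around the planned departure; the window size is an assumption read off the two `t_departure` bounds in the query.

```js
// Hypothetical helper: derives the parameters of the conceptual SQL query
// from a departure's stable IDs and its planned departure time.
const buildMatchParams = (stopStableIds, routeStableIds, plannedWhen, windowMs = 60 * 1000) => {
	const t = Date.parse(plannedWhen)
	if (Number.isNaN(t)) throw new Error('invalid plannedWhen')
	return {
		// columns of query_stop_or_station_stable_ids: (kind, stable_id, specificity)
		stopKinds: stopStableIds.map(s => s.kind),
		stopIds: stopStableIds.map(s => s.stableId),
		stopSpecificities: stopStableIds.map(s => s.specificity),
		// columns of query_route_stable_ids: (stable_id, specificity)
		routeIds: routeStableIds.map(r => r.stableId),
		routeSpecificities: routeStableIds.map(r => r.specificity),
		// exclusive t_departure bounds, as in the WHERE clause
		departureAfter: new Date(t - windowMs).toISOString(),
		departureBefore: new Date(t + windowMs).toISOString(),
	}
}

const params = buildMatchParams(
	[
		{kind: 'stop_stable_id', stableId: 'stop-id1', specificity: 20},
		{kind: 'stop_stable_id', stableId: 'stop-id2', specificity: 30},
	],
	[
		{stableId: 'route-id1', specificity: 21},
		{stableId: 'route-id2', specificity: 33},
	],
	'2020-10-16T22:21:48+02:00',
)
console.log(params.departureAfter, params.departureBefore)
// 2020-10-16T20:20:48.000Z 2020-10-16T20:22:48.000Z
```

With node-postgres, JavaScript arrays passed as query values are sent as PostgreSQL arrays, so these could be bound to placeholders such as `unnest($1::text[], $2::text[], $3::int[])` instead of being interpolated into the SQL text.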

The size of this additional index depends on how many stable IDs your logic generates for each stop/station/route. Consider the [2020-09-25 VBB GTFS Static feed](https://vbb-gtfs.jannisr.de/2020-09-25) as an example: Without [`shapes.txt`](https://gtfs.org/reference/static/#shapestxt), it is 356MB as CSV files and ~2GB once imported & indexed into the DB by `gtfs-via-postgres`; `match-gtfs-rt-to-gtfs`'s stable ID indices add another

- 300MB with few stable IDs per stop/station/route, and

- 3GB with 10-30 stable IDs each.

## API

### `gtfsInfo`/`gtfsRtInfo`

```ts
{
	idNamespace: string,
	endpointName: string,
	normalizeStopName: (name: string, stop: FptfStop) => string,
	normalizeLineName: (name: string, line: FptfLine) => string,
	normalizeTripHeadsign: (headsign: string) => string,
}
```
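To catch a malformed info file early, a check along these lines could be run before calling `createMatch`. `validateInfo` is a hypothetical helper, not part of the package; it only checks the property types listed above, not what the normalization functions actually return.

```js
// Hypothetical runtime check that an object has the gtfsInfo/gtfsRtInfo shape.
const validateInfo = (info, label = 'info') => {
	for (const key of ['idNamespace', 'endpointName']) {
		if (typeof info[key] !== 'string') {
			throw new TypeError(`${label}.${key} must be a string`)
		}
	}
	for (const key of ['normalizeStopName', 'normalizeLineName', 'normalizeTripHeadsign']) {
		if (typeof info[key] !== 'function') {
			throw new TypeError(`${label}.${key} must be a function`)
		}
	}
	return info
}

// Example: an object shaped like gtfs-rt-info.js from the usage section.
const identity = (s) => s
validateInfo({
	idNamespace: 'vbb',
	endpointName: 'vbb-hafas',
	normalizeStopName: identity,
	normalizeLineName: identity,
	normalizeTripHeadsign: identity,
}, 'gtfsRtInfo')
```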

## Contributing

*Note:* This repo blends two families of technical terms (GTFS-related ones and [FPTF](https://public-transport.github.io/friendly-public-transport-format/)-/[`hafas-client`](https://github.com/public-transport/hafas-client)-related ones), which makes the code somewhat confusing.

If you have a question or need support using `match-gtfs-rt-to-gtfs`, please double-check your code and setup first. If you think you have found a bug or want to propose a feature, use [the issues page](https://github.com/derhuerst/match-gtfs-rt-to-gtfs/issues).