{"id":28545598,"url":"https://github.com/crate/devrel-gtfs-transit","last_synced_at":"2026-03-03T12:38:06.542Z","repository":{"id":277536446,"uuid":"932698383","full_name":"crate/devrel-gtfs-transit","owner":"crate","description":"Capture GTFS and GTFS-RT data for storage and analysis with CrateDB.","archived":false,"fork":false,"pushed_at":"2025-03-17T23:13:40.000Z","size":4950,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-17T23:31:04.678Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://cratedb.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/crate.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-14T11:00:45.000Z","updated_at":"2025-03-17T23:13:43.000Z","dependencies_parsed_at":"2025-02-14T13:32:32.640Z","dependency_job_id":"66f91782-9781-4a86-bce5-302566e44f99","html_url":"https://github.com/crate/devrel-gtfs-transit","commit_stats":null,"previous_names":["crate/devrel-gtfs-transit"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crate%2Fdevrel-gtfs-transit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crate%2Fdevrel-gtfs-transit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crate%2Fdevrel-gtfs-transit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crate%2Fdevrel-gtfs-transit/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/crate","download_url":"https://codeload.github.com/crate/devrel-gtfs-transit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crate%2Fdevrel-gtfs-transit/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":258971327,"owners_count":22786066,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-09T23:08:11.858Z","updated_at":"2026-03-03T12:38:06.480Z","avatar_url":"https://github.com/crate.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CrateDB GTFS / GTFS-RT Transit Data Demo\n\n## Introduction\n\nThis is a demo application that has a Python back end and a JavaScript / Leaflet maps front end.  It uses GTFS ([General Transit Feed Specification](https://gtfs.org/)) and GTFS-RT (the extra [realtime feeds for GTFS](https://gtfs.org/documentation/realtime/reference/)) to store and analyze transit system route, trip, stop and vehicle movement data in [CrateDB](https://cratedb.com).\n\nGTFS and GTFS-RT are standard ways of representing this type of data.  This means that, in theory, this project could be applicable to any transit system that adopts this approach.  However, there can be differences between transit agencies, so some aspects of the project may need adapting to a particular agency's data.  \n\nWe have developed this demo using GTFS and GTFS-RT data from the [Washington Metropolitan Area Transit Authority](https://www.wmata.com/about/developers/) (WMATA), specifically for the DC Metro train system.  
The design of the database schema allows for data from multiple agencies / transit systems to be stored as long as each agency has a unique agency ID.\n\nHere's a sped up demo of the front end running, showing train movements on the DC Metro system:\n\n![Demo showing front end running](gtfs_demo_front_end_sped_up.gif)\n\nIndividual trains can be tracked by clicking on them, which displays information about the train's current trip in a popup:\n\n![Demo showing details of a single train trip](gtfs_demo_front_end.png)\n\n## Prerequisites\n\nTo run this project you'll need to install the following software:\n\n* Python 3 ([download](https://www.python.org/downloads/)) - we've tested this project with Python 3.12.2 on macOS Sequoia.\n* Git command line tools ([download](https://git-scm.com/downloads)).\n* Your favorite code editor, to edit configuration files and browse/edit the code if you wish.  Visual Studio Code is great for this.\n* Access to a cloud or local CrateDB cluster (see below for details).\n* A WMATA API key.  These are free, and you can register for API access and get your key at the [WMATA developer portal](https://developer.wmata.com/).\n\n## Getting the Code\n\nNext you'll need to get a copy of the code from GitHub by cloning the repository. Open up your terminal and change directory to wherever you store coding projects, then enter the following commands:\n\n```bash\ngit clone https://github.com/crate/devrel-gtfs-transit.git\ncd devrel-gtfs-transit\n```\n\n## Getting a CrateDB Database\n\nYou'll need a CrateDB database to store the project's data in.  Choose between a free hosted instance in the cloud, or run the database locally.  
Either option is fine.\n\n### Cloud Option\n\nCreate a database in the cloud by first pointing your browser at [`console.cratedb.cloud`](https://console.cratedb.cloud/).\n\nLog in or create an account, then follow the prompts to create a \"CRFREE\" database on shared infrastructure in the cloud of your choice (choose from Amazon AWS, Microsoft Azure and Google Cloud).  Pick a region close to where you live to minimize latency between your machine running the code and the database that stores the data. \n\nOnce you've created your cluster, you'll see a \"Download\" button.  This downloads a text file containing a copy of your database hostname, port, username and password.  Make sure to download these as you'll need them later and won't see them again.  Your credentials will look something like this example (exact values will vary based on your choice of AWS/Google Cloud/Azure etc):\n\n```\nHost:              some-host-name.gke1.us-central1.gcp.cratedb.net\nPort (PostgreSQL): 5432\nPort (HTTPS):      4200\nDatabase:          crate\nUsername:          admin\nPassword:          the-password-will-be-here\n```\n\nWait until the cluster shows a green status icon and \"Healthy\" status before continuing.  Note that it may take a few moments to provision your database.\n\n### Local Option\n\nThe best way to run CrateDB locally is by using Docker.  We've provided a Docker Compose file for you.  
Once you've installed [Docker Desktop](https://www.docker.com/products/docker-desktop/), you can start the database like this:\n\n```bash\ndocker compose up\n```\n\nOnce the database is up and running, you can access the console by pointing your browser at:\n\n```\nhttp://localhost:4200\n```\n\nNote that if you have something else running on port 4200 (CrateDB admin UI) or port 5432 (Postgres protocol port) you'll need to stop those other services first, or edit the Docker compose file to expose these ports at different numbers on your local machine.\n\n## Creating the Database Tables\n\nWe've provided a Python data loader script that will create the database tables in CrateDB for you.\n\nYou'll first need to create a virtual environment for the data loader and configure it:\n\n```bash\ncd gtfs-static\npython -m venv venv\n. ./venv/bin/activate\npip install -r requirements.txt\n```\n\nNow make a copy of the example environment file provided:\n\n```bash\ncp env.example .env\n```\n\nEdit the `.env` file, changing the value of `CRATEDB_URL` to be the connection URL for your CrateDB database.\n\nIf you're running CrateDB locally (for example with the provided Docker Compose file) there's nothing to change here.\n\nIf you're running CrateDB in the cloud, change the connection URL as follows, using the values for your cloud cluster instance:\n\n```\nhttps://admin:\u003cpassword\u003e@\u003chostname\u003e:4200\n```\n\nSave your changes.\n\nNext, run the data loader to create the tables used by this project:\n\n```bash\npython dataloader.py createtables\n```\n\nYou should see output similar to this:\n\n```\nCreated agencies table if needed.\nCreated networks table if needed.\nCreated routes table if needed.\nCreated vehicle positions table if needed.\nCreated trip updates table if needed.\nCreated trips table if needed.\nCreated stops table if needed.\nCreated stop_times table if needed.\nCreated config table if needed.\nFinished creating any necessary tables.\n```\n\nUse 
the CrateDB console to verify that the above-named tables were all created in the `doc` schema.\n\n## Load the Static Data\n\nThe next step is to load static data about the transport network into the database.  We'll use Washington DC (WMATA) as an example. \n\nFirst, load the configuration data for the agency:\n\n```bash\npython dataloader.py config-files/wmata.json\n```\n\nNow, load data into the `agencies` table:\n\n```bash\npython dataloader.py data-files/wmata/agency.txt\n```\n\nNext, populate the `routes` table:\n\n```bash\npython dataloader.py data-files/wmata/routes.txt\n```\n\nThen populate the `stops` table.  Here, `1` is the agency ID, and must match the spelling and capitalization of the agency ID in `agency.txt`:\n\n```bash\npython dataloader.py data-files/wmata/stops.txt 1\n```\n\nFinally, insert data into the `networks` table.  Here `WMATA` is the agency name, and must match the spelling and capitalization of the agency name in `agency.txt`:\n\n```bash\npython dataloader.py geojson/wmata/wmata.geojson WMATA\n```\n\n## Start the Front End Flask Application\n\nThis project has a web front end and a [Flask](https://flask.palletsprojects.com/) application server.  The front end is written in vanilla JavaScript and uses the [Bulma](https://bulma.io/) framework for the majority of the styling. [Leaflet](https://leafletjs.com/) is used to render maps and handle map events.  The Flask application uses the [CrateDB Python driver](https://cratedb.com/docs/python/en/latest/index.html) to talk to the database.\n\nBefore starting the front end Flask application, you'll need to create a virtual environment and configure it:\n\n```bash\ncd front-end\npython -m venv venv\n. 
./venv/bin/activate\npip install -r requirements.txt\n```\n\nNow make a copy of the example environment file provided:\n\n```bash\ncp env.example .env\n```\n\nEdit the `.env` file, changing the value of `CRATEDB_URL` to be the connection URL for your CrateDB database.\n\nIf you're running CrateDB locally (for example with the provided Docker Compose file) there's nothing to change here.\n\nIf you're running CrateDB in the cloud, change the connection URL as follows, using the values for your cloud cluster instance:\n\n```\nhttps://admin:\u003cpassword\u003e@\u003chostname\u003e:4200\n```\n\nNow, edit the values of `GTFS_AGENCY_NAME` and `GTFS_AGENCY_ID` to contain the agency name and ID for the agency you're using.  These should match the values returned by this query:\n\n```sql\nSELECT agency_name, agency_id FROM agencies\n```\n\nFor example, for Washington DC / WMATA, the correct settings are:\n\n```\nGTFS_AGENCY_NAME=WMATA\nGTFS_AGENCY_ID=1\n```\n\nDon't forget that if either value contains a space, you'll need to surround the entire value with quotation marks.\n\nSave your changes.\n\nNow, start the front end application:\n\n```bash\npython app.py\n```\n\nUsing your browser, visit `http://localhost:8000` to view the map front end interface.  \n\nAt this point you should see the route map for the agency that you're working with, along with the stations / stops on the routes.  Clicking a station or stop should show information about it.\n\nNo vehicles will be visible on the map yet.  To see these, you'll need to run the real time data receiver components (see below).  
\n\nWhen you're finished with the front end application, stop it with `Ctrl-C` (but keep it running for now, so you'll be able to see the real time data soon...)\n\n## Start the Real Time Data Receiver Components\n\nThe real time data receivers are responsible for reading real time vehicle location and other data from the transit agencies and saving it in the database.\n\nFirst, create a virtual environment and install the dependencies:\n\n```bash\ncd front-end\npython -m venv venv\n. ./venv/bin/activate\npip install -r requirements.txt\n```\n\nNow make a copy of the example environment file provided:\n\n```bash\ncp env.example .env\n```\n\nEdit the `.env` file, changing the value of `CRATEDB_URL` to be the connection URL for your CrateDB database.\n\nIf you're running CrateDB locally (for example with the provided Docker Compose file) there's nothing to change here.\n\nIf you're running CrateDB in the cloud, change the connection URL as follows, using the values for your cloud cluster instance:\n\n```\nhttps://admin:\u003cpassword\u003e@\u003chostname\u003e:4200\n```\n\nNow, edit the value of `GTFS_AGENCY_ID` to contain the ID for the agency you're using.  It should match the value returned by this query:\n\n```sql\nSELECT agency_id FROM agencies\n```\n\nFor example, for Washington DC / WMATA, the correct setting is:\n\n```\nGTFS_AGENCY_ID=1\n```\n\nSet the value of `SLEEP_INTERVAL` to be the number of seconds that the component sleeps between checks of the transit agency's feed for updates.  This defaults to `1`, but you may need to set a longer interval if the agency you're using implements rate limiting on its API endpoints.\n\nNext, set the value of `GTFS_POSITIONS_FEED_URL` to the realtime vehicle movements endpoint URL for your agency.  For example, for Washington DC / WMATA this is `https://api.wmata.com/gtfs/rail-gtfsrt-vehiclepositions.pb`.\n\nSet the value of `GTFS_TRIPS_FEED_URL` to the realtime trip updates endpoint URL for your agency. 
For example for Washington DC / WMATA this is `https://api.wmata.com/gtfs/rail-gtfsrt-tripupdates.pb`.\n\nSet the value of `GTFS_TRIPS_SCHEDULE_URL` to the static GTFS URL for your agency.  This will be a URL that serves a zip file.  For example for Washington DC / WMATA this is `https://api.wmata.com/gtfs/rail-gtfs-static.zip`.\n\nFinally, if your agency requires an API key to access realtime data, set the values of `GTFS_POSITIONS_FEED_KEY`, `GTFS_TRIPS_FEED_KEY` and `GTFS_TRIPS_SCHEDULE_KEY` appropriately.  You'll most likely use the same API key for each.\n\nSave your changes.\n\nThe schedule of trips is stored in two tables in CrateDB: `trips` and `stop_times`.  You need to update this **once daily** by running:\n\n```bash\npython trip_schedule.py 1\n```\n\nStart gathering real time vehicle position data continuously by running this command:\n\n```bash\npython vehicle_positions.py\n```\n\nYou should also start continuous gathering of real time trip update data by running:\n\n```bash\npython trip_updates.py\n```\n\nWhen you're finished with the real time data receivers, stop them with `Ctrl-C`.\n\nAssuming that the Flask front end web application is running, you should now see vehicle movement details at `http://localhost:8000`.  Clicking a vehicle should display a pop up with information about the trip that the vehicle is currently on: trip ID, next stops, time estimates etc.\n\n## Analyzing the Data\n\nOnce the system's been running for a while, you might want to run some queries that analyze and aggregate data.  
We've provided some examples in the [`example_queries.md`](example_queries.md) file.\n\n## Work in Progress Notes Below\n\nGetting GeoJSON from GTFS:\n\nhttps://github.com/BlinkTagInc/gtfs-to-geojson\n\n```bash\ncd gtfs-static\ngtfs-to-geojson --configPath ./config_wmata.json\n```\n\nGetting GTFS static data for WMATA rail:\n\n```bash\nwget --header=\"api_key: \u003cREDACTED\u003e\" https://api.wmata.com/gtfs/rail-gtfs-static.zip\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrate%2Fdevrel-gtfs-transit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrate%2Fdevrel-gtfs-transit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrate%2Fdevrel-gtfs-transit/lists"}