{"id":33044374,"url":"https://github.com/4rterius/cgtfs","last_synced_at":"2025-11-16T20:01:46.715Z","repository":{"id":216667610,"uuid":"157017267","full_name":"4rterius/cgtfs","owner":"4rterius","description":"C library to read GTFS feeds","archived":false,"fork":false,"pushed_at":"2019-08-22T21:29:43.000Z","size":3006,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-06-25T05:05:28.554Z","etag":null,"topics":["c","csv","gtfs","public-transport","transit"],"latest_commit_sha":null,"homepage":"https://rakhack.github.io/cgtfs/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/4rterius.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-10T20:08:36.000Z","updated_at":"2024-06-25T05:05:30.457Z","dependencies_parsed_at":null,"dependency_job_id":"7d37a3ed-8372-44f1-9ec2-3ce5c5369c9e","html_url":"https://github.com/4rterius/cgtfs","commit_stats":null,"previous_names":["4rterius/cgtfs","rakhack/cgtfs"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/4rterius/cgtfs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rterius%2Fcgtfs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rterius%2Fcgtfs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rterius%2Fcgtfs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rterius%2Fcgtfs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/4rterius","download_url":"https://codeload.github.com/4rterius/cgtfs/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4rterius%2Fcgtfs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284767947,"owners_count":27060132,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-16T02:00:05.974Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","csv","gtfs","public-transport","transit"],"created_at":"2025-11-14T00:00:28.413Z","updated_at":"2025-11-16T20:01:46.707Z","avatar_url":"https://github.com/4rterius.png","language":"C","funding_links":[],"categories":["Producing Data","Uncategorized"],"sub_categories":["GTFS","Uncategorized"],"readme":"# CGTFS - a C library to read static GTFS feeds\n\n![Release version plate](https://img.shields.io/github/release/rakhack/cgtfs.svg)\n[![Build Status](https://travis-ci.com/rakhack/cgtfs.svg?branch=master)](https://travis-ci.com/rakhack/cgtfs)\n[![Build status](https://ci.appveyor.com/api/projects/status/etnrf1pg2g0a20jn/branch/master?svg=true)](https://ci.appveyor.com/project/rakhack/cgtfs/branch/master)\n![License: MIT plate](https://img.shields.io/github/license/rakhack/cgtfs.svg)\n\nA thin and fast low-level library which reads GTFS static feeds. 
This library provides a readable and intuitive C interface for parsing data provided in Google's [General Transit Feed Specification](https://developers.google.com/transit/gtfs/) format.\n\n**Docs: [available online](https://rakhack.github.io/cgtfs/doxygen/html/index.html).**\n\nThe scope of this library's functionality is illustrated by the following figure.\n\n\u003cdiv\u003e\n  \u003cimg width=\"500px\" alt=\"Ways to parse/transport data with CGTFS (scheme)\" src=\"/docs/doxygen/static/df_straight.svg?raw=true\u0026sanitize=true\" /\u003e\n\u003c/div\u003e\n\n## Table of contents\n\n- [CGTFS - a C library to read static GTFS feeds](#cgtfs---a-c-library-to-read-static-gtfs-feeds)\n  - [Table of contents](#table-of-contents)\n  - [Examples](#examples)\n  - [Build process and dependencies](#build-process-and-dependencies)\n    - [Dependencies](#dependencies)\n    - [Build process](#build-process)\n      - [Linux](#linux)\n      - [Windows](#windows)\n  - [Documentation](#documentation)\n    - [Terms](#terms)\n    - [API overview](#api-overview)\n      - [Principles](#principles)\n      - [String storage](#string-storage)\n      - [Structure](#structure)\n  - [A small performance showcase](#a-small-performance-showcase)\n  - [Useful links](#useful-links)\n  - [License and attribution](#license-and-attribution)\n\n## Examples\n\nSome example code is located in the `examples/` folder of the library's source code. 
Digging into the `tests/` folder may also be useful, but keep in mind that the testing code does not properly handle memory deallocation and error recovery.\n\nThe most basic example:\n\n```c\n#include \u003cstdio.h\u003e\n#include \"feed.h\"\n\nvoid some_func(void) {\n    feed_t my_feed;\n    init_feed(\u0026my_feed);  // must be called before the first use\n\n    read_feed(\u0026my_feed, \"/path/to/unpacked/gtfs/feed\");\n\n    // A count above -1 means stops.txt was found in the feed.\n    if (my_feed.stops_count \u003e -1)\n        printf(\"There are %i stop(s) in the feed!\\n\", my_feed.stops_count);\n    else\n        printf(\"No stops.txt file found in the feed...\\n\");\n\n    free_feed(\u0026my_feed);\n}\n```\n\nDatabase import example:\n\n```c\n#include \u003cstdio.h\u003e\n#include \"feed.h\"\n#include \"database/database.h\"\n\n#define IS_WRITABLE 1\n\n// Returns 1 on success and 0 on failure.\nint some_db_func(void) {\n    feed_db_t my_db;\n    feed_db_status_t res;\n\n    res = init_feed_db(\u0026my_db, \"/path/to/database/file\", IS_WRITABLE);\n\n    if (res \u003c FEED_DB_SUCCESS) {\n        puts(my_db.error_msg);\n        free_feed_db(\u0026my_db);\n        return 0;\n    }\n\n    res = import_feed_db(\"/path/to/unpacked/gtfs/feed\", \u0026my_db);\n\n    if (res \u003c FEED_DB_SUCCESS) {\n        puts(my_db.error_msg);\n        free_feed_db(\u0026my_db);\n        return 0;\n    }\n\n    // Something done with the data in the database pointed to by my_db connection.\n\n    free_feed_db(\u0026my_db);\n    return 1;\n}\n```\n\n## Build process and dependencies\n\n### Dependencies\n\nOne of the development goals was to keep the dependencies as minimal as possible. 
So far, CGTFS has the following dependencies:\n\n  - [greatest](https://github.com/silentbicycle/greatest) for tests\n  - [sqlite3](https://www.sqlite.org/index.html) for fast \u0026 efficient querying *built through [a special cmake-friendly repo](https://github.com/rakhack/sqlite3-cmake)*\n\n### Build process\n\nThe library should compile on `gcc \u003e= 4.8.4`, `clang \u003e= 5.0.0` and the latest Microsoft C/C++ compiler.\n\nIn the commands below, `%Configuration%` stands for either `Release` or `Debug`.\n\n#### Linux\n\n```\n$ cd /path/to/cgtfs/\n$ git submodule update --init --recursive\n$ mkdir build \u0026\u0026 cd build/\n$ cmake -DCMAKE_BUILD_TYPE=%Configuration% ..\n$ cmake --build .\n\n$ ./tests\n```\n\n#### Windows\n\n```\ncd \\path\\to\\cgtfs\\\ngit submodule update --init --recursive\nmkdir build \u0026\u0026 cd build\ncmake ..\ncmake --build . --config %Configuration%\n\n%Configuration%\\tests.exe\n```\n\n\n## Documentation\n\nThe library is heavily documented via code comments. At releases and important waypoints, the documentation is compiled and committed to the repo. It is [available online](https://rakhack.github.io/cgtfs/doxygen/html/index.html). However, to get the most up-to-date documentation, you are encouraged to compile it yourself.\n\n```\n$ doxygen\n```\n\n*Please note: this file is not guaranteed to contain up-to-date information. It is advised that you download the latest release and compile the doxygen documentation from its source.*\n\n### Terms\n\nThe terms used throughout the library code and documentation differ from those defined by the [GTFS reference](https://developers.google.com/transit/gtfs/reference/#term-definitions). 
The following table maps the differing CGTFS terms to the corresponding reference terms and gives their definitions; it also lists terms that are used in the library but absent from the reference.\n\n| CGTFS term | Reference term | Meaning and notes |\n| ---------- | -------------- | ----------------- |\n| Feed (entity/instance/object) | *not defined* | A data structure holding the entirety of a feed's data. *In the code and documentation, may be referred to as a feed entity, feed instance or feed object.* |\n| Directory | Dataset | A set of files which constitutes a GTFS feed. In some contexts in the code documentation, the terms *feed* and *directory* are interchangeable. *Note: GTFS datasets are distributed in the form of `*.zip` feed archives. This library, however, only works with unpacked feeds.* |\n| Entity (instance) | Record | A complete data structure containing information about a concrete GTFS entity (e.g. information about one route). The library uses the term *entity* to avoid ambiguity with database operations. _Note: *entity* is, however, a more abstract term, so a struct holding one entity's data is, in essence, an entity instance. This documentation may refer to structs simply as **entities** for brevity_. |\n| File | *not defined* | A `*.txt` file, a part of the feed, holding information about all the feed's *entities* of a single type. |\n| Database | n/a | A single *SQLite* database file, created using the supplied SQL schema (preferably, the creation of the database is left to the library, see the database section). |\n\n### API overview\n\nThis library tries to provide a semantic and readable interface. Until release 1.0.0, the library's API is subject to change without regard for backwards compatibility.\n\n#### Principles\n\nThere are several core principles which could help in understanding the vast interface of the library. 
Some of them have been enforced from the beginning, others may be gradually integrated into the API.\n\n  - All structs and enumerations have names ending with `_t`.\n  - All enumerations have members with names reflecting the enumeration's name.\n    - Member names of entity field enumerations start with the first letters of the enumeration's name, e.g. `payment_method_t` has elements `PM_ON_BOARD`, `PM_BEFOREHAND` and `PM_NOT_SET`.\n      - Exceptions are made where names would conflict, e.g. `pathway_mode_t`, which would have to start its members' names with `PM` but uses `PTMD` instead.\n      - Additionally, all entity field enumerations have `..._NOT_SET` members.\n  - All structs have `init_...()` functions for initializing them. These functions MUST be called before the first use of the structure.\n    - Feed entity and record entity structs also have `read_...()` and `equal_...()` functions.\n    - Structs which need deallocation after use have `free_...()` functions.\n  - All functions that deal with database operations have the `_db` suffix.\n\n#### String storage\n\nString values in CGTFS are stored in memory using statically allocated C strings. Hence, parsing an unusually long string value may lead to a fatal crash. The default string field lengths are rather sensible but might be too big (significantly bloating RAM usage) or too small (causing a crash).\n\nTo mitigate that, all string field length definitions are located in the `xstrlengths.h` header. Actual field length definitions are heavily commented in the lower part of the file. By default, they use the `CGTFS_SL_BASE_` definitions found in the upper part of the file. There are three possible use cases:\n\n  1. Everything is left as is, in the hope that no value exceeds the default lengths.\n  2. The `CGTFS_SL_MODE_PREPARATION` definition is uncommented, reserving an excessive amount of memory for all fields.\n  3. 
The maximum length of each field type is derived from the expected data sources (useful if you're working with the data from a specific agency). This is left to the developer.\n\n#### Structure\n\nThe library's API is divided into two so-called layers, plus auxiliary utilities and loosely related helpers:\n\n  - **Core layer** provides basic definitions and functions for handling GTFS feeds and entities, and includes:\n    - **feed object definition** to store data of an entire feed and functions for working with it:\n      - a function to initialize a feed object;\n      - a function to parse a feed object from a given directory path;\n      - a function to determine whether two feed objects are equal;\n    - field enumerations to represent types and values of the fields which can only take values from a limited set defined by the specification, e.g. [`routes.txt/route type`](https://developers.google.com/transit/gtfs/reference/#routestxt);\n      - functions to parse field enumeration values from a char array;\n    - **entity definitions** to represent e.g. an agency, a stop, a shape, etc. 
and **functions** for handling them;\n      - functions to initialize entity instances;\n      - functions to parse entity instances from a char array of field names and a char array of field values;\n      - functions to determine whether two entity instances are equal;\n    - **batch entity parsing functions** which parse an array of entities from a given `*.txt` file path;\n  - **Database layer** provides definitions and functions for working with entities defined in the *core layer* with/through/in a SQLite database instance, and includes:\n    - **definition of a connection to a SQLite database** and **functions** for working with it:\n      - a function to initialize a database connection;\n      - a function to free/close a database connection;\n      - a function to set up a database on an open connection for a GTFS feed;\n    - storage transition functions:\n      - a **function** to store the contents of a feed from a specified directory into a specified database connection (non-semantically, storing all values as TEXT) *(see note below)*;\n      - a **function** to store the contents of a feed from a specified directory into a specified database connection (semantically, parsing every record) *(see note below)*;\n      - a **function** to fetch the contents of a feed from a specified database connection into a specified feed object;\n    - an **enumeration** of general database operation results (success / failure / so-so);\n    - **functions** to store entities using a specified database connection;\n    - the so-called table operations:\n      - **batch entity storing functions** which parse an array of entities from a given `*.txt` file path into a database table (doing so directly, without keeping an intermediate array in memory);\n      - **batch entity fetching functions** which retrieve an array of entities of a single type from a specified database connection;\n  - **Utilities** include:\n    - **functions** for reading CSV files;\n    - an 
assisting **function** for clearing a C string array;\n    - utility **functions** for working with a SQLite database;\n  - **Helpers** include:\n    - several preprocessor definitions used across the library;\n    - a **function** for extracting a filename without its extension from a given path;\n    - a **function** for making a filepath from a directory and a file in it;\n    - a **function** for converting degrees into radians;\n    - a **geolocation definition** which holds a latitude value and a longitude value;\n      - a **function** for calculating the distance (in meters) between two geolocation points.\n\n_Note: CGTFS provides two ways of parsing a directory into a database, semantic and non-semantic. Semantic stores all values according to the specification, creating a reference-defined database layout and filling it with data of the corresponding types. Non-semantic directly translates GTFS `*.txt` files as CSVs into the database, creating a layout based on file headers and storing all data as text. See more in the related documentation page._\n\nMore detailed documentation for each layer, definition and function can be found in the module documentation.\n\n## A small performance showcase\n\nThe following listing shows the output of the CGTFS `./bench` executable, run on several GTFS feeds on a Win10 i7-8550U NVMe machine through WSL 1.\n\n```\n# Pocono Pony (1.65 mb)\n\nBenchmark results for / feed dir -\u003e memory parsing:\n -\u003e 1        iteration:    235690500 ns. / 1 iter. =\u003e 235.690500 ms.\n -\u003e 10       iterations:   1522158000 ns. / 10 iter. =\u003e 152.215800 ms.\n---------\n\nBenchmark results for / feed dir -\u003e db parsing (semantic):\n -\u003e 1        iteration:    3559970000 ns. / 1 iter. =\u003e 3559.970000 ms.\n -\u003e 10       iterations:   33531279900 ns. / 10 iter. =\u003e 3353.127990 ms.\n---------\n\nBenchmark results for / feed dir -\u003e db parsing (non-semantic):\n -\u003e 1        iteration:    468014300 ns. / 1 iter. 
=\u003e 468.014300 ms.\n -\u003e 10       iterations:   5315382000 ns. / 10 iter. =\u003e 531.538200 ms.\n---------\n\n\n# LSL (20.6 mb)\n\nBenchmark results for / feed dir -\u003e memory parsing:\n -\u003e 1        iteration:    2053141000 ns. / 1 iter. =\u003e 2053.141000 ms.\n -\u003e 10       iterations:   19250407600 ns. / 10 iter. =\u003e 1925.040760 ms.\n---------\n\nBenchmark results for / feed dir -\u003e db parsing (semantic):\n -\u003e 1        iteration:    14864547700 ns. / 1 iter. =\u003e 14864.547700 ms.\n -\u003e 10       iterations:   151779149800 ns. / 10 iter. =\u003e 15177.914980 ms.\n---------\n\nBenchmark results for / feed dir -\u003e db parsing (non-semantic):\n -\u003e 1        iteration:    3478527500 ns. / 1 iter. =\u003e 3478.527500 ms.\n -\u003e 10       iterations:   34999966400 ns. / 10 iter. =\u003e 3499.996640 ms.\n---------\n\n\n# MTA NYC Transit Subway (63.4 mb)\n\nBenchmark results for / feed dir -\u003e memory parsing:\n -\u003e 1        iteration:    5560275600 ns. / 1 iter. =\u003e 5560.275600 ms.\n -\u003e 10       iterations:   53092411900 ns. / 10 iter. =\u003e 5309.241190 ms.\n---------\n\nBenchmark results for / feed dir -\u003e db parsing (semantic):\n -\u003e 1        iteration:    40642734000 ns. / 1 iter. =\u003e 40642.734000 ms.\n -\u003e 10       iterations:   376395287900 ns. / 10 iter. =\u003e 37639.528790 ms.\n---------\n\nBenchmark results for / feed dir -\u003e db parsing (non-semantic):\n -\u003e 1        iteration:    9276035200 ns. / 1 iter. =\u003e 9276.035200 ms.\n -\u003e 10       iterations:   91401069000 ns. / 10 iter. 
=\u003e 9140.106900 ms.\n---------\n\n```\n\n## Useful links\n\n  - [Official GTFS static reference](https://developers.google.com/transit/gtfs/reference/)\n  - [Other GTFS handling libraries](https://github.com/CUTR-at-USF/awesome-transit#gtfs-libraries)\n\n## License and attribution\n\nThe library is developed and distributed under the [MIT License](https://choosealicense.com/licenses/mit/), a copy of which can be found in the root of the project. Documentation and materials other than the library's source code are distributed under the [Creative Commons Attribution 4.0 License](https://creativecommons.org/licenses/by/4.0/).\n\nFiles under the `tests/data/google_sample` directory constitute [an example Google Transit Feed](https://developers.google.com/transit/gtfs/examples/gtfs-feed) created and [shared by Google](https://developers.google.com/readme/policies/) and are used according to terms described in the [Creative Commons 3.0 Attribution License](https://creativecommons.org/licenses/by/3.0/). 
*Additionally, parts of the in-code documentation may contain pieces of the [GTFS reference](https://developers.google.com/transit/gtfs/reference/).*\n\nFiles under the `tests/data/pocono_pony` directory contain the open data publicly provided by [Monroe County Transportation Authority](https://www.gomcta.com/index.php) under the [Monroe County Transportation Authority Transit Data Developer Terms of Use](https://www.gomcta.com/developerapi.php).\n\nOther files under the `tests/data` directory may contain elements of the [open data](http://www.lsl.fi/lisatietoa/avoin-data/) publicly provided by [Lahden Seudun Liikenne](http://www.lsl.fi/) under the [Creative Commons Attribution 4.0 License](https://creativecommons.org/licenses/by/4.0/deed.fi).\n\nUsage of the public data provided by Google, Monroe County Transportation Authority and Lahden Seudun Liikenne does not suggest endorsement by any of the aforementioned licensors.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4rterius%2Fcgtfs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F4rterius%2Fcgtfs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4rterius%2Fcgtfs/lists"}