{"id":15671758,"url":"https://github.com/defgsus/bahn-api-history","last_synced_at":"2025-03-30T05:24:45.893Z","repository":{"id":76001526,"uuid":"424030413","full_name":"defgsus/bahn-api-history","owner":"defgsus","description":"Historic changelog of Deutsche Bahn Open API data (stations, free parking lots and elevator status)","archived":false,"fork":false,"pushed_at":"2022-12-30T15:58:03.000Z","size":23139,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-05T07:31:58.070Z","etag":null,"topics":["archive","changelog","deutsche-bahn","elevators","escalators","open-data","parking-space","recordings","timeseries"],"latest_commit_sha":null,"homepage":"https://defgsus.github.io/bahn-api-history/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/defgsus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-02T23:31:49.000Z","updated_at":"2022-01-11T01:03:40.000Z","dependencies_parsed_at":"2023-07-03T21:26:43.028Z","dependency_job_id":null,"html_url":"https://github.com/defgsus/bahn-api-history","commit_stats":{"total_commits":334,"total_committers":1,"mean_commits":334.0,"dds":0.0,"last_synced_commit":"165b598f9325b86494b013c84b37891d02385dd2"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defgsus%2Fbahn-api-history","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defgsus%2Fbahn-api-history/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defgsus%2Fbahn-api-history/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defgsus%2Fbahn-api-history/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/defgsus","download_url":"https://codeload.github.com/defgsus/bahn-api-history/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246280251,"owners_count":20752098,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive","changelog","deutsche-bahn","elevators","escalators","open-data","parking-space","recordings","timeseries"],"created_at":"2024-10-03T15:04:52.306Z","updated_at":"2025-03-30T05:24:45.871Z","avatar_url":"https://github.com/defgsus.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deutsche Bahn API History \n\nThere was this [monumental talk](https://media.ccc.de/v/36c3-10652-bahnmining_-_punktlichkeit_ist_eine_zier)\nin late 2019 about the *correctness* of the punctuality statistics published by\nDeutsche Bahn, which got me interested in [api.deutschebahn.com](https://api.deutschebahn.com).\n\nThis repo contains non of the train schedule data. 
Instead it has change-logs of the\n[parking api](https://developer.deutschebahn.com/store/apis/info?name=BahnPark\u0026version=v1\u0026provider=DBOpenData),\n[station data api](https://developer.deutschebahn.com/store/apis/info?name=StaDa-Station_Data\u0026version=v2\u0026provider=DBOpenData)\nand the [station facilities status api](https://developer.deutschebahn.com/store/apis/info?name=FaSta-Station_Facilities_Status\u0026version=v2\u0026provider=DBOpenData)\n(status of elevators and escalators), **collected since late January 2020**.\n\nEverything is browsable in the [static data page](https://defgsus.github.io/bahn-api-history/).\n\n\n## Summary\n\nEach table shows the top-ten most-changed objects.\n\n### free parking lots\n \n**288** objects, **73,942** snapshots, **116,433** changes (2020-01-25 23:27:15 - 2022-09-01 09:00:01)\n\n|     id | name                                                     |   num changes |\n|-------:|:---------------------------------------------------------|--------------:|\n| 100054 | Düren P1 Parkplatz Ludwig-Erhardt-Platz                  |          7820 |\n| 100083 | Frankfurt (Main) Hbf P3 Vorfahrt II                      |          4621 |\n| 100201 | Mainz Hbf P3 Tiefgarage Bonifazius-Türme UG -1           |          4314 |\n| 100084 | Frankfurt (Main) Hbf Bustasche                           |          4311 |\n| 100280 | Bad Cannstatt P3 Parkhaus Wilhelmsplatz Ebenen -3 und -2 |          3399 |\n| 100279 | Bad Cannstatt P2 Parkhaus Wilhelmsplatz Ebenen -1 bis 6  |          2801 |\n| 100023 | Berlin Ostbahnhof P1 Parkplatz                           |          2366 |\n| 100291 | Ulm Hbf P2 Parkplatz                                     |          2131 |\n| 100090 | Freiburg (Breisgau) Hbf P1 Tiefgarage am Bahnhof         |          1776 |\n| 100066 | Duisburg Hbf P2 Parkhaus UCI                             |          1759 |\n\n### elevator status\n \n**3,894** objects, **17,278** snapshots, **500,332** changes (2020-01-25 23:16:01 
- 2022-09-01 09:01:01)\n\n|       id | name                                                 |   num changes |\n|---------:|:-----------------------------------------------------|--------------:|\n| 10556568 | Tuttlingen ELEVATOR zum Gleis 4/5                    |          1755 |\n| 10556567 | Tuttlingen ELEVATOR zum Gleis 2/3                    |          1727 |\n| 10556569 | Tuttlingen ELEVATOR zu Gleis 1                       |          1727 |\n| 10248843 | Regensburg Hbf ESCALATOR von Empfangshalle zu Brücke |          1492 |\n| 10248859 | Regensburg Hbf ESCALATOR von Empfangshalle zu Brücke |          1430 |\n| 10460422 | Diepholz ELEVATOR zu Gleis 2/3                       |          1419 |\n| 10354470 | Osnabrück Hbf ELEVATOR zu Gleis 1                    |          1414 |\n| 10417241 | Osnabrück Hbf ELEVATOR zu Gleis 4/5                  |          1408 |\n| 10417240 | Osnabrück Hbf ELEVATOR zu Gleis 2/3                  |          1401 |\n| 10466017 | Laupheim West ELEVATOR zu Gleis 2/3                  |          1401 |\n\n### stations\n \n**5,406** objects, **910** snapshots, **67,255** changes (2020-01-27 12:43:06 - 2022-09-01 06:05:01)\n\n|   id | name                         |   num changes |\n|-----:|:-----------------------------|--------------:|\n| 1947 | Friedrichshafen Stadt        |            24 |\n| 6714 | Westerland (Sylt)            |            23 |\n| 2514 | Hamburg Hbf                  |            22 |\n| 3631 | Leipzig Hbf                  |            22 |\n| 1821 | Berlin-Schönefeld Flughafen  |            21 |\n| 1859 | Frankfurt (Oder)             |            21 |\n| 1906 | Freilassing                  |            21 |\n| 4234 | München Hbf                  |            21 |\n| 6418 | Villingen (Schwarzw)         |            21 |\n| 8192 | Flughafen BER - Terminal 1-2 |            21 |\n\n\n## Data\n\nThe APIs are sampled with separate cronjobs running these shell commands:\n\n```shell script\n# parking each 15 minutes\ncurl -X 
GET --header \"Accept: application/json\" \\\n    --header \"Authorization: Bearer \u003cYOUR_API_TOKEN\u003e\" \\\n    \"https://api.deutschebahn.com/bahnpark/v1/spaces/occupancies\" \\\n    \u003e `date -Is -u`.json\n\n# stations once a day\ncurl -X GET --header \"Accept: application/json\" \\\n    --header \"Authorization: Bearer \u003cYOUR_API_TOKEN\u003e\" \\\n    \"https://api.deutschebahn.com/stada/v2/stations?searchstring=*\" \\\n    \u003e `date -Is -u`.json\n\n# elevators each hour\ncurl -X GET --header \"Accept: application/json\" \\\n    --header \"Authorization: Bearer \u003cYOUR_API_TOKEN\u003e\" \\\n    \"https://api.deutschebahn.com/fasta/v2/facilities?type=ESCALATOR,ELEVATOR\" \\\n    \u003e `date -Is -u`.json\n```\nThis simple setup does no error handling. If the endpoint is temporarily busy,\nthe snapshot is lost.\n\nEach API response is a list of objects which look like:\n\n### parking\n\n```json\n{\n  \"allocation\": {\n    \"validData\": true,\n    \"capacity\": 133,\n    \"category\": 4,\n    \"text\": \"\u003e 50\"\n  },\n  \"space\": {\n    \"id\": 100291,\n    \"label\": \"P2\",\n    \"name\": \"Parkplatz Ulm Hauptbahnhof\",\n    \"nameDisplay\": \"Ulm Hbf P2 Parkplatz\",\n    \"station\": {\n      \"id\": 6323,\n      \"name\": \"Ulm Hbf\"\n    },\n    \"title\": \"Ulm Hbf P2 Ulm Hbf P2 Parkplatz\"\n  }\n}\n``` \n\n\u003e Note that the original objects did contain `timestamp` and `timeSegment` fields.\n\u003e These are discarded in the changelogs to minimize the amount of data.\n\n\n### stations\n\n```json\n{\n  \"aufgabentraeger\": {\n    \"name\": \"Nahverkehrsservicegesellschaft Thüringen mbH\",\n    \"shortName\": \"NVS\"\n  },\n  \"category\": 6,\n  \"evaNumbers\": [\n    {\n      \"geographicCoordinates\": {\n        \"coordinates\": [11.593783, 50.93692],\n        \"type\": \"Point\"\n      },\n      \"isMain\": true,\n      \"number\": 8011058\n    }\n  ],\n  \"federalState\": \"Thüringen\",\n  \"hasBicycleParking\": true,\n  
\"hasCarRental\": false,\n  \"hasDBLounge\": false,\n  \"hasLocalPublicTransport\": true,\n  \"hasLockerSystem\": false,\n  \"hasLostAndFound\": false,\n  \"hasMobilityService\": \"no\",\n  \"hasParking\": false,\n  \"hasPublicFacilities\": false,\n  \"hasRailwayMission\": false,\n  \"hasSteplessAccess\": \"partial\",\n  \"hasTaxiRank\": false,\n  \"hasTravelCenter\": false,\n  \"hasTravelNecessities\": false,\n  \"hasWiFi\": false,\n  \"mailingAddress\": {\n    \"city\": \"Jena\",\n    \"street\": \"Spitzweidenweg 28\",\n    \"zipcode\": \"07743\"\n  },\n  \"name\": \"Jena Saalbf\",\n  \"number\": 3044,\n  \"priceCategory\": 6,\n  \"regionalbereich\": {\n    \"name\": \"RB Südost\",\n    \"number\": 2,\n    \"shortName\": \"RB SO\"\n  },\n  \"ril100Identifiers\": [\n    {\n      \"geographicCoordinates\": {\n        \"coordinates\": [11.593348001, 50.936519303],\n        \"type\": \"Point\"\n      },\n      \"hasSteamPermission\": true,\n      \"isMain\": true,\n      \"rilIdentifier\": \"UJS\"\n    }\n  ],\n  \"stationManagement\": {\n    \"name\": \"Chemnitz\",\n    \"number\": 115\n  },\n  \"szentrale\": {\n    \"name\": \"Erfurt Hbf\",\n    \"number\": 50,\n    \"publicPhoneNumber\": \"0361/3001055\"\n  },\n  \"timeTableOffice\": {\n    \"email\": \"DBS.Fahrplan.Thueringen@deutschebahn.com\",\n    \"name\": \"Bahnhofsmanagement Chemnitz\"\n  }\n}\n```\n\n### elevators\n\n```json\n{\n  \"description\": \"zu Gleis 1\",\n  \"equipmentnumber\": 10354738,\n  \"geocoordX\": 11.5873405,\n  \"geocoordY\": 50.924981,\n  \"state\": \"ACTIVE\",\n  \"stateExplanation\": \"available\",\n  \"stationnumber\": 3043,\n  \"type\": \"ELEVATOR\"\n}\n```\n\n## Change logs\n\nThe change-logs are collected in json files per year in [docs/data/](docs/data) \nusing a self-baked format which does not contain too much space and allows committing \nnew json lines with minimal diffs. 
\n\nAll object keys are sorted alphabetically to avoid needless commit diffs.\n\nTo get access to all objects via Python:\n```python\nfrom src.changelog_reader import ChangelogReader\n\nfor changelog_file, dates_file in ChangelogReader.get_changelog_files(\"stations\"):\n    reader = ChangelogReader(changelog_file, dates_file)\n    for object_id in reader.object_ids():\n        for timestamp, data in reader.iter_object(object_id):\n            print(f\"object {object_id} at time {timestamp} is {data}\")\n```\n\nIf an object was not listed during a snapshot, `data` will be `None`. \n\nThe `reader.iter_object(object_id)` method iterates through all changes of the \nobject. The `reader.iter_object_snapshots(object_id)` method iterates through \neach snapshot, regardless of whether the object has changed or does not yet exist.\n\n\n## Some graphics\n\nBelow are some plots and a crude analysis of the data. The Jupyter notebooks \nused for them are in the [notebooks/](notebooks/) directory.  \n\n### elevators \n\nCounting the number of elevators and escalators that do not have state\n`ACTIVE` produces this interesting curve:\n\n![plot of defect elevators per day](docs/img/defect-elevators-per-day.png)\n\nThe different colors represent the amount of time that these machines were\ninactive, 100% meaning a machine was inactive the whole day.\n\nThe small repeating spikes align with the working days each week. This is\nprobably caused by a mixture of two things: Elevators might tend to break more often \nwhen used, and there are certainly more reports/complaints about defective machines\non workdays, compared to the weekends.\n\nThere seems to be a *bad* trend visible. The number of defective machines is growing.\nHow many machines are there anyway? Plotting the number of listed IDs per day..\n\n![plot of listed elevators per day](docs/img/listed-elevators-per-day.png)\n\n..reveals that there are 200 new devices since the beginning of 2020. 
That is a bigger\nincrease than the growth in the number of defective devices over the same period. \nSomething else is going on...\n\nEach elevator/escalator device has a `stationnumber` attached. From the station data\nwe can get some metadata. After trying a few fields, the \n`aufgabentraeger` entry seems to correlate somewhat with the inactivity during \nthe second half of 2021:\n\n![plot of elevator activity per Aufgabenträger](docs/img/elevators-heatmap-bearer.png)\n\nIn the above plot, the y axis has been sorted by mean activity during late 2021. \n*Verband Region Stuttgart* is the main cause of trouble, followed by a couple of\nRhineland associations. The number after each label shows the total number \nof devices of each *Aufgabenträger*. If *Verband Region Stuttgart* drops from \nabout 90% to 64% mean activity per day from Aug. 2021 to mid-September,\nthat's quite something. \n\nI don't know Stuttgart in any detail, so I can only guess. There's this \n[construction site](https://www.bahnprojekt-stuttgart-ulm.de/presse/pressemitteilungen/newsdetail/news/1489-veraenderte-wegefuehrung-am-stuttgarter-hbf/newsParameter/detail/News/datum/20190704/)\nat the main station which perfectly matches the date. However, *Stuttgart Hauptbahnhof*\nbelongs to *Nahverkehrsgesellschaft Baden-Württemberg mbH* and they don't show that\ndropout of activity.\n \nPlotting the change of device activity between early and late 2021 per geo-position\nmakes the finger-pointing even easier:\n \n![plot of change of activity between first and second half of 2021](docs/img/elevators-compare-activity.png)\n\nI admit, there are a lot of elevators in the Rhineland (west) and I wouldn't want\nto manage them all. Stuttgart is the big spot in the south-west; \nBerlin (east) and Hamburg (north) also seem to have developed ongoing problems.\n\n\n### parking\n\nThe parking data is a little bit lame. 
Instead of actual numbers of free spots, there\nis only a `category` that says:\n\n1. 0 to 10\n2. 11 to 30\n3. 31 to 50\n4. 51 to maximum capacity\n\nFirst of all, here's the number of places for each day that are \n - **listed**: included in the API response list \n - **valid**: have the `validData` flag and contain a value for `category`\n - **active**: a change of `category` was recorded during that day\n\n![plot of listed/valid/active parking spaces per day](docs/img/parking-listed-per-day.png)\n\nThe idea of approximating the *percentage of occupation* using the category\nand the capacity becomes less attractive when looking at the capacity changes\nover time:\n\n![plot of parking capacity per day and space](docs/img/parking-capacity-per-day.png)\n\nIt's quite hard to explain what's going on there. Some parking lots seem to change\ntheir maximum capacity regularly every other weekday. Some of them \ntemporarily lose capacity, maybe because of construction sites, and some seem\nto mix up their occupation data with the capacity data. Other parking lots \nseem to grow immensely during a couple of days, or people just type in wrong\nnumbers and someone else corrects them? \n\nIn the face of this totally erratic data, let's just look at pure `category` numbers:\n \n![plot of parking \"category\" per month and station](docs/img/parking-category-per-month.png)\n\nThe plot shows only stations with a certain amount of activity and \nthe black line shows the average of these stations. \nExcept for late summer (Aug. to Oct.) there does not seem to be much happening.\nOr in other words, the parking lots do not change \ntheir average `category` per month a lot. Also the plot is pretty much unreadable. \n\nWe can also look at how often each category is listed, as a percentage. 
This time\nper day and for all stations:\n \n![plot of parking \"category\" percentage per_day](docs/img/parking-category-percent-per-day.png)\n\nOne very significant effect visible here is the corona lockdown, which\nbegan in Germany around the 16th of March 2020. That is exactly the beginning\nof the flat area in the upper green line representing the `\u003e 50` category.\n\nApart from the category, which roughly represents the **number** of free spaces, \nwe can simply plot the amount of change. This might serve as a measure of\ngeneral activity. The plot below shows the mean absolute difference of\nthe category value between two hours, shown as the average per week and space:\n\n![plot of parking \"category\" change per week and space](docs/img/parking-category-changes-per-week.png)\n\nYou know, just by looking at that, one would conclude that *the pandemic* is still going on. \n\n\n### stations\n\nThe number of changes to station data per day tells us that the data monkeys \nare somewhat busy:\n\n![plot of number of edited stations per day](docs/img/edited-stations-per-day.png)\n\nThere is only one snapshot stored each day, so the \nnumber of stations edited per day is equal to the number of all edits per day.\nAlso note that for some stupid reason I set up the cronjob to run at 7 AM. Unless\nthe data monkeys were up early or working through the night, the changes \nhave probably occurred the day before the snapshot! However, I won't change \nthe snapshot time, for consistency.  \n\nSome particular dates jump out of the above graph, where more than 5000 \nstations were edited during the same day. Here's a list of the top-five\nchanges for each of these dates. 
\n\n- **`2020-06-03`**\n  - 5455 x replace `ril100Identifiers.geographicCoordinates.coordinates.0`\n  - 5454 x replace `ril100Identifiers.geographicCoordinates.coordinates.1`\n  - 9 x add `ril100Identifiers.geographicCoordinates`\n  - 1 x replace `localServiceStaff.availability.friday.fromTime`\n  - 1 x replace `localServiceStaff.availability.friday.toTime`\n- **`2021-06-03`**\n  - 5399 x remove `hasSteplessAccess`\n  - 5399 x replace `federalState`\n  - 5399 x replace `regionalbereich.shortName`\n  - 5371 x remove `timeTableOffice`\n  - 267 x replace `ril100Identifiers.isMain`\n- **`2021-06-04`**\n  - 5399 x add `hasSteplessAccess`\n  - 5399 x add `timeTableOffice`\n  - 5399 x replace `federalState`\n- **`2021-06-08`**\n  - 5664 x replace `ril100Identifiers.isMain`\n  - 5458 x replace `evaNumbers.isMain`\n  - 1 x replace `mailingAddress.street`\n  - 1 x replace `evaNumbers.4.isMain`\n  - 1 x replace `ril100Identifiers.4.isMain`\n- **`2021-06-17`**\n  - 5464 x replace `ril100Identifiers.geographicCoordinates.coordinates.0`\n  - 5463 x replace `ril100Identifiers.geographicCoordinates.coordinates.1`\n  - 61 x add `ril100Identifiers.geographicCoordinates`\n  - 3 x replace `mailingAddress.street`\n  - 1 x replace `ril100Identifiers.4.geographicCoordinates.coordinates.0`\n- **`2021-06-26`**\n  - 5399 x replace `ril100Identifiers`\n- **`2021-07-02`**\n  - 5399 x replace `ril100Identifiers`\n  - 5397 x replace `evaNumbers.isMain`\n\nFirst of all, June 3rd (or probably June 2nd) seems to be the traditional day\nto publish updated geo-coords for all stations. In 2021 a couple of major update \nsessions followed after June 3rd, e.g. the `federalState` was replaced\nwith abbreviations, which got reverted again, and things got \nremoved and reappeared later. 
","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdefgsus%2Fbahn-api-history","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdefgsus%2Fbahn-api-history","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdefgsus%2Fbahn-api-history/lists"}