Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jbgruber/zugr
R package to wrap the Deutsche Bahn Fahrplan API :bullettrain_side:
https://github.com/jbgruber/zugr
Last synced: 3 months ago
JSON representation
R package to wrap the Deutsche Bahn Fahrplan API :bullettrain_side:
- Host: GitHub
- URL: https://github.com/jbgruber/zugr
- Owner: JBGruber
- License: gpl-3.0
- Created: 2023-09-30T08:58:50.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-04T19:36:35.000Z (11 months ago)
- Last Synced: 2024-02-04T21:54:40.777Z (11 months ago)
- Language: R
- Homepage:
- Size: 157 KB
- Stars: 15
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
message = FALSE
)
```# zugr
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
Just a small package that wraps the Fahrplan API of the German train company [db](https://www.bahn.de).
The main idea is to be able to cast your net a bit wider in cases when you are flexible to go on a different hour, day, or week.> [!NOTE]
> I do not have time to actively develop this package. It should be seen as a prototype. If you want to take it over as maintainer to develop it further, [let me know](mailto:[email protected]).## Installation
You can install the development version of zugr like so (or with `remotes`, but the cool kids use `pak` now :grin:):
``` r
# install.packages("pak)
pak::pak("JBGruber/zugr")
```## Example: When should I go to Amsterdam
A journey that I take often is from Wiesbaden to Amsterdam.
There are really fast connections between these cities, which is awsome!
What is less great is trying to decide if I go Monday or Tuesday, on the weekend or during the week, or if I should rather go in a different week?
If you've used the interface of [bahn.de](https://www.bahn.de), you know these decisions require a lot of useless click-work.
Well not anymore...The first step is to get the station ID for the main stations in Wiesbaden and Amsterdam:
```{r example}
library(zugr)
wi <- search_station("Wiesbaden HBF")
wi
```The first one is what I need!
Now Amsterdam.
Here I don't need as many options:```{r}
adam <- search_station("Amsterdam", n_res = 2)
adam
```Now I can make a search for next Tuesday Morning (as of writing this):
```{r}
next_tuesday <- bahn_search(
from = wi$id[1],
to = adam$id[1],
start = "2024-02-06T05:00:00",
end = "2024-02-06T19:00:00"
)
next_tuesday
```Note that the `end` parameter marks the time at which the train should arrive at the latest,
while `start` marks the earliest departure that should be included in the results.So what is the cheapest connection?
Easy to see with some `R` commands :smirk::```{r}
library(tidyverse)
next_tuesday |>
slice_min(price, n = 1) |>
select(-id)
```With 5 connections and more than 6 hours, this is a rather terrible connections.
It is cheap though.
Let's see s quick visualisation instead:```{r}
next_tuesday |>
# some connections include trains from different companies, the prices for
# these are not included
filter(!is.na(price),
!duplicated(start),
start < "2024-02-06 11:00:00") |> # exclude some connections for readability
ggplot(aes(x = start, y = price, label = changes, fill = duration)) +
geom_col() +
geom_text(aes(y = price / 2), colour = "white") +
scale_y_continuous(labels = function(x) paste0(x, "€")) +
scale_fill_gradient(labels = function(x) paste0(round(x / 60 / 60, 1), "h")) +
labs(x = "depature", y = NULL,
title = "Prices and number of changes for different connections") +
theme_minimal()
```With more data, this gets a little more interesting.
```{r echo=FALSE}
if (!file.exists("march.rds")) {
march <- bahn_search(
from = wi$id[1],
to = adam$id[1],
start = "2024-03-01T00:00:00",
end = "2024-04-01T00:00:00"
)
saveRDS(march, "march.rds")
} else {
march <- readRDS("march.rds")
}
``````{r eval=FALSE}
march <- bahn_search(
from = wi$id[1],
to = adam$id[1],
start = "2024-03-01T00:00:00",
end = "2024-04-01T00:00:00"
)
```Let's see which days would be especially cheap:
```{r}
march |>
mutate(date = as.Date(start)) |>
group_by(date) |>
slice_min(price, n = 1, with_ties = FALSE) |>
ggplot(aes(x = date, y = price)) +
geom_col() +
labs(x = NULL, y = NULL,
title = "Cheapest price per day") +
scale_y_continuous(labels = function(x) paste0(x, "€")) +
theme_minimal()
```So I could get a train on any day in March for the price of 32.90€.
This is all nice, but let's only take into account connections I would actually consider:```{r}
march |>
filter(changes <= 2,
hour(start) > 7) |>
mutate(date = as.Date(start),
weekday = wday(start, label = TRUE)) |>
group_by(date) |>
slice_min(price, n = 1, with_ties = FALSE) |>
ggplot(aes(x = date, y = price, label = weekday)) +
geom_col() +
geom_text(hjust = 0, angle = 90) +
labs(x = NULL, y = NULL,
title = "Cheapest price per day with <= 2 changes and after 7am") +
scale_y_continuous(labels = function(x) paste0(x, "€")) +
theme_minimal()
```It also looks like historic searches are possible.
```{r}
historic <- bahn_search(
from = wi$id[1],
to = adam$id[1],
start = "2024-01-22T05:00:00",
end = "2024-01-22T19:00:00",
parse = FALSE
)
```I do not parse the results here to be able to explore the data a bit more (parsing only extracts the information that I found interesting on first glance).
However, the results contain a message that the connection is in the past and information about price and demand seem absent.```{r}
historic[[1]][["verbindungen"]][[1]][["meldungenAsObject"]][[1]][["nachrichtLang"]]
```