Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/jbgruber/zugr

R package to wrap the Deutsche Bahn Fahrplan API :bullettrain_side:
https://github.com/jbgruber/zugr

Last synced: 3 months ago
JSON representation

R package to wrap the Deutsche Bahn Fahrplan API :bullettrain_side:

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
message = FALSE
)
```

# zugr

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)

Just a small package that wraps the Fahrplan API of the German train company [db](https://www.bahn.de).
The main idea is to be able to cast your net a bit wider in cases when you are flexible to go on a different hour, day, or week.

> [!NOTE]
> I do not have time to actively develop this package. It should be seen as a prototype. If you want to take it over as maintainer to develop it further, [let me know](mailto:[email protected]).

## Installation

You can install the development version of zugr like so (or with `remotes`, but the cool kids use `pak` now :grin:):

``` r
# install.packages("pak)
pak::pak("JBGruber/zugr")
```

## Example: When should I go to Amsterdam

A journey that I take often is from Wiesbaden to Amsterdam.
There are really fast connections between these cities, which is awsome!
What is less great is trying to decide if I go Monday or Tuesday, on the weekend or during the week, or if I should rather go in a different week?
If you've used the interface of [bahn.de](https://www.bahn.de), you know these decisions require a lot of useless click-work.
Well not anymore...

The first step is to get the station ID for the main stations in Wiesbaden and Amsterdam:

```{r example}
library(zugr)
wi <- search_station("Wiesbaden HBF")
wi
```

The first one is what I need!
Now Amsterdam.
Here I don't need as many options:

```{r}
adam <- search_station("Amsterdam", n_res = 2)
adam
```

Now I can make a search for next Tuesday Morning (as of writing this):

```{r}
next_tuesday <- bahn_search(
from = wi$id[1],
to = adam$id[1],
start = "2024-02-06T05:00:00",
end = "2024-02-06T19:00:00"
)
next_tuesday
```

Note that the `end` parameter marks the time at which the train should arrive at the latest,
while `start` marks the earliest departure that should be included in the results.

So what is the cheapest connection?
Easy to see with some `R` commands :smirk::

```{r}
library(tidyverse)
next_tuesday |>
slice_min(price, n = 1) |>
select(-id)
```

With 5 connections and more than 6 hours, this is a rather terrible connections.
It is cheap though.
Let's see s quick visualisation instead:

```{r}
next_tuesday |>
# some connections include trains from different companies, the prices for
# these are not included
filter(!is.na(price),
!duplicated(start),
start < "2024-02-06 11:00:00") |> # exclude some connections for readability
ggplot(aes(x = start, y = price, label = changes, fill = duration)) +
geom_col() +
geom_text(aes(y = price / 2), colour = "white") +
scale_y_continuous(labels = function(x) paste0(x, "€")) +
scale_fill_gradient(labels = function(x) paste0(round(x / 60 / 60, 1), "h")) +
labs(x = "depature", y = NULL,
title = "Prices and number of changes for different connections") +
theme_minimal()
```

With more data, this gets a little more interesting.

```{r echo=FALSE}
if (!file.exists("march.rds")) {
march <- bahn_search(
from = wi$id[1],
to = adam$id[1],
start = "2024-03-01T00:00:00",
end = "2024-04-01T00:00:00"
)
saveRDS(march, "march.rds")
} else {
march <- readRDS("march.rds")
}
```

```{r eval=FALSE}
march <- bahn_search(
from = wi$id[1],
to = adam$id[1],
start = "2024-03-01T00:00:00",
end = "2024-04-01T00:00:00"
)
```

Let's see which days would be especially cheap:

```{r}
march |>
mutate(date = as.Date(start)) |>
group_by(date) |>
slice_min(price, n = 1, with_ties = FALSE) |>
ggplot(aes(x = date, y = price)) +
geom_col() +
labs(x = NULL, y = NULL,
title = "Cheapest price per day") +
scale_y_continuous(labels = function(x) paste0(x, "€")) +
theme_minimal()
```

So I could get a train on any day in March for the price of 32.90€.
This is all nice, but let's only take into account connections I would actually consider:

```{r}
march |>
filter(changes <= 2,
hour(start) > 7) |>
mutate(date = as.Date(start),
weekday = wday(start, label = TRUE)) |>
group_by(date) |>
slice_min(price, n = 1, with_ties = FALSE) |>
ggplot(aes(x = date, y = price, label = weekday)) +
geom_col() +
geom_text(hjust = 0, angle = 90) +
labs(x = NULL, y = NULL,
title = "Cheapest price per day with <= 2 changes and after 7am") +
scale_y_continuous(labels = function(x) paste0(x, "€")) +
theme_minimal()
```

It also looks like historic searches are possible.

```{r}
historic <- bahn_search(
from = wi$id[1],
to = adam$id[1],
start = "2024-01-22T05:00:00",
end = "2024-01-22T19:00:00",
parse = FALSE
)
```

I do not parse the results here to be able to explore the data a bit more (parsing only extracts the information that I found interesting on first glance).
However, the results contain a message that the connection is in the past and information about price and demand seem absent.

```{r}
historic[[1]][["verbindungen"]][[1]][["meldungenAsObject"]][[1]][["nachrichtLang"]]
```