Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/jbgruber/zugr

R package to wrap the Deutsche Bahn Fahrplan API :bullettrain_side:
https://github.com/jbgruber/zugr

Last synced: 3 months ago
JSON representation

R package to wrap the Deutsche Bahn Fahrplan API :bullettrain_side:

Host: GitHub
URL: https://github.com/jbgruber/zugr
Owner: JBGruber
License: gpl-3.0
Created: 2023-09-30T08:58:50.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-02-04T19:36:35.000Z (11 months ago)
Last Synced: 2024-02-04T21:54:40.777Z (11 months ago)
Language: R
Homepage:
Size: 157 KB
Stars: 15
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md

Awesome Lists containing this project

README

        ---

output: github_document

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%",

  message = FALSE

)

```

# zugr

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)

Just a small package that wraps the Fahrplan API of the German train company [db](https://www.bahn.de).

The main idea is to be able to cast your net a bit wider in cases when you are flexible to go on a different hour, day, or week.

> [!NOTE]

> I do not have time to actively develop this package. It should be seen as a prototype. If you want to take it over as maintainer to develop it further, [let me know](mailto:[email protected]).

## Installation

You can install the development version of zugr like so (or with `remotes`, but the cool kids use `pak` now :grin:):

``` r

# install.packages("pak)

pak::pak("JBGruber/zugr")

```

## Example: When should I go to Amsterdam

A journey that I take often is from Wiesbaden to Amsterdam.

There are really fast connections between these cities, which is awsome!

What is less great is trying to decide if I go Monday or Tuesday, on the weekend or during the week, or if I should rather go in a different week?

If you've used the interface of [bahn.de](https://www.bahn.de), you know these decisions require a lot of useless click-work.

Well not anymore...

The first step is to get the station ID for the main stations in Wiesbaden and Amsterdam:

```{r example}

library(zugr)

wi <- search_station("Wiesbaden HBF")

wi

```

The first one is what I need!

Now Amsterdam.

Here I don't need as many options:

```{r}

adam <- search_station("Amsterdam", n_res = 2)

adam

```

Now I can make a search for next Tuesday Morning (as of writing this):

```{r}

next_tuesday <- bahn_search(

  from = wi$id[1],

  to = adam$id[1],

  start = "2024-02-06T05:00:00",

  end =  "2024-02-06T19:00:00"

)

next_tuesday

```

Note that the `end` parameter marks the time at which the train should arrive at the latest,

while `start` marks the earliest departure that should be included in the results.

So what is the cheapest connection?

Easy to see with some `R` commands :smirk::

```{r}

library(tidyverse)

next_tuesday |> 

  slice_min(price, n = 1) |> 

  select(-id)

```

With 5 connections and more than 6 hours, this is a rather terrible connections.

It is cheap though.

Let's see s quick visualisation instead:

```{r}

next_tuesday |> 

  # some connections include trains from different companies, the prices for

  # these are not included

  filter(!is.na(price),

         !duplicated(start),

         start < "2024-02-06 11:00:00") |> # exclude some connections for readability

  ggplot(aes(x = start, y = price, label = changes, fill = duration)) +

  geom_col() +

  geom_text(aes(y = price / 2), colour = "white") +

  scale_y_continuous(labels = function(x) paste0(x, "€")) +

  scale_fill_gradient(labels = function(x) paste0(round(x / 60 / 60, 1), "h")) +

  labs(x = "depature", y = NULL, 

       title = "Prices and number of changes for different connections") +

  theme_minimal()

```

With more data, this gets a little more interesting.

```{r echo=FALSE}

if (!file.exists("march.rds")) {

  march <- bahn_search(

    from = wi$id[1],

    to = adam$id[1],

    start = "2024-03-01T00:00:00",

    end =  "2024-04-01T00:00:00"

  )

  saveRDS(march, "march.rds")

} else {

  march <- readRDS("march.rds")

}

```

```{r eval=FALSE}

march <- bahn_search(

  from = wi$id[1],

  to = adam$id[1],

  start = "2024-03-01T00:00:00",

  end =  "2024-04-01T00:00:00"

)

```

Let's see which days would be especially cheap:

```{r}

march |> 

  mutate(date = as.Date(start)) |> 

  group_by(date) |> 

  slice_min(price, n = 1, with_ties = FALSE) |> 

  ggplot(aes(x = date, y = price)) +

  geom_col() +

  labs(x = NULL, y = NULL, 

       title = "Cheapest price per day") +

  scale_y_continuous(labels = function(x) paste0(x, "€")) +

  theme_minimal()

```

So I could get a train on any day in March for the price of 32.90€.

This is all nice, but let's only take into account connections I would actually consider:

```{r}

march |> 

  filter(changes <= 2,

         hour(start) > 7) |> 

  mutate(date = as.Date(start),

         weekday = wday(start, label = TRUE)) |> 

  group_by(date) |> 

  slice_min(price, n = 1, with_ties = FALSE) |> 

  ggplot(aes(x = date, y = price, label = weekday)) +

  geom_col() +

  geom_text(hjust = 0, angle = 90) +

  labs(x = NULL, y = NULL, 

       title = "Cheapest price per day with <= 2 changes and after 7am") +

  scale_y_continuous(labels = function(x) paste0(x, "€")) +

  theme_minimal()

```

It also looks like historic searches are possible.

```{r}

historic <- bahn_search(

  from = wi$id[1],

  to = adam$id[1],

  start = "2024-01-22T05:00:00",

  end =  "2024-01-22T19:00:00",

  parse = FALSE

)

```

I do not parse the results here to be able to explore the data a bit more (parsing only extracts the information that I found interesting on first glance).

However, the results contain a message that the connection is in the past and information about price and demand seem absent.

```{r}

historic[[1]][["verbindungen"]][[1]][["meldungenAsObject"]][[1]][["nachrichtLang"]]

```