Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/jemus42/poddr

Just getting some podcast data.
https://github.com/jemus42/poddr

Last synced: 15 days ago
JSON representation

Just getting some podcast data.

Host: GitHub
URL: https://github.com/jemus42/poddr
Owner: jemus42
License: other
Created: 2020-11-30T23:10:57.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2024-05-19T13:25:44.000Z (6 months ago)
Last Synced: 2024-05-19T14:33:05.413Z (6 months ago)
Language: R
Homepage: https://jemus42.github.io/poddr
Size: 431 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE

Awesome Lists containing this project

README

        ---

output: github_document

editor_options: 

  chunk_output_type: console

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%",

  cache = TRUE

)

```

# poddr

[![R-CMD-check](https://github.com/jemus42/poddr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jemus42/poddr/actions/workflows/R-CMD-check.yaml)

The goal of poddr is to collect podcast data so I can display it at [podcasts.jemu.name](https://podcasts.jemu.name/). It's not intended to be a real package for real people.

## Installation

You can install the released version of poddr from here with:

``` r

remotes::install_github("jemus42/poddr")

```

## Example

Here's the bulk of what's inside the tin:

```{r setup}

library(dplyr, warn.conflicts = FALSE)

library(poddr)

```

### The Incomparable

The basic workflow is simple: 

1. Get a list of all the shows on the network, including the relevant URLs for further parsing.

2. Get all the episodes of the shows selected. To not bother the webserver too much, I'm limiting the selection to a single show.

```{r incomparable}

incomparable_shows <- incomparable_get_shows()

incomparable_shows

incomparable_episodes <- incomparable_shows |>

  filter(show == "Unjustly Maligned") |>

  incomparable_get_episodes()

incomparable_episodes

```

### Relay.fm

Same procedure as before, also with one show.

```{r relay}

relay_shows <- relay_get_shows()

relay_shows

relay_episodes <- relay_shows |>

  filter(show == "Connected") |>

  relay_get_episodes()

relay_episodes

```

### ATP

Since there's only one show, there's no reason to select one specifically, obviously. However, the website doesn't show a list of *all* episodes on one page, so we'll have to either parse all pages (there's currently 10 total as of December 2020), or select a limit, like `1`, to only get episodes from the first page.

The first page shows the 5 most recent episodes, and subsequent pages show 50 episodes each.

```{r atp}

atp <- atp_get_episodes(page_limit = 1)

atp

# Looking at the links

atp |>

  tidyr::unnest(links) |>

  select(number, title, link_text, link_url, link_type)

```

### For all the nice people

The regular episode data contains one row per episode, with associated people in a single cell with names separated by `;`. In some cases we're interested in per-person data, for example the total number of appearances of a person on The Incomparable mothership, so we'll longify the data with a helper function that performs the `tidyr::pivot_longer()` and `tidyr::separate_rows()` steps consistently.

Note that relay.fm data only includes "hosts", as there's no separate guest information, so the host/guest distinction is redundant in that case.

```{r gathering}

incomparable_episodes |>

  gather_people() |>

  select(show, number, person, role)

relay_episodes |>

  gather_people() |>

  select(show, number, person, role)

```