Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jemus42/poddr
Just getting some podcast data.
Last synced: 15 days ago
- Host: GitHub
- URL: https://github.com/jemus42/poddr
- Owner: jemus42
- License: other
- Created: 2020-11-30T23:10:57.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-05-19T13:25:44.000Z (6 months ago)
- Last Synced: 2024-05-19T14:33:05.413Z (6 months ago)
- Language: R
- Homepage: https://jemus42.github.io/poddr
- Size: 431 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
README
---
output: github_document
editor_options:
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  cache = TRUE
)
```

# poddr
[![R-CMD-check](https://github.com/jemus42/poddr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jemus42/poddr/actions/workflows/R-CMD-check.yaml)
The goal of poddr is to collect podcast data so I can display it at [podcasts.jemu.name](https://podcasts.jemu.name/). It's not intended to be a real package for real people.
## Installation
You can install poddr from GitHub with:
``` r
remotes::install_github("jemus42/poddr")
```

## Example
Here's the bulk of what's inside the tin:
```{r setup}
library(dplyr, warn.conflicts = FALSE)
library(poddr)
```

### The Incomparable
The basic workflow is simple:
1. Get a list of all the shows on the network, including the relevant URLs for further parsing.
2. Get all the episodes of the shows selected. To not bother the webserver too much, I'm limiting the selection to a single show.

```{r incomparable}
incomparable_shows <- incomparable_get_shows()
incomparable_shows

incomparable_episodes <- incomparable_shows |>
  filter(show == "Unjustly Maligned") |>
  incomparable_get_episodes()

incomparable_episodes
```

### Relay.fm
Same procedure as before, also with one show.
```{r relay}
relay_shows <- relay_get_shows()
relay_shows

relay_episodes <- relay_shows |>
  filter(show == "Connected") |>
  relay_get_episodes()

relay_episodes
```

### ATP
Since there's only one show, there's no need to select one specifically. However, the website doesn't show a list of *all* episodes on one page, so we'll have to either parse all pages (there were 10 in total as of December 2020), or set a limit, like `1`, to only get episodes from the first page.
The first page shows the 5 most recent episodes, and subsequent pages show 50 episodes each.

```{r atp}
atp <- atp_get_episodes(page_limit = 1)
atp

# Looking at the links
atp |>
  tidyr::unnest(links) |>
  select(number, title, link_text, link_url, link_type)
```

### For all the nice people
The regular episode data contains one row per episode, with associated people in a single cell with names separated by `;`. In some cases we're interested in per-person data, for example the total number of appearances of a person on The Incomparable mothership, so we'll longify the data with a helper function that performs the `tidyr::pivot_longer()` and `tidyr::separate_rows()` steps consistently.
Note that relay.fm data only includes "hosts", as there's no separate guest information, so the host/guest distinction is redundant in that case.
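
As a rough illustration of that longifying step, here is a minimal sketch on made-up data. The toy tibble, the column names `host` and `guest`, and the function name `gather_people_sketch()` are assumptions for illustration only; the actual `gather_people()` in poddr may work differently.

```r
library(tidyr)
library(tibble)

# Toy episode data (made up): one row per episode, people separated by ";"
episodes <- tibble(
  show   = c("Example Show", "Example Show"),
  number = c(1, 2),
  host   = c("Alice;Bob", "Alice"),
  guest  = c("Carol", NA)
)

# Hypothetical stand-in for gather_people(); not the package's implementation
gather_people_sketch <- function(data) {
  data |>
    # One row per episode and role, dropping roles nobody filled
    pivot_longer(
      cols = c(host, guest),
      names_to = "role",
      values_to = "person",
      values_drop_na = TRUE
    ) |>
    # One row per person, splitting the ";"-separated names
    separate_rows(person, sep = ";")
}

people <- gather_people_sketch(episodes)
people
```

The result has one row per episode and person, with the person's role in its own column.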
```{r gathering}
incomparable_episodes |>
  gather_people() |>
  select(show, number, person, role)

relay_episodes |>
  gather_people() |>
  select(show, number, person, role)
```
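
Once the data has one row per episode and person, the per-person totals mentioned above come down to a simple count. A small sketch on made-up data (the tibble below is invented for illustration and doesn't come from the actual feeds):

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up long-format data: one row per episode and person
people <- tibble(
  show   = "Example Show",
  number = c(1, 1, 1, 2),
  person = c("Alice", "Bob", "Carol", "Alice"),
  role   = c("host", "host", "guest", "host")
)

# Total appearances per person, most frequent first
appearances <- people |>
  count(person, name = "appearances", sort = TRUE)

appearances
```

Adding `role` to the `count()` call would split the totals by host and guest appearances instead.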