https://github.com/jooyoungseo/youtubecaption

Downloading YouTube Subtitle Transcription in a Tidy Tibble Data_Frame in R

---
output:
  bookdown::github_document2:
    html_preview: false
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# youtubecaption

[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](http://www.gnu.org/licenses/gpl-3.0)
[![CRAN status](https://www.r-pkg.org/badges/version/youtubecaption)](https://cran.r-project.org/package=youtubecaption)
[![Total Downloads](https://cranlogs.r-pkg.org/badges/grand-total/youtubecaption?color=orange)](https://cranlogs.r-pkg.org/badges/grand-total/youtubecaption)
[![Travis build status](https://travis-ci.org/jooyoungseo/youtubecaption.svg?branch=master)](https://travis-ci.org/jooyoungseo/youtubecaption)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/jooyoungseo/youtubecaption?branch=master&svg=true)](https://ci.appveyor.com/project/jooyoungseo/youtubecaption)
[![Codecov test coverage](https://codecov.io/gh/jooyoungseo/youtubecaption/branch/master/graph/badge.svg)](https://codecov.io/gh/jooyoungseo/youtubecaption?branch=master)

## Motivation

Although some R packages target the YouTube API (e.g., 'tuber'), downloading a YouTube video's subtitles (i.e., captions) in a tidy form has never been straightforward. Using the 'youtube-transcript-api' Python package under the hood, this R package gives users a convenient way to parse a YouTube caption and convert it into a handy tibble data frame. Users can also save the caption data as a tidy Excel file without an advanced programming background.

## Installation

### Python Dependencies

`youtubecaption` requires an Anaconda Python environment on your system PATH.

If you have not installed a Conda environment on your system, please [download and install Anaconda](https://www.anaconda.com/download/) (Python 3.6 or later is recommended).

This package calls the [**youtube-transcript-api**](https://pypi.org/project/youtube-transcript-api/) Python module from R using [**reticulate**](https://rstudio.github.io/reticulate/).
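
To confirm these prerequisites before installing, a check along the following lines may help. This is a minimal sketch, not part of `youtubecaption`: the helper name `check_python_setup()` and its `check_module` argument are hypothetical, and the sketch assumes the Conda executable is named `conda` and that the Python module is imported as `youtube_transcript_api`.

```r
# Hypothetical helper (not part of youtubecaption): report whether the
# Python-side prerequisites appear to be in place.
check_python_setup <- function(check_module = FALSE) {
  # Sys.which() returns "" when the executable is not found on the PATH.
  conda_on_path <- nzchar(Sys.which("conda")[[1]])

  # NA means "not checked". Importing the module starts a Python session,
  # so this step is opt-in and only runs when reticulate is installed.
  module_available <- NA
  if (check_module && requireNamespace("reticulate", quietly = TRUE)) {
    module_available <- tryCatch(
      reticulate::py_module_available("youtube_transcript_api"),
      error = function(e) FALSE
    )
  }

  c(conda_on_path = conda_on_path, module_available = module_available)
}

print(check_python_setup())
```

Calling `check_python_setup(check_module = TRUE)` additionally tries to import the Python module through reticulate. A `FALSE` entry suggests revisiting the corresponding step above.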

### R Package Installation

### Development Version

You can install the latest development version as follows:

```{r, eval=FALSE}
# Install 'remotes' only if it is not already available:
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

remotes::install_github("jooyoungseo/youtubecaption")
```

### Stable Version

You can install the released version of youtubecaption from [CRAN](https://CRAN.R-project.org) with:

```{r, eval=FALSE}
install.packages('youtubecaption')
```

## Usage

After loading the `youtubecaption` package, call the `get_caption()` function as shown below:

```{r test, eval=FALSE}
library(youtubecaption)

# Let's get the video caption out of Hadley Wickham's "You can't do data science in a GUI":
url <- "https://www.youtube.com/watch?v=cpbtcsGE0OA"
caption <- get_caption(url)
caption

#> # A tibble: 1,420 x 5
#>    segment_id text                                start duration vid
#>
#>  1          1 thank you for coming to a meeting ~  7.13     8.32 cpbtcsGE0~
#>  2          2 in regards to data science GUI with  10.7     8.44 cpbtcsGE0~
#>  3          3 happy with chief data scientist in~  15.4     7.11 cpbtcsGE0~
#>  4          4 studio as well as the member of th~  19.1     7.23 cpbtcsGE0~
#>  5          5 Foundation and an attempt professo~  22.6        6 cpbtcsGE0~
#>  6          6 Stanford and at the University of    26.4     6.48 cpbtcsGE0~
#>  7          7 Auckland he builds both computatio~  28.6     7.17 cpbtcsGE0~
#>  8          8 and cognitive tools to make data s~  32.8      7.5 cpbtcsGE0~
#>  9          9 easier faster and more times his w~  35.7     7.01 cpbtcsGE0~
#> 10         10 includes various packages as well ~  40.4     6.21 cpbtcsGE0~
#> # ... with 1,410 more rows

# Save the caption as an Excel file and open it right away:
get_caption(url = url, savexl = TRUE, openxl = TRUE)
```
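
Because the result is a regular data frame with `start` and `duration` columns, it can be post-processed with ordinary R code. The sketch below is illustrative only: it does not call `get_caption()`, but builds a small stand-in data frame from the first three rows of the printed output (including the truncated `text` values), so it runs offline. The `end` column and the `at_12s` and `transcript` objects are names introduced here for illustration.

```r
# Stand-in for the object returned by get_caption(); columns and values are
# copied from the first three rows of the printed output in the Usage chunk.
caption <- data.frame(
  segment_id = 1:3,
  text = c(
    "thank you for coming to a meeting ~",
    "in regards to data science GUI with",
    "happy with chief data scientist in~"
  ),
  start = c(7.13, 10.7, 15.4),
  duration = c(8.32, 8.44, 7.11),
  vid = "cpbtcsGE0OA",
  stringsAsFactors = FALSE
)

# Derive the time (in seconds) at which each segment stops being shown.
caption$end <- caption$start + caption$duration

# Segments on screen 12 seconds into the video.
at_12s <- caption[caption$start <= 12 & caption$end > 12, ]

# Collapse all segments into a single transcript string.
transcript <- paste(caption$text, collapse = " ")

print(at_12s)
```

The same steps should apply unchanged to a real `get_caption()` result, assuming it has the columns shown in the output above.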