Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jooyoungseo/youtubecaption
Downloading YouTube Subtitle Transcription in a Tidy Tibble Data_Frame in R
- Host: GitHub
- URL: https://github.com/jooyoungseo/youtubecaption
- Owner: jooyoungseo
- License: gpl-3.0
- Created: 2019-03-18T03:12:12.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-05-15T14:24:47.000Z (almost 5 years ago)
- Last Synced: 2024-08-03T06:03:06.577Z (7 months ago)
- Language: R
- Homepage: https://jooyoungseo.github.io/youtubecaption
- Size: 167 KB
- Stars: 36
- Watchers: 2
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
README
---
output:
  bookdown::github_document2:
    html_preview: false
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# youtubecaption
[data:image/s3,"s3://crabby-images/e00db/e00dbc9bda69b01aa1a2c957b71a263652229d4d" alt="License: GPL v3"](http://www.gnu.org/licenses/gpl-3.0)
[data:image/s3,"s3://crabby-images/fcd7f/fcd7f40976b1da587779cfe88e3765c0be40d626" alt="CRAN status"](https://cran.r-project.org/package=youtubecaption)
[data:image/s3,"s3://crabby-images/47f2c/47f2ca58c11ec034267265ffa5cc71a7bd3dac58" alt="Total Downloads"](https://cranlogs.r-pkg.org/badges/grand-total/youtubecaption)
[data:image/s3,"s3://crabby-images/fdf06/fdf06db85c70e6aacc7600d1329ba08889d61b96" alt="Travis build status"](https://travis-ci.org/jooyoungseo/youtubecaption)
[data:image/s3,"s3://crabby-images/35f26/35f26e9faf4b05459f7ba15b51b744888705f1e0" alt="AppVeyor build status"](https://ci.appveyor.com/project/jooyoungseo/youtubecaption)
[data:image/s3,"s3://crabby-images/2f48d/2f48d63d14a09f449cfea75109468f9956e2a7b6" alt="Codecov test coverage"](https://codecov.io/gh/jooyoungseo/youtubecaption?branch=master)## Motivation
Although some R packages are tailored to the YouTube API (e.g., 'tuber'), downloading YouTube video subtitles (i.e., captions) in a tidy form has never been a low-hanging fruit. Using the 'youtube-transcript-api' Python package under the hood, this R package provides a convenient way to parse a YouTube caption and convert it into a handy tibble data_frame object. Users can also save the caption data as a tidy Excel file without an advanced programming background.
## Installation
### Python Dependencies
`youtubecaption` requires an Anaconda Python environment on your system PATH.
If you have not installed a Conda environment on your system, please [download and install Anaconda](https://www.anaconda.com/download/) (Python 3.6 or later is recommended).
For this package, I have brought the [**youtube-transcript-api**](https://pypi.org/project/youtube-transcript-api/) Python module into R using [**reticulate**](https://rstudio.github.io/reticulate/).
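Because the caption download happens on the Python side, it can help to confirm that R actually sees Python and the module before calling `get_caption()`. The sketch below is a hypothetical helper (`check_caption_backend()` is not part of `youtubecaption`). It relies only on the reticulate functions `py_available()` and `py_module_available()`, and assumes the module is imported under its standard name `youtube_transcript_api`. It does not check whether the interpreter it finds is specifically an Anaconda one.

```r
# Hypothetical pre-flight check (not part of youtubecaption): can R see Python
# and the youtube_transcript_api module? 'reticulate' is used only if it is
# installed, and nothing is installed by this function.
check_caption_backend <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(c(reticulate = FALSE, python = FALSE, module = FALSE))
  }
  # initialize = TRUE asks reticulate to start Python in this R session;
  # any failure is reported as FALSE instead of an error.
  python_ok <- isTRUE(tryCatch(
    reticulate::py_available(initialize = TRUE),
    error = function(e) FALSE
  ))
  # The PyPI package "youtube-transcript-api" is imported as youtube_transcript_api.
  module_ok <- python_ok && isTRUE(tryCatch(
    reticulate::py_module_available("youtube_transcript_api"),
    error = function(e) FALSE
  ))
  c(reticulate = TRUE, python = python_ok, module = module_ok)
}

status <- check_caption_backend()
print(status)
```

If `python` is `TRUE` but `module` is `FALSE`, `reticulate::py_config()` shows which interpreter reticulate picked, which is useful when several Python installations exist.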
### R Package Installation
#### Development Version
You can install the latest development version as follows:
```{r, eval=FALSE}
# Install 'remotes' only if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("jooyoungseo/youtubecaption")
```

#### Stable Version
You can install the released version of youtubecaption from [CRAN](https://CRAN.R-project.org) with:
```{r, eval=FALSE}
install.packages('youtubecaption')
```

## Usage
Load the `youtubecaption` package and then call `get_caption()` as shown below:
```{r test, eval=FALSE}
library(youtubecaption)

# Let's get the video caption out of Hadley Wickham's "You can't do data science in a GUI":
url <- "https://www.youtube.com/watch?v=cpbtcsGE0OA"
caption <- get_caption(url)
caption
#> # A tibble: 1,420 x 5
#> segment_id text start duration vid
#>
#> 1 1 thank you for coming to a meeting ~ 7.13 8.32 cpbtcsGE0~
#> 2 2 in regards to data science GUI with 10.7 8.44 cpbtcsGE0~
#> 3 3 happy with chief data scientist in~ 15.4 7.11 cpbtcsGE0~
#> 4 4 studio as well as the member of th~ 19.1 7.23 cpbtcsGE0~
#> 5 5 Foundation and an attempt professo~ 22.6 6 cpbtcsGE0~
#> 6 6 Stanford and at the University of 26.4 6.48 cpbtcsGE0~
#> 7 7 Auckland he builds both computatio~ 28.6 7.17 cpbtcsGE0~
#> 8 8 and cognitive tools to make data s~ 32.8 7.5 cpbtcsGE0~
#> 9 9 easier faster and more times his w~ 35.7 7.01 cpbtcsGE0~
#> 10 10 includes various packages as well ~ 40.4 6.21 cpbtcsGE0~
#> # ... with 1,410 more rows

# Save the caption as an Excel file and open it right away:
get_caption(url = url, savexl = TRUE, openxl = TRUE)
```
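Because `get_caption()` returns an ordinary tibble, standard data frame tools work on the result. The sketch below uses base R only and a small hypothetical stand-in table (`toy_caption`, with made-up text) that mimics the five columns printed above. It assumes `start` and `duration` are in seconds, as in the underlying youtube-transcript-api output. It shows three common follow-ups: a nominal end time per segment, a keyword search, and the full transcript as one string.

```r
# Hypothetical stand-in for get_caption() output: same column layout
# (segment_id, text, start, duration, vid), made-up content.
toy_caption <- data.frame(
  segment_id = 1:3,
  text = c("hypothetical first line",
           "hypothetical second line about data",
           "hypothetical third line"),
  start = c(0, 2.5, 5),     # assumed to be seconds
  duration = c(3, 3.5, 2),  # assumed to be seconds
  vid = "hypothetical_id",
  stringsAsFactors = FALSE
)

# Nominal end time of each segment (real captions may overlap in time).
toy_caption$end <- toy_caption$start + toy_caption$duration

# Segments whose text mentions a keyword, matched case-insensitively.
hits <- toy_caption[grepl("data", toy_caption$text, ignore.case = TRUE),
                    c("segment_id", "start", "text")]

# Whole transcript as a single string, in segment order.
transcript <- paste(toy_caption$text[order(toy_caption$segment_id)],
                    collapse = " ")

print(hits)
```

The same three expressions should apply unchanged to the `caption` tibble from `get_caption()`, since a tibble is also a data frame.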