https://github.com/gesiscss/tubecleanr
(Mini) R package for preprocessing YouTube comment data collected with tuber or vosonSML
- Host: GitHub
- URL: https://github.com/gesiscss/tubecleanr
- Owner: gesiscss
- License: other
- Created: 2024-02-06T20:01:36.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-02-26T09:46:09.000Z (10 months ago)
- Last Synced: 2024-02-26T10:56:35.651Z (10 months ago)
- Topics: preprocessing, r, tuber, vosonsml, youtube
- Language: R
- Homepage: https://gesiscss.github.io/tubecleanR/
- Size: 6.71 MB
- Stars: 1
- Watchers: 6
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
# tubecleanR
This is a mini `R` package for cleaning and preprocessing *YouTube* comment data collected with the `R` packages [tuber](https://github.com/gojiplus/tuber) or [vosonSML](https://github.com/vosonlab/vosonSML).
The package is a collection of functions that were developed during several workshops on collecting and analyzing *YouTube* data at [GESIS - Leibniz Institute for the Social Sciences](https://www.gesis.org/home). The main function of the package is `parse_yt_comments()`, which takes a dataframe containing *YouTube* comments collected with `tuber` or `vosonSML` as input and outputs a processed dataframe in which URLs/links, video timestamps, user mentions, emoticons, and emoji have been extracted from the comments into separate columns. In addition, the function creates a column containing textual descriptions of the emoji, and another one containing a cleaned version of the comment in which the elements listed before as well as numbers and punctuation have been removed.

**Please note**: The functions in this package are heavily dependent on the structure of the data exports from `tuber` and `vosonSML`, and, by extension, the structure of the *YouTube* API.
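
To make the kind of extraction described above more concrete, here is a minimal base `R` sketch that pulls URLs, video timestamps, and user mentions out of a single made-up comment and then builds a cleaned version of the text. This is **not** the implementation used by `parse_yt_comments()`: the helper `extract_all()`, the example comment, and all regular expressions are simplified assumptions for illustration only, and emoji/emoticon handling is left out entirely.

```R
# Illustrative sketch only: simplified patterns, NOT the package's own code.
comment <- "Great video @Jane! Jump to 1:23 or 01:02:03, see https://example.com/page :)"

# Hypothetical helper: return all matches of a pattern in one string
extract_all <- function(text, pattern) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

# Assumed, deliberately simple patterns
url_pattern       <- "https?://[^\\s]+"
timestamp_pattern <- "\\b(?:\\d{1,2}:)?\\d{1,2}:\\d{2}\\b"
mention_pattern   <- "@\\w+"

urls       <- extract_all(comment, url_pattern)
timestamps <- extract_all(comment, timestamp_pattern)
mentions   <- extract_all(comment, mention_pattern)

# Cleaned text: drop the extracted elements, then digits and punctuation
cleaned <- comment
for (p in c(url_pattern, timestamp_pattern, mention_pattern)) {
  cleaned <- gsub(p, " ", cleaned, perl = TRUE)
}
cleaned <- gsub("[[:digit:][:punct:]]+", " ", cleaned)
cleaned <- trimws(gsub("[[:space:]]+", " ", cleaned))

print(list(urls = urls, timestamps = timestamps,
           mentions = mentions, cleaned = cleaned))
```

Real comments are considerably messier than this example (for instance, URLs followed directly by punctuation), which is one reason to rely on the package function rather than ad hoc patterns like these.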
If you are interested in becoming a maintainer of this package, feel free to contact us.
## 1) Installation

```R
# GitHub version
library(remotes)
remotes::install_github("gesiscss/tubecleanR")
```
## 2) Demo data

We have created some simulated *YouTube* comment data in the `tuber` and `vosonSML` formats, which are bundled with the package.
```R
# Attaching the package
library(tubecleanR)

# Checking example comments bundled with the package
View(tuberComments)
View(vosonComments)

# Parsing comments
tuber_parsed <- parse_yt_comments(tuberComments)
voson_parsed <- parse_yt_comments(vosonComments)

# Checking parsed versions of example comments
View(tuber_parsed)
View(voson_parsed)
```
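
`View()` needs an interactive session (for example RStudio). As a possible alternative for scripts or the console, the following sketch lists which columns the parsing step added and prints a compact overview of the result. It only uses the function and example data named above plus base `R`; the variable name `parsed` is an arbitrary choice, and no specific column names are assumed.

```R
library(tubecleanR)

# Parse the bundled example data in tuber format
parsed <- parse_yt_comments(tuberComments)

# Columns present in the parsed data but not in the raw example data
setdiff(names(parsed), names(tuberComments))

# Compact overview of column names and types, without expanding nested content
str(parsed, max.level = 1)
```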
## 3) Using your own data

The `parse_yt_comments()` function is meant to be used for *YouTube* comment data collected with the `get_all_comments()` function from `tuber` or the `Collect()` function from `vosonSML`. Both of those require access credentials for the *YouTube* API. Check the documentation of those two packages for further details.
If you want to learn more about getting access to the *YouTube* API, collecting comment (and other) data from the API using `R`, and processing and exploring the resulting data, you can also check out the materials from our [workshop](https://github.com/jobreu/youtube-workshop-gesis-2023).
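
As a rough sketch of how the collection and parsing steps described in this section might fit together, the two wrapper functions below collect comments with `tuber` or `vosonSML` and pass the result to `parse_yt_comments()`. The wrapper names and their arguments (`app_id`, `app_secret`, `api_key`, `video_id`, `video_ids`, `max_comments`) are hypothetical placeholders, and the argument names of the `tuber` and `vosonSML` calls are given as we understand them; they may differ between package versions, so please check the documentation of both packages. Nothing is sent to the API until one of the functions is called with valid credentials.

```R
# Sketch only: running either function requires valid YouTube API credentials
# and the packages tuber / vosonSML and tubecleanR to be installed.

# Hypothetical wrapper: collect comments for one video with tuber, then parse
collect_and_parse_tuber <- function(app_id, app_secret, video_id) {
  # OAuth authentication (may open a browser window on first use)
  tuber::yt_oauth(app_id = app_id, app_secret = app_secret)
  comments <- tuber::get_all_comments(video_id = video_id)
  tubecleanR::parse_yt_comments(comments)
}

# Hypothetical wrapper: collect comments with vosonSML, then parse
collect_and_parse_voson <- function(api_key, video_ids, max_comments = 100) {
  auth <- vosonSML::Authenticate("youtube", apiKey = api_key)
  comments <- vosonSML::Collect(auth,
                                videoIDs = video_ids,
                                maxComments = max_comments)
  tubecleanR::parse_yt_comments(comments)
}
```

Wrapping the steps in functions keeps credentials out of the top level of a script; they can then be supplied at call time, for example from environment variables via `Sys.getenv()`.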
## 4) Citation
If you are using this package in your research, please cite it as follows:

```R
> citation("tubecleanR")
```

```R
To cite package ‘tubecleanR’ in publications use:

Kohne, J., & Breuer, J. (2024). tubecleanR: Parsing and Preprocessing YouTube Comment
Data. R package version 0.1.0.

A BibTeX entry for LaTeX users is

@Manual{,
title = {tubecleanR: Parsing and Preprocessing YouTube Comment Data},
author = {Julian Kohne and Johannes Breuer},
year = {2024},
note = {R package version 0.1.0},
url = {https://gesiscss.github.io/tubecleanR/},
}
```