https://github.com/chainsawriot/academicquacker
🦆 Convert raw data collected by academictwitteR into DuckDB
- Host: GitHub
- URL: https://github.com/chainsawriot/academicquacker
- Owner: chainsawriot
- License: gpl-3.0
- Created: 2021-07-30T16:50:56.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-10-04T15:16:59.000Z (about 2 years ago)
- Last Synced: 2023-03-23T17:28:45.404Z (over 1 year ago)
- Language: R
- Homepage:
- Size: 3.76 MB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# academicquacker
[![R-CMD-check](https://github.com/chainsawriot/academicquacker/workflows/R-CMD-check/badge.svg)](https://github.com/chainsawriot/academicquacker/actions)
If it collects tweets like a duck, binds tweets like a duck, and quacks like a duck, then it probably _is_ an academic.
The goal of this (experimental) package is to convert raw data collected by [academictwitteR](https://github.com/cjbarrie/academictwitteR) into [DuckDB](https://github.com/duckdb/duckdb) in a memory-efficient manner. This package also serves as a test bed for rolling out experimental features of `bind_tweets` in `academictwitteR`.
Why DuckDB? Because it quacks... I mean, [it rocks](https://duckdb.org/docs/why_duckdb)!
Why isn't the last R capitalized? Because the developer always forgets which "r" in the word "academicquacker" to capitalize.
## Installation
You can install the development version of academicquacker with:
``` r
remotes::install_github("chainsawriot/academicquacker")
```

## Example

Suppose `dir` is a directory hosting JSON files collected with academictwitteR.
```{r, include = FALSE}
dir <- "tests/testdata/ica21"
```

```{r example1}
library(academicquacker)
con <- quack(dir, db = "mydata.duckdb", db_close = TRUE)
```

It won't fill up all main memory in your computer.
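
After a conversion, it can help to check what ended up in the database file before analysing it. The following is a minimal sketch under stated assumptions: it builds a small stand-in DuckDB file in a temporary location with a hypothetical `tweets` table (toy rows and illustrative column names), because the actual output of `quack()` depends on your data. Only the inspection calls at the end are the point; with a real file you would pass its path (for example `"mydata.duckdb"`) as `dbdir`.

```r
library(DBI)

# Stand-in for a file written by quack(): toy rows, hypothetical columns.
db_path <- tempfile(fileext = ".duckdb")
con <- dbConnect(duckdb::duckdb(), dbdir = db_path)
toy <- data.frame(
  tweet_id = c("1", "2", "3"),
  user_username = c("alice", "bob", "alice"),
  retweet_count = c(5L, 0L, 2L),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "tweets", toy)
dbDisconnect(con, shutdown = TRUE)

# Reopen the file read-only and look at what it contains.
con <- dbConnect(duckdb::duckdb(), dbdir = db_path, read_only = TRUE)
tables <- dbListTables(con)            # names of the tables in the file
fields <- dbListFields(con, "tweets")  # column names of one table
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM tweets")$n
print(tables)
print(fields)
print(n_rows)

# Release the file when done.
dbDisconnect(con, shutdown = TRUE)
unlink(db_path)
```

Opening the file with `read_only = TRUE` is a cautious choice for exploration, since it prevents accidental writes to the converted data.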
## Analysis
Now you can analyse the data in the database. For example, you can use dplyr as you usually do with data frames.
```{r example2}
library(DBI)
library(dplyr)
con <- dbConnect(duckdb::duckdb(), dbdir = "mydata.duckdb")
tbl(con, "tweets") %>% count(user_username, sort = TRUE)
```

Most retweeted original content that is not from @icahdq:

```{r example3}
tbl(con, "tweets") %>%
  arrange(desc(retweet_count)) %>%
  filter(is.na(sourcetweet_id) & user_username != "icahdq") %>%
  select(user_username, text, retweet_count)
```

Calculate the average number of retweets per original tweet for users who wrote at least three original tweets:

```{r example4}
tbl(con, "tweets") %>%
  filter(is.na(sourcetweet_id)) %>%
  group_by(user_username) %>%
  summarise(avg_rt = sum(retweet_count, na.rm = TRUE) / n(), n = n()) %>%
  filter(n > 2) %>%
  arrange(desc(avg_rt))
```

You can find more information about this in the ["Introduction to dbplyr"](https://dbplyr.tidyverse.org/articles/dbplyr.html) vignette of the tidyverse.

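
The dplyr pipelines above are translated to SQL and run inside DuckDB; rows only reach R when you ask for them. The sketch below illustrates this with `show_query()` and `collect()`. It assumes the dbplyr package is installed and uses an in-memory database with toy rows; the column names mirror the examples above, but the data are made up for illustration.

```r
library(DBI)
library(dplyr)

# Toy in-memory table (made-up rows) using column names from the examples.
con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")
toy <- data.frame(
  user_username = c("alice", "alice", "bob", "bob", "bob"),
  retweet_count = c(4L, 2L, 1L, 0L, 10L),
  sourcetweet_id = c(NA, NA, NA, NA, "99"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "tweets", toy)

# Building the pipeline does not run it; it only describes a query.
lazy_query <- tbl(con, "tweets") %>%
  filter(is.na(sourcetweet_id)) %>%
  group_by(user_username) %>%
  summarise(avg_rt = mean(retweet_count, na.rm = TRUE), n = n()) %>%
  arrange(desc(avg_rt))

# Print the SQL that dbplyr generated for DuckDB.
show_query(lazy_query)

# Run the query and pull the (small) result into an ordinary tibble.
result <- collect(lazy_query)
print(result)

dbDisconnect(con, shutdown = TRUE)
```

Calling `collect()` only on an aggregated result keeps the amount of data loaded into R small, which fits the memory-conscious goal of the package.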
```{r, include = FALSE}
DBI::dbDisconnect(con, shutdown = TRUE)
unlink("mydata.duckdb")
```