https://github.com/chainsawriot/academicquacker

🦆 Convert raw data collected by academictwitteR into DuckDB

Host: GitHub
URL: https://github.com/chainsawriot/academicquacker
Owner: chainsawriot
License: gpl-3.0
Archived: true
Created: 2021-07-30T16:50:56.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2022-10-04T15:16:59.000Z (over 2 years ago)
Last Synced: 2025-02-07T00:09:12.908Z (13 days ago)
Language: R
Homepage:
Size: 3.76 MB
Stars: 6
Watchers: 4
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# academicquacker

[![R-CMD-check](https://github.com/chainsawriot/academicquacker/workflows/R-CMD-check/badge.svg)](https://github.com/chainsawriot/academicquacker/actions)

If it collects tweets like a duck, binds tweets like a duck, and quacks like a duck, then it probably _is_ an academic.

The goal of this (experimental) package is to convert raw data collected by [academictwitteR](https://github.com/cjbarrie/academictwitteR) into [DuckDB](https://github.com/duckdb/duckdb) in a memory-efficient manner. This package also serves as a test bed for rolling out experimental features of `bind_tweets` in `academictwitteR`.

Why DuckDB? Because it quacks... I mean, [it rocks](https://duckdb.org/docs/why_duckdb)!

Why isn't the last R capitalized? Because the developer always forgets which "r" in the word "academicquacker" to capitalize.

## Installation

You can install the development version of academicquacker with:

``` r
remotes::install_github("chainsawriot/academicquacker")
```

## Example

Suppose `dir` is a directory containing JSON files collected with academictwitteR.

```{r, include = FALSE}
dir <- "tests/testdata/ica21"
```

```{r example1}
library(academicquacker)
con <- quack(dir, db = "mydata.duckdb", db_close = TRUE)
```

The conversion is designed not to fill up your computer's main memory.
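The general idea of keeping memory use low during such a conversion can be sketched as follows. This is not academicquacker's actual implementation: `append_json_files()` is a hypothetical helper, and the two tiny JSON files are made up and far simpler than real academictwitteR output. The sketch only illustrates reading one file at a time and appending it to a DuckDB table, so that at most one file's worth of rows is held in memory.

``` r
library(DBI)

# Hypothetical helper (not part of academicquacker): read each JSON file
# on its own and append its rows to a DuckDB table.
append_json_files <- function(files, con, table = "tweets") {
  for (f in files) {
    chunk <- jsonlite::fromJSON(f)  # rows of a single file as a data frame
    # Create the table on the first file, append on every later one.
    dbWriteTable(con, table, chunk, append = dbExistsTable(con, table))
  }
  invisible(dbGetQuery(con, paste("SELECT COUNT(*) AS n FROM", table))$n)
}

# Two made-up JSON files stand in for real collected data.
tmp <- tempfile()
dir.create(tmp)
jsonlite::write_json(
  data.frame(id = c("1", "2"), text = c("quack", "honk")),
  file.path(tmp, "data_1.json")
)
jsonlite::write_json(
  data.frame(id = "3", text = "waddle"),
  file.path(tmp, "data_2.json")
)

con_demo <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")
n_rows <- append_json_files(list.files(tmp, full.names = TRUE), con_demo)
dbDisconnect(con_demo, shutdown = TRUE)
```

Here `n_rows` holds the total number of rows written across both files.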

## Analysis

Now you can analyze the data in the database. For example, you can use dplyr as you usually would with data frames.

```{r example2}
library(DBI)
library(dplyr)

con <- dbConnect(duckdb::duckdb(), dbdir = "mydata.duckdb")
tbl(con, "tweets") %>% count(user_username, sort = TRUE)
```

The most retweeted original content that is not from @icahdq:

```{r example3}
tbl(con, "tweets") %>%
  arrange(desc(retweet_count)) %>%
  filter(is.na(sourcetweet_id) & user_username != "icahdq") %>%
  select(user_username, text, retweet_count)
```

Calculate the average number of retweets per original tweet for users who wrote at least three original tweets:

```{r example4}
tbl(con, "tweets") %>%
  filter(is.na(sourcetweet_id)) %>%
  group_by(user_username) %>%
  summarise(avg_rt = sum(retweet_count, na.rm = TRUE) / n(), n = n()) %>%
  filter(n > 2) %>%
  arrange(desc(avg_rt))
```

For more information about querying a database with dplyr, see the ["Introduction to dbplyr"](https://dbplyr.tidyverse.org/articles/dbplyr.html) vignette from the tidyverse.
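As a small illustration of how dbplyr defers the work to the database, the sketch below builds a toy `tweets` table in an in-memory DuckDB. The rows and the `toy_con` name are invented for illustration, and the table has only three of the columns used in the examples. `show_query()` prints the SQL that dplyr generates, and `collect()` pulls the result into an ordinary data frame.

``` r
library(DBI)
library(dplyr)

# Toy stand-in for the "tweets" table; all values are made up.
toy_con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")
toy_tweets <- data.frame(
  user_username = c("alice", "alice", "alice", "bob", "bob"),
  retweet_count = c(4, 0, 2, 10, 1),
  sourcetweet_id = c(NA, NA, NA, NA, "123"),
  stringsAsFactors = FALSE
)
dbWriteTable(toy_con, "tweets", toy_tweets)

# A lazy query: nothing is computed until the result is requested.
original_counts <- tbl(toy_con, "tweets") %>%
  filter(is.na(sourcetweet_id)) %>%
  count(user_username, sort = TRUE)

show_query(original_counts)          # print the generated SQL
result <- collect(original_counts)   # run the query, return a data frame

dbDisconnect(toy_con, shutdown = TRUE)
```

Calling `collect()` only at the end keeps the filtering and counting inside DuckDB, so only the small summary table is brought into R.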

```{r, include = FALSE}
DBI::dbDisconnect(con, shutdown = TRUE)
unlink("mydata.duckdb")
```