https://github.com/coatless/bigquery-reddit-ask-your-advisor

Analysis code that counts instances of a phrase on Reddit (e.g. "ask your advisor")
ask-your-advisor bigquery r reddit

Host: GitHub
URL: https://github.com/coatless/bigquery-reddit-ask-your-advisor
Owner: coatless
Created: 2018-11-14T14:33:32.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-11-14T16:57:46.000Z (over 7 years ago)
Last Synced: 2025-12-13T09:54:45.901Z (6 months ago)
Topics: ask-your-advisor, bigquery, r, reddit
Homepage:
Size: 30.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd

README

---
title: 'Overview of Reddit Users Using Phrase'
author: "JJB"
date: "11/14/2018"
output: md_document
---

# Obtain Data

Run on [Google BigQuery](https://console.cloud.google.com/bigquery).

```sql
SELECT author, body
FROM `fh-bigquery.reddit_comments.20*`
WHERE subreddit = 'UIUC'
AND REGEXP_CONTAINS(body, r'(?i)ask your advisor')
```

Save data as a CSV. The data used for this analysis can be found in the [`data/`](data/) folder.
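The `(?i)` flag in the query above makes the match case-insensitive. As a minimal sketch of what that filter accepts and rejects, the same idea can be reproduced in base R with `grepl()`. The comment bodies below are made up for illustration and are not taken from the dataset.

```r
# Hypothetical comment bodies, made up for illustration (not from the dataset).
bodies = c(
  "You should ask your advisor about that.",
  "ASK YOUR ADVISOR before dropping the class.",
  "I asked my advisor last week."
)

# ignore.case = TRUE plays the role of the (?i) flag in the BigQuery pattern.
matches = grepl("ask your advisor", bodies, ignore.case = TRUE)
matches
```

Only the first two bodies contain the phrase, regardless of capitalization. The third does not, because "asked my advisor" is a different string.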

# Post processing

Once we have the data, let's bring it into _R_.

```{r}
# Load in the data; keep text columns as character vectors rather than factors
comments = read.csv("data/results-20181114-075107.csv", stringsAsFactors = FALSE)
```

## User Overview

Let's take a quick peek at the different contributors using this phrase.

```{r}
# Count the number of matching comments per user
count_users = table(comments$author)

# Number of unique users
n_users_unique = length(count_users)

# Obtain a leaderboard of users, most comments first
top_users = sort(count_users, decreasing = TRUE)

# Name and comment count of the top user
top_username = names(top_users)[1]
top_user_posts = top_users[1]
```
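To make the counting steps above concrete, here is a small sketch of the same `table()` and `sort()` pattern on a toy vector. The author names and the `toy_` variables are hypothetical and exist only for illustration.

```r
# Hypothetical authors, made up for illustration (not from the dataset).
toy_authors = c("alice", "bob", "alice", "carol", "alice", "bob")

# table() tallies entries per author; its length is the number of unique authors.
toy_counts = table(toy_authors)
length(toy_counts)

# Sorting in decreasing order puts the most frequent author first.
toy_top = sort(toy_counts, decreasing = TRUE)
names(toy_top)[1]
as.integer(toy_top[1])
```

Here there are three unique authors, and "alice" leads with three entries.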

```{r echo = FALSE}
format_reddit_username = function(user) {
  paste0("[/u/", user,"](https://reddit.com/u/", user, ")")
}
```

There were **`r n_users_unique`** unique users who used a variation of the phrase
"Ask your advisor". The user with the most comments was
**`r format_reddit_username(top_username)`**, with **`r top_user_posts`** comments.
All users with at least 4 comments containing the phrase are listed below in
descending order of frequency.

```{r, echo = FALSE}
cream_of_crop = as.data.frame(top_users[top_users >= 4])
names(cream_of_crop) = c("Username", "Frequency")

# Generate knitr table
knitr::kable(cream_of_crop)
```

## Number of Words Used Per Post

```{r message=FALSE}
# Mean words per comment and number of comments for each author
library("dplyr")

comments %>%
  group_by(author) %>%
  summarise(mean_nwords = mean(stringr::str_count(body, "\\S+")),
            n_entries = n()) %>%
  arrange(desc(n_entries), desc(mean_nwords)) %>%
  filter(n_entries >= 4) %>%
  knitr::kable()
```
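The pattern `"\\S+"` used above treats each run of non-whitespace characters as one word. The sketch below shows that counting rule with a base R stand-in, which should give the same counts as `stringr::str_count()` for this pattern. The comment bodies and the `toy_` names are hypothetical.

```r
# Hypothetical comment bodies, made up for illustration (not from the dataset).
toy_bodies = c(
  "Ask your advisor.",
  "Honestly,  just ask   your advisor about it!",
  ""
)

# Extract every run of non-whitespace characters, then count the runs per body.
toy_tokens = regmatches(toy_bodies, gregexpr("\\S+", toy_bodies, perl = TRUE))
toy_nwords = lengths(toy_tokens)
toy_nwords
```

Punctuation attached to a word counts as part of that word, repeated spaces do not create extra words, and an empty body counts as zero.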
