https://github.com/dgrtwo/stackbigquery
Database-specific package for the Stack Overflow data on Google BigQuery
- Host: GitHub
- URL: https://github.com/dgrtwo/stackbigquery
- Owner: dgrtwo
- License: other
- Created: 2021-09-10T14:59:56.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-09-10T15:01:33.000Z (over 4 years ago)
- Last Synced: 2025-04-05T05:13:35.077Z (about 1 year ago)
- Language: R
- Size: 93.8 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
cache = TRUE,
cache.path = "README-cache/"
)
```
# stackbigquery
stackbigquery is a package wrapping the Stack Overflow database on Google BigQuery.
This is a minimal example of using [dbcooper](https://github.com/dgrtwo/dbcooper) to create a database package:
* Create a connection in [connections.R](https://github.com/dgrtwo/stackbigquery/blob/master/R/connections.R)
* Run `dbcooper::dbc_init()` on that connection in [zzz.R](https://github.com/dgrtwo/stackbigquery/blob/master/R/zzz.R)
* Put package-specific functions in other files like [summarize.R](https://github.com/dgrtwo/stackbigquery/blob/master/R/summarize.R)
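The first two steps can be sketched roughly as follows. This is a minimal sketch rather than the package's actual source: `stack_connect` is a hypothetical function name, the `project` and `dataset` values assume the public `bigquery-public-data.stackoverflow` dataset, and the `env` argument passed to `dbc_init()` is an assumption about how the generated functions end up in the package namespace.

```r
# R/connections.R (sketch): open a DBI connection to the Stack Overflow data.
# `stack_connect` is a hypothetical name, not taken from the package source.
stack_connect <- function() {
  DBI::dbConnect(
    bigrquery::bigquery(),
    project = "bigquery-public-data",  # assumed location of the public dataset
    dataset = "stackoverflow",         # assumed dataset name
    billing = Sys.getenv("BIGQUERY_BILLING_PROJECT")
  )
}

# R/zzz.R (sketch): create the stack_* functions when the package is loaded.
.onLoad <- function(libname, pkgname) {
  dbcooper::dbc_init(
    stack_connect(),
    "stack",                          # prefix for the generated functions
    env = parent.env(environment())   # assumed: assign into the package namespace
  )
}
```

Defining these functions does not contact BigQuery; a connection is only opened when `.onLoad()` runs.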
## Installation
You can install the development version of stackbigquery from GitHub with:
``` r
devtools::install_github("dgrtwo/stackbigquery")
```
You'll also need to create a Google Cloud project with BigQuery enabled, and set two environment variables in your `.Renviron` file (see [bigrquery](https://bigrquery.r-dbi.org/)).
```
BIGQUERY_BILLING_PROJECT=
BIGQUERY_EMAIL=
```
The first time you use the package, it may prompt you to authenticate (see the [gargle](https://gargle.r-lib.org/) package for more).
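Before loading the package, it can help to confirm that both variables are visible to R. The helper below is a small sketch using base R only; `missing_bigquery_env` is a hypothetical name and is not part of stackbigquery.

```r
# Report which of the two variables from .Renviron are still unset.
# `missing_bigquery_env` is a hypothetical helper, not part of stackbigquery.
missing_bigquery_env <- function(vars = c("BIGQUERY_BILLING_PROJECT",
                                          "BIGQUERY_EMAIL")) {
  values <- Sys.getenv(vars, unset = "")
  vars[values == ""]
}

missing_vars <- missing_bigquery_env()
if (length(missing_vars) > 0) {
  message("Set these in .Renviron and restart R: ",
          paste(missing_vars, collapse = ", "))
}
```

If you prefer to authenticate explicitly rather than wait for the prompt, `bigrquery::bq_auth(email = Sys.getenv("BIGQUERY_EMAIL"))` is one option.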
## Examples
Once you've loaded the stackbigquery package, you can use functions prefixed with `stack_` to access the database. These include:
* `stack_list()` to list tables in the database
* `stack_query()` to run a SQL query (and get a remote dbplyr table)
```{r posts_questions}
library(dplyr)
library(stackbigquery)
stack_list()
stack_query("SELECT * FROM tags ORDER BY count DESC")
```
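A "remote dbplyr table" is lazy: it holds a query, and rows are only downloaded when you call `collect()`. The sketch below illustrates that behaviour without a BigQuery account by using an in-memory SQLite database as a stand-in (this assumes the DBI, RSQLite, dplyr and dbplyr packages are installed). The toy `tags` table and its `tag_name` and `count` columns are made up for illustration, and `tbl(con, sql(...))` is only assumed to be similar in spirit to what `stack_query()` returns.

```r
library(DBI)
library(dplyr)

# Stand-in for the BigQuery connection: in-memory SQLite with a toy tags table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "tags", data.frame(
  tag_name = c("r", "python", "javascript"),
  count = c(10L, 30L, 20L)
))

# A lazy table built from a SQL string; no rows are fetched yet
remote_tags <- tbl(con, sql("SELECT * FROM tags ORDER BY count DESC"))

# dplyr verbs are translated to SQL and run in the database;
# collect() brings the result into R as a local tibble
popular_tags <- remote_tags %>%
  filter(count >= 20) %>%
  collect()

dbDisconnect(con)
```

Filtering or aggregating before `collect()` keeps the download small, which matters more on BigQuery, where queries are billed.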
You can also use autocomplete-friendly table accessors:
```{r}
stack_posts_questions()
```
These can be used with dbplyr to do joins or summaries.
```{r by_month}
by_month <- stack_posts_questions() %>%
  group_by(month = DATE_TRUNC(DATE(creation_date), MONTH)) %>%
  summarize(n_questions = n(),
            avg_score = mean(score),
            avg_answers = mean(answer_count)) %>%
  collect()

by_month
```
```{r}
library(ggplot2)
theme_set(theme_light())
by_month %>%
  filter(n_questions >= 100) %>%
  ggplot(aes(month, avg_score)) +
  geom_line() +
  labs(y = "Average score of Stack Overflow questions")
```
### Summarize tags
As a database-specific package, stackbigquery also offers useful verbs for doing common operations on the data.
For instance, `summarize_tags()` takes a (potentially grouped) version of `stack_posts_questions()`, joins it to the tags table, and aggregates the frequency by tag.
```{r by_month_tag}
by_month_tag <- stack_posts_questions() %>%
  group_by(month = DATE_TRUNC(DATE(creation_date), MONTH)) %>%
  summarize_tags(c("javascript", "java", "python", "c#", "php", "c++"))

by_month_tag
```
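To make the idea concrete, here is a local sketch of the same kind of aggregation on a toy data frame, using dplyr and tidyr. It is not the package's implementation, which works against the database. The pipe-separated `tags` column and the output columns (`n`, `n_questions`, `percent`) are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Toy questions: one row per question, tags separated by "|" (assumed format)
questions <- tibble(
  month = as.Date(c("2021-01-01", "2021-01-01", "2021-02-01", "2021-02-01")),
  tags = c("python|pandas", "javascript", "python", "java|spring")
)

# Total questions per group, used as the denominator
totals <- questions %>%
  count(month, name = "n_questions")

# One row per (question, tag), filtered to the tags of interest,
# then counted per group and divided by the group total
by_month_tag_local <- questions %>%
  separate_rows(tags, sep = "\\|") %>%
  rename(tag = tags) %>%
  filter(tag %in% c("python", "javascript")) %>%
  count(month, tag, name = "n") %>%
  inner_join(totals, by = "month") %>%
  mutate(percent = n / n_questions)

by_month_tag_local
```

Because a question can carry several tags, the percentages within a group do not need to sum to 100%.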
```{r by_month_tag_plot, dependson = "by_month_tag"}
library(ggplot2)
library(forcats)
by_month_tag %>%
  filter(month != max(month),
         month != min(month)) %>%
  arrange(month) %>%
  mutate(tag = fct_reorder(tag, -percent, last)) %>%
  ggplot(aes(month, percent, color = tag)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent_format()) +
  expand_limits(y = 0) +
  labs(x = "Time",
       y = "% of Stack Overflow questions")
```
### Code of Conduct
Please note that the 'stackbigquery' project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.