https://github.com/pushshift/google_bigquery



# Pushshift Google BigQuery Data Streams

## Reddit

### Dataset location

pushshift.rt_reddit

### Tables

comments

### comments Table Schema
```
  Last modified                  Schema                  Total Rows   Total Bytes   Expiration      Time Partitioning       Labels
----------------- ------------------------------------- ------------ ------------- ------------ -------------------------- --------
 23 May 01:14:24  |- author: string (required)           0            0                          DAY (field: created_utc)
                  |- author_flair_text: string
                  |- author_flair_css_class: string
                  |- body: string (required)
                  |- created_utc: timestamp
                  |- subreddit: string (required)
                  |- subreddit_id: integer (required)
                  |- parent_id: string
                  |- link_id: string (required)
                  |- id: integer (required)
                  |- send_replies: boolean
                  |- stickied: boolean
                  |- subreddit_type: string
                  |- score: integer
                  |- gilded: integer
                  |- can_gild: boolean
                  |- controversiality: integer
                  |- is_submitter: boolean
                  |- no_follow: boolean
                  |- edited: timestamp
                  |- permalink: string
                  |- retrieved_on: timestamp
```

### Standard SQL Query Examples

How many comments have been made in the last minute?

    SELECT COUNT(*)
    FROM `pushshift.rt_reddit.comments`
    WHERE created_utc > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE)

What are the most active subreddits over the past five minutes?

    SELECT subreddit, COUNT(*)
    FROM `pushshift.rt_reddit.comments`
    WHERE created_utc > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
    GROUP BY 1
    ORDER BY 2 DESC
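
The same query can also be issued from a script. The following is a minimal sketch, not part of the original project: it assumes the `google-cloud-bigquery` Python package is installed, that application default credentials are configured, and that the `pushshift.rt_reddit.comments` table is still available. The function names, the `comment_count` alias, the `@minutes` parameter, and the `LIMIT` clause are illustrative additions.

```python
"""Sketch: run the 'most active subreddits' query from Python."""

TABLE = "pushshift.rt_reddit.comments"  # table name taken from this README


def build_active_subreddits_query(limit: int = 10) -> str:
    """Return SQL ranking subreddits by comment count in a recent window.

    The window length is left as the named parameter @minutes, so
    BigQuery binds it at run time instead of it being pasted into the SQL.
    The LIMIT clause is an illustrative addition, not in the README query.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT subreddit, COUNT(*) AS comment_count\n"
        f"FROM `{TABLE}`\n"
        "WHERE created_utc > "
        "TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @minutes MINUTE)\n"
        "GROUP BY subreddit\n"
        "ORDER BY comment_count DESC\n"
        f"LIMIT {int(limit)}"
    )


def most_active_subreddits(minutes: int = 5, limit: int = 10):
    """Run the query and return (subreddit, comment_count) pairs.

    Assumes google-cloud-bigquery is installed and default credentials
    are available; the import is lazy so the builder works without it.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("minutes", "INT64", minutes)
        ]
    )
    sql = build_active_subreddits_query(limit)
    rows = client.query(sql, job_config=job_config).result()
    return [(row["subreddit"], row["comment_count"]) for row in rows]
```

Calling `most_active_subreddits(minutes=5, limit=10)` would return the ten busiest subreddits for the last five minutes. Because the table is partitioned by day on `created_utc`, filtering on that column should keep the scan limited to recent partitions.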