{"id":19620455,"url":"https://github.com/grace-mengke-hu/redditpushshiftapi","last_synced_at":"2025-06-13T12:36:42.136Z","repository":{"id":150913235,"uuid":"450250443","full_name":"grace-mengke-hu/RedditPushshiftAPI","owner":"grace-mengke-hu","description":"This package is for collecting Reddit dataset and organize the data in Mongo Database","archived":false,"fork":false,"pushed_at":"2022-02-23T21:45:45.000Z","size":29789,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-09T11:14:10.931Z","etag":null,"topics":["collection","data","reddit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grace-mengke-hu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-20T20:37:40.000Z","updated_at":"2022-01-20T21:25:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"af20eaa8-16b5-4f2b-a7c1-169713f072be","html_url":"https://github.com/grace-mengke-hu/RedditPushshiftAPI","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grace-mengke-hu%2FRedditPushshiftAPI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grace-mengke-hu%2FRedditPushshiftAPI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grace-mengke-hu%2FRedditPushshiftAPI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grace-mengke-hu%2FRedditPushshiftAPI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grace-mengke-hu","download_url":"https://codeload.github.com/grace-mengke-hu/RedditPushshiftAPI/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240914010,"owners_count":19877883,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collection","data","reddit"],"created_at":"2024-11-11T11:18:07.581Z","updated_at":"2025-02-26T18:41:28.825Z","avatar_url":"https://github.com/grace-mengke-hu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Collection Package With Reddit Pushshift.io API\n## Publications\nThis package implements the Reddit data collection described in the following publications:\n1. Mengke Hu and Ryzen Benson and Annie T. Chen and Shu-Hong Zhu and Mike Conway, *\"Determining the prevalence of cannabis, tobacco, and vaping device mentions in online communities using natural language processing\"*, Drug and Alcohol Dependence, 2021, DOI: 10.1016/j.drugalcdep.2021.109016, URL: https://www.sciencedirect.com/science/article/abs/pii/S0376871621005111?via=ihub\n2. Ryzen Benson and Mengke Hu and Annie T. Chen and Shu-Hong Zhu and Mike Conway, *\"Leveraging Reddit to Explore the Evolving Trajectories of Cannabis, Tobacco, and Vaping Device Users\"*, Frontiers in Public Health, 2021\n3. Mengke Hu and Mike Conway, *\"Using Reddit data to investigate perspectives on the COVID-19 pandemic using natural language processing: a comparative study of the US, the UK, Canada and Australia\"*, in preparation for submission to JMIR Public Health, 2022\n\nThis package is written in Python 2.7 and contains the following projects:\n## RedditPushshiftAPI\nThis project contains a series of crawlers that collect Reddit data from the Pushshift.io API.\n## MongoDB\nThis project requires Python 2.7. It contains various scripts for managing the collected data in a MongoDB database.\n## CheckingMissingData\nBecause the Pushshift.io Reddit API sometimes fails to update deleted submissions and comments, this project compares two Reddit datasets that cover the same subreddit and time period but were collected at different times.\n## ExtractInfo\nThis project extracts text data that matches various regular-expression filters from subreddit datasets.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrace-mengke-hu%2Fredditpushshiftapi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrace-mengke-hu%2Fredditpushshiftapi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrace-mengke-hu%2Fredditpushshiftapi/lists"}