{"id":22211408,"url":"https://github.com/almayor/reddit-mods-dataset","last_synced_at":"2025-10-13T04:31:15.833Z","repository":{"id":221241672,"uuid":"753318593","full_name":"almayor/reddit-mods-dataset","owner":"almayor","description":"A dataset of 25'834 largest communities on Reddit and their (anonymised) moderators.","archived":false,"fork":false,"pushed_at":"2024-10-11T11:55:48.000Z","size":38463,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-02T20:48:05.226Z","etag":null,"topics":["dataset","graph","graph-algorithms","reddit","social-network-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/almayor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"ko_fi":"almayor"}},"created_at":"2024-02-05T22:09:00.000Z","updated_at":"2024-08-24T17:55:40.000Z","dependencies_parsed_at":"2024-12-02T20:45:02.640Z","dependency_job_id":null,"html_url":"https://github.com/almayor/reddit-mods-dataset","commit_stats":null,"previous_names":["almayor/reddit-mods-dataset"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/almayor%2Freddit-mods-dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/almayor%2Freddit-mods-dataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/almayor%2Freddit-mods-dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/almayor%2Freddit-mods-dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/almayor","download_url":"https://codeload.github.com/almayor/reddit-mods-dataset/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236301051,"owners_count":19126959,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","graph","graph-algorithms","reddit","social-network-analysis"],"created_at":"2024-12-02T20:31:36.167Z","updated_at":"2025-10-13T04:31:06.406Z","avatar_url":"https://github.com/almayor.png","language":"Jupyter Notebook","funding_links":["https://ko-fi.com/almayor"],"categories":[],"sub_categories":[],"readme":"# RedditMods: Moderators of top-25'000 subreddits\n\u003ca href=\"https://www.kaggle.com/datasets/gingerbadger/redditmods-moderators-of-top-25k-subreddits/data\" rel=\"Kaggle dataset\"\u003e![Kaggle](https://img.shields.io/badge/Kaggle-035a7d?style=for-the-badge\u0026logo=kaggle\u0026logoColor=white)\u003c/a\u003e ![Reddit](https://img.shields.io/badge/Reddit-%23FF4500.svg?style=for-the-badge\u0026logo=Reddit\u0026logoColor=white)\n\n_RedditMods_ is a dataset that lists the moderators of the 25'834 largest and most popular communities on Reddit. The dataset is well suited to studying Reddit as a bipartite graph, in which a moderator-node and a subreddit-node are connected if the corresponding user moderates that subreddit. Clustering can then be performed to identify groups of subreddits with a particular leaning, or to recommend similar communities.\n\n## Data Collection\n\nThe data was scraped with the associated [Jupyter Notebook](code/reddit-mods-ds.ipynb) on 06 Feb 2024 and was publicly available at the time of collection. All usernames were anonymised by hashing with SHA-256, so that they cannot be linked to the moderators' Reddit accounts.\n\n## Description of Files\n\nThe data is available both in table format (CSV) and as a bipartite graph (GEXF).\n\n#### GEXF – data in graph format\n\n1. `graph.gexf`\n\n\tA bipartite graph, where nodes in the first group (having attribute `bipartite=0`) are moderators and nodes in the second group (having attribute `bipartite=1`) are subreddits. A moderator-node is connected to a subreddit-node if that moderator moderates that subreddit.\n\t\n\tTags:\n\t* `size` on subreddit-nodes, indicating the number of members of the subreddit\n\t  \n\t\u003chr\u003e\n\t\n#### CSV – data in table format\n\n1. `subreddits.csv`\n\n\tContains 25K subreddits from [Reddit's Top](https://www.reddit.com/best/communities/1/), combined with the [list](http://www.reddit.com/subreddits/) of Reddit's most popular communities. The two lists are not identical, as described in the [Jupyter notebook](code/reddit-mods-ds.ipynb). The headers are:\n\n\t* `name`: Name of subreddit\n\t* `n_members`: Number of members\n\t\n\t\u003chr\u003e\n\t\n2. `moderators.csv`\n\n\tEach row describes a subreddit-moderator pair:\n\t\n\t* `subreddit`: Name of subreddit\n\t* `moderator`: Username of moderator (anonymised by hashing)\n\t\n\t\u003chr\u003e\n\n3. `bots.csv`\n\n\tList of moderators identified as bots by the simple heuristic described in *Notes and warnings* below. These accounts have already been removed from `moderators.csv`.\n\t\n\t* `name`: Username of bot\n\n\t\u003chr\u003e\n\n## Examples\n\n* [Visualising a cluster of subreddits moderated by a group of users](./example/example.ipynb)\n\n\n\u003cp float=\"left\"\u003e\n  \u003cimg src=\"example/example-bipartite-india.png\" width=\"500\" /\u003e\n  \u003cimg src=\"example/example-projected.png\" width=\"500\" /\u003e \n\u003c/p\u003e\n\n## Notes and warnings\n\nI used a very simple heuristic to filter out auto-moderators: an account was treated as a bot if (1) it is on a short list of known bots (e.g. `u/AutoModerator`) or (2) its username starts or ends with `bot`. Bots that match neither rule remain in `moderators.csv`, so additional filtering might be necessary. For an example, see [this notebook](example/example.ipynb).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falmayor%2Freddit-mods-dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falmayor%2Freddit-mods-dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falmayor%2Freddit-mods-dataset/lists"}