{"id":17573693,"url":"https://github.com/huitema/socspider","last_synced_at":"2025-09-17T11:48:39.520Z","repository":{"id":83470612,"uuid":"582455706","full_name":"huitema/socspider","owner":"huitema","description":"Social spider for Mastodon and the fediverse, implemented in Python","archived":false,"fork":false,"pushed_at":"2023-01-02T02:34:00.000Z","size":35,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-29T09:23:29.061Z","etag":null,"topics":["mastodon","mastodon-api","mastodon-discovery"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huitema.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-12-26T22:25:53.000Z","updated_at":"2024-05-31T01:13:20.000Z","dependencies_parsed_at":"2023-07-03T17:00:22.124Z","dependency_job_id":null,"html_url":"https://github.com/huitema/socspider","commit_stats":{"total_commits":11,"total_committers":1,"mean_commits":11.0,"dds":0.0,"last_synced_commit":"e4566c690189400da4f2279ffb31f230ce3357f7"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/huitema/socspider","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huitema%2Fsocspider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huitema%2Fsocspider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huitema%2Fsocspider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huitema%2Fsocspider/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huitema","download_url":"https://codeload.github.com/huitema/socspider/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huitema%2Fsocspider/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273746599,"owners_count":25160643,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-05T02:00:09.113Z","response_time":402,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mastodon","mastodon-api","mastodon-discovery"],"created_at":"2024-10-21T21:04:22.901Z","updated_at":"2025-09-17T11:48:39.477Z","avatar_url":"https://github.com/huitema.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# socspider\n\nThis toy Python program demonstrates how to build a \"social spider\" using the public API of Mastodon.\nThe program starts by reading toots in the public timeline of a \"start\" instance, by default\n`mastodon.social`. It analyzes the toots to find names of instances and handles of users in the Fediverse,\nand builds an approximation of the social graph by recording, for each discovered user, the handles of the\nusers by which it is seen.\n\nThe program is not fast. It takes about 1 minute on a laptop to learn the profiles of 100 users. At that rate, it could\ntake about 2 months to process 10 million Mastodon users. 
But then, the program is not optimized at\nall. A lot of time is spent waiting for responses from remote servers. This could be reduced\nby running several queries in parallel, in multiple threads. Running on a big 256-core server,\nthe 10 million accounts mentioned above could be parsed in about 6 hours. Running on a cluster\nof machines would be faster still.\n\nThe point here is not speed. The point is to demonstrate the power of public APIs like\n\"reading the public timeline\", \"reading the data of a toot\", \"reading a thread starting\nwith a toot\", \"reading who favorited a toot\", or \"reading the public messages sent by an\naccount\". In the Mastodon implementation, these APIs are public. (The same APIs appear to be\naccess-controlled in servers running Pleroma.)\n\nThe power of the API could be used for good or for bad. For example, the spider could be augmented to\nalso collect hashtags read by users, or assign weights to the relations between users.\nOn the good side, this would enable building catalogs of servers or directories of users,\nor adding a search function to the Fediverse. On the bad side, this is exactly the kind\nof data required for \"serving better ads\", or for finding targets of harassment.\n\n## Using the spider\n\nTo use the spider, you need to clone this repository, then run:\n```\npython3 socspider.py \u003cname-of-afile\u003e [start-instance-url]\n```\nThe spidering will start at the designated instance, and will crawl the Fediverse\nuntil it has learned at least 100 new user handles. The data will be saved in\nJSON format in the designated file.\n\nYou can run the program several times. If the data file already exists when the program\nis launched, it will be loaded into memory, and the results of the spidering will be added to\nthe existing data.\n\n## Participating\n\nIf you want to improve this code or otherwise comment on it, feel free to open\nan issue or propose a PR here. 
Or contact \"huitema@social.secret-wg.org\" on Mastodon.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuitema%2Fsocspider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuitema%2Fsocspider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuitema%2Fsocspider/lists"}