{"id":19684241,"url":"https://github.com/elliotwutingfeng/twitter200m","last_synced_at":"2026-03-16T05:36:41.923Z","repository":{"id":104337013,"uuid":"587173285","full_name":"elliotwutingfeng/Twitter200M","owner":"elliotwutingfeng","description":"Simple analysis of the Twitter 200M Data Dump of January 2023.","archived":false,"fork":false,"pushed_at":"2026-01-13T16:06:52.000Z","size":275,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-13T18:34:14.268Z","etag":null,"topics":["200m","data-science","haveibeenpwned","leak","osint","twitter"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elliotwutingfeng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-01-10T06:01:52.000Z","updated_at":"2026-01-13T16:06:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"a11985db-01da-45b5-bdb2-55658b503a61","html_url":"https://github.com/elliotwutingfeng/Twitter200M","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/elliotwutingfeng/Twitter200M","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2FTwitter200M","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2FTwitter200M/tags","releases_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2FTwitter200M/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2FTwitter200M/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elliotwutingfeng","download_url":"https://codeload.github.com/elliotwutingfeng/Twitter200M/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2FTwitter200M/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30568134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-16T04:42:47.996Z","status":"ssl_error","status_checked_at":"2026-03-16T04:42:44.668Z","response_time":96,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["200m","data-science","haveibeenpwned","leak","osint","twitter"],"created_at":"2024-11-11T18:17:12.676Z","updated_at":"2026-03-16T05:36:41.917Z","avatar_url":"https://github.com/elliotwutingfeng.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Twitter200M\n\n[![License](https://img.shields.io/badge/LICENSE-BSD--3--CLAUSE-GREEN?style=for-the-badge)](LICENSE)\n\nSimple analysis of the [Twitter 200M Data Dump](https://haveibeenpwned.com/PwnedWebsites#Twitter200M) of January 2023.\n\nDownload links for the 
data dump are **not** included in this repository.\n\n## Background\n\nQuote from haveibeenpwned.com:\n\n\u003e In early 2023, over 200M records scraped from Twitter appeared on a popular hacking forum. The data was obtained sometime in 2021 by abusing an API that enabled email addresses to be resolved to Twitter profiles. The subsequent results were then composed into a corpus of data containing email addresses alongside public Twitter profile information including names, usernames and follower counts.\n\nThe data dump analysed in this repository is a \"cleaned-up\" version by a user on the aforementioned forum.\n\n## Findings\n\n### Caveats\n\n- Not all user accounts have been leaked; Twitter has far more than 200 million accounts.\n- It is impossible to verify that the leaked data has not been tampered with or padded with falsified records.\n\nThe following findings are made on the assumption that this dataset is representative of Twitter's actual userbase.\n\n### Most popular email providers\n\n```bash\n┌────────────────┬─────────────────┐\n│ Email Provider ┆ Number of Users │\n│ ---            ┆ ---             │\n│ str            ┆ i64             │\n╞════════════════╪═════════════════╡\n│ gmail.com      ┆ 73314131        │\n│ hotmail.com    ┆ 40509492        │\n│ yahoo.com      ┆ 33051713        │\n│ aol.com        ┆ 4025882         │\n│ hotmail.co.uk  ┆ 3298152         │\n│ mail.ru        ┆ 3289923         │\n│ hotmail.fr     ┆ 3128568         │\n│ live.com       ┆ 1945940         │\n│ msn.com        ┆ 1321923         │\n│ yahoo.co.uk    ┆ 1313553         │\n│ yahoo.fr       ┆ 1245996         │\n│ ymail.com      ┆ 1142144         │\n│ yandex.ru      ┆ 1125810         │\n│ icloud.com     ┆ 1093533         │\n│ comcast.net    ┆ 1091726         │\n└────────────────┴─────────────────┘\n```\n\nOver **75%** of the users in this dataset use either Google, Microsoft, or Yahoo email addresses.\n\n### Account creation times\n\nTwitter first experienced rapid user growth in 2009, with its 
highest new account signup rates from 2011 to 2013.\n\nFrom 2016 onwards, new account signups dipped below 2009 levels and have declined steadily since.\n\n## Requirements\n\nTested on Linux x64\n\n- Fast multicore CPU\n- At least 16 GB available RAM\n- Python 3.14\n- [uv](https://docs.astral.sh/uv)\n- [7zip](https://7-zip.org)\n\n## Run Jupyter Notebook\n\n```shell\nuv run jupyter notebook main.ipynb\n```\n\n## Formatting\n\n```shell\nuv run ruff format .\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felliotwutingfeng%2Ftwitter200m","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felliotwutingfeng%2Ftwitter200m","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felliotwutingfeng%2Ftwitter200m/lists"}