{"id":16553044,"url":"https://github.com/stangirard/hackrsa","last_synced_at":"2026-05-06T14:36:21.713Z","repository":{"id":106362317,"uuid":"219827730","full_name":"StanGirard/HackRSA","owner":"StanGirard","description":"Hack the heck out of rsa","archived":false,"fork":false,"pushed_at":"2020-03-05T23:06:20.000Z","size":81935,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-04T18:36:38.114Z","etag":null,"topics":["batch-gcd","certificates","complexity","hacking","javascript","nodejs","parse","python","rsa","rsa-algorithm","rsa-cryptography"],"latest_commit_sha":null,"homepage":"https://primates.dev/hack-the-heck-out-of-rsa/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StanGirard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-05T18:54:16.000Z","updated_at":"2023-10-12T17:28:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"5ab5bb4a-0fb8-4684-bc6e-53353de4df44","html_url":"https://github.com/StanGirard/HackRSA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/StanGirard/HackRSA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanGirard%2FHackRSA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanGirard%2FHackRSA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanGirard%2FHackRSA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanGirard%2FHackRSA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StanGirard","download_url":"https://codeload.github.com/StanGirard/HackRSA/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StanGirard%2FHackRSA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32698533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T08:33:17.875Z","status":"ssl_error","status_checked_at":"2026-05-06T08:33:17.221Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batch-gcd","certificates","complexity","hacking","javascript","nodejs","parse","python","rsa","rsa-algorithm","rsa-cryptography"],"created_at":"2024-10-11T19:46:52.074Z","updated_at":"2026-05-06T14:36:21.683Z","avatar_url":"https://github.com/StanGirard.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],
"readme":"# SSLCertificates\n\nThe first step is to gather millions of SSL certificates, and for that we need millions of domain names. I found two solutions that I can share: either download a ready-made list such as [Top10MillionWebsites](https://www.domcop.com/files/top/top10milliondomains.csv.zip), or extract the domains from [CommonCrawl](https://commoncrawl.org/the-data/get-started/), which is 60TB of crawled websites. 
With a powerful enough computer, you could parse the website names and get millions of domains.\n\n## Get Millions of domains in seconds\n---\n\nWe are going to query the columnar URL index that [Common Crawl](https://commoncrawl.org/) publishes in its S3 bucket to get the list of all the domains it has crawled.\n\n### AWS Athena\n---\n\n[Amazon Athena](https://aws.amazon.com/athena/) is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.\n\nAthena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.\n\n---\nOpen the [Athena query editor](https://console.aws.amazon.com/athena/home?region=us-east-1#query).\n\n---\nSelect us-east-1 (N. Virginia) as your region, since the Common Crawl bucket is hosted there.\n\n---\nRun the following query to create a database named ccindex:\n```SQL\nCREATE DATABASE ccindex\n```\n\n---\nWith ccindex selected as the current database, create the table with the following query:\n```SQL\nCREATE EXTERNAL TABLE IF NOT EXISTS ccindex (\n  url_surtkey                   STRING,\n  url                           STRING,\n  url_host_name                 STRING,\n  url_host_tld                  STRING,\n  url_host_2nd_last_part        STRING,\n  url_host_3rd_last_part        STRING,\n  url_host_4th_last_part        STRING,\n  url_host_5th_last_part        STRING,\n  url_host_registry_suffix      STRING,\n  url_host_registered_domain    STRING,\n  url_host_private_suffix       STRING,\n  url_host_private_domain       STRING,\n  url_protocol                  STRING,\n  url_port                      INT,\n  url_path                      STRING,\n  url_query                     STRING,\n  fetch_time                    TIMESTAMP,\n  fetch_status                  SMALLINT,\n  
content_digest                STRING,\n  content_mime_type             STRING,\n  content_mime_detected         STRING,\n  content_charset               STRING,\n  content_languages             STRING,\n  warc_filename                 STRING,\n  warc_record_offset            INT,\n  warc_record_length            INT,\n  warc_segment                  STRING)\nPARTITIONED BY (\n  crawl                         STRING,\n  subset                        STRING)\nSTORED AS parquet\nLOCATION 's3://commoncrawl/cc-index/table/cc-main/warc/';\n```\n\nThis maps the ccindex table you just created to the S3 location where the index data is stored.\n\n---\nRun the following query to load the table's partitions:\n```SQL\nMSCK REPAIR TABLE ccindex\n```\n\nNote that you need to rerun this command every month, when the Common Crawl Foundation publishes a new crawl, so that the new partitions become visible to Athena.\n\n---\nRun\n```SQL\nSELECT DISTINCT(url_host_name)\nFROM \"ccindex\".\"ccindex\"\nWHERE crawl = 'CC-MAIN-2018-05'\n  AND subset = 'warc'\n  AND url_host_tld = 'no'\n```\n\nThis returns every distinct host name under the .no (Norway) top-level domain in the CC-MAIN-2018-05 crawl, i.e. the January 2018 crawl (the number after the year is the week of the year, not the month).\n\n![Athena results for the Norway query](images/AWS-ATHENA-NORWAY-EXAMPLE.png)\n\n```SQL\nSELECT COUNT(*) AS count,\n       url_host_registered_domain\nFROM \"ccindex\".\"ccindex\"\nWHERE crawl = 'CC-MAIN-2018-05'\n  AND subset = 'warc'\n  AND url_host_tld = 'no'\nGROUP BY  url_host_registered_domain\nHAVING (COUNT(*) \u003e= 100)\nORDER BY  count DESC\n```\nThis query counts the captured URLs of each registered domain under .no in the same crawl, keeps only the domains with at least 100 captures, and sorts them by that count in descending order.\n\n**Mother of all queries** -\u003e How to get 27M domains in 30 seconds (this query has no crawl filter, so it scans every crawl in the index):\n\n```SQL\nSELECT DISTINCT(url_host_name)\nFROM \"ccindex\".\"ccindex\"\nWHERE subset = 'warc'\n```\n\n## Get the certificates\n\n### Javascript Hell\n\nRequesting millions of certificates isn't an easy task, and as a JavaScript developer I had never faced one like it. It needs to be fast and efficient. I ran into my first memory leak in Node.js, and the process ran out of memory. 
I also got a low throughput of about 40 requests per second.\n\nYou can check index.js in the trashCode folder if you want to see the JavaScript implementation.\n\n### Let's Go Python\n\nI switched to Python 3 to avoid many of the headaches I had with Node.js. The throughput was still lower than expected (60-70 requests per second), but there were no memory issues this time!\n\nYou can check test.py if you want to see the Python implementation.\n\n### Go for the win\n\nPython was still not fast enough, so I turned to Go (Golang). I had never used it before, but it seemed like a good fit.\n\nThe program can be found in src/main.go. Run it from the src folder and pipe in a file that contains one domain per line:\n\n```bash\ncat domainnames | go run main.go\n```\n\n#### V2 of main.go\n\n\u003e The V2 of main.go allows you to skip the next steps.\n\n## Process the certificates\n\nWe need to decode and parse the certificates to extract the information we need. In my case: the issuer, the RSA modulus n and the public exponent e.\n\n### Javascript Hell Again\n\nApparently I had not learned my lesson the first time, because I tried JS again. This time I had to read 4.5M files.\n\nYou can find my programs in filewalker.js, read.js and readfile.js.\n\n### Python for the win\n\nI then moved on to Python. The script decodes the certificate files in a folder and inserts the corresponding values into a database.\n\n## Batch GCD\n\nNow it is time to hack!\n\nI created a Python notebook, MultiplyCerts, to explore a basic Batch GCD implementation and its complexity.\n\nI found that the basic implementation has quadratic complexity: O(n^2) in the number of moduli n.\n\nYou can find the results in MultiplyCerts.html.\n\n### Batch GCD Implementation\n\nThe collected moduli can then be fed to a project such as [batchgcd](https://github.com/zugzwang/batchgcd), which implements D. J. Bernstein's [batch GCD algorithm](https://facthacks.cr.yp.to/batchgcd.html), to find moduli that share a prime factor (weak keys).\n\nbatchgcd is a C++/GMP implementation of Daniel Bernstein's Batch GCD algorithm. 
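\n\nA minimal pure-Python sketch, for illustration only, contrasting a simple quadratic baseline (one GCD per pair of moduli, which is not necessarily how the MultiplyCerts notebook does it) with the product-tree/remainder-tree formulation of batch GCD described on the facthacks page linked above. The function names are hypothetical and are not taken from this repository or from batchgcd; the sketch assumes Python 3.8+ (for `math.prod`) and moduli stored as plain Python integers, such as the n values saved in the database. CPython multiplies big integers with Karatsuba rather than FFT-based multiplication, so this shows the structure of the algorithm rather than its full asymptotic speed-up; for millions of real RSA moduli, a GMP-based tool like batchgcd is the practical choice.\n\n```python\n# Illustrative sketch: the function names are hypothetical, not from this repo or batchgcd.\nfrom math import gcd, prod\n\n\ndef naive_pairwise_gcd(moduli):\n    # Quadratic baseline: n*(n-1)/2 GCDs for n moduli.\n    # Returns the indices of moduli that share a factor with another modulus.\n    hits = set()\n    for i in range(len(moduli)):\n        for j in range(i + 1, len(moduli)):\n            if gcd(moduli[i], moduli[j]) != 1:\n                hits.update((i, j))\n    return sorted(hits)\n\n\ndef product_tree(moduli):\n    # Level 0 holds the moduli; each level above multiplies adjacent pairs,\n    # so the last level holds the product of all moduli.\n    tree = [list(moduli)]\n    while len(tree[-1]) \u003e 1:\n        level = tree[-1]\n        tree.append([prod(level[k:k + 2]) for k in range(0, len(level), 2)])\n    return tree\n\n\ndef batch_gcd(moduli):\n    # Returns gcd(N_i, product of all the other N_j) for every modulus N_i.\n    tree = product_tree(moduli)\n    remainders = tree.pop()  # root level: [P], the product of all moduli\n    while tree:\n        level = tree.pop()\n        # Reduce each parent remainder modulo the square of each child node.\n        remainders = [remainders[k // 2] % (n * n) for k, n in enumerate(level)]\n    # Now remainders[i] == P mod N_i**2, which is divisible by N_i, and\n    # (P mod N_i**2) // N_i == (P // N_i) mod N_i.\n    return [gcd(r // n, n) for r, n in zip(remainders, moduli)]\n\n\nif __name__ == '__main__':\n    # Toy moduli: 15 = 3*5 and 65 = 5*13 share 5; 77 = 7*11 and 161 = 7*23 share 7.\n    moduli = [15, 77, 65, 323, 161]\n    print(batch_gcd(moduli))           # [5, 7, 5, 1, 7]\n    print(naive_pairwise_gcd(moduli))  # [0, 1, 2, 4]\n```\n\nA result greater than 1 means the modulus shares a prime with at least one other modulus, so dividing the modulus by it factors the key. A result equal to the modulus itself (for example, a duplicated key) still needs the pairwise check on that smaller subset.\n\n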
The algorithm, described in Bernstein's paper *How To Find Smooth Parts Of Integers*, computes the GCD of each integer in a list with the product of all the others in quasilinear time and memory, which reveals every modulus that shares a prime factor with another one. See also factorable.net, which also provides source code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstangirard%2Fhackrsa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstangirard%2Fhackrsa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstangirard%2Fhackrsa/lists"}