{"id":27014232,"url":"https://github.com/maximebories/regexp-scraper","last_synced_at":"2026-05-01T14:32:48.648Z","repository":{"id":111817186,"uuid":"543867759","full_name":"maximebories/regexp-scraper","owner":"maximebories","description":"Advanced used of Puppeteer to scrape a web engine results against a RegExp","archived":false,"fork":false,"pushed_at":"2023-03-15T22:50:44.000Z","size":24,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-13T23:15:01.363Z","etag":null,"topics":["crawler","cyber-investigations","dorking","google-dorking","osint","phishing-sites","puppeteer","regex","regexp","scraper","scraping","search","search-engine","web-security"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maximebories.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-01T02:53:52.000Z","updated_at":"2023-08-30T04:24:36.000Z","dependencies_parsed_at":"2023-06-03T23:45:36.625Z","dependency_job_id":null,"html_url":"https://github.com/maximebories/regexp-scraper","commit_stats":null,"previous_names":["maximebories/regexp-scraper","maximebories/regex-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/maximebories/regexp-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximebories%2Fregexp-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximebories%2Fregexp-scraper/tags","releases_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximebories%2Fregexp-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximebories%2Fregexp-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maximebories","download_url":"https://codeload.github.com/maximebories/regexp-scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximebories%2Fregexp-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32501399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","cyber-investigations","dorking","google-dorking","osint","phishing-sites","puppeteer","regex","regexp","scraper","scraping","search","search-engine","web-security"],"created_at":"2025-04-04T13:29:52.113Z","updated_at":"2026-05-01T14:32:48.640Z","avatar_url":"https://github.com/maximebories.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# regexp-scraper\nAdvanced use of Puppeteer to crawl Google web engine results against a RegExp. 
You have to provide a search query and a regular expression to match against either the Google search results page or the full page content, depending on how thorough you want the search to be.\n\n##\n\nTo get a lock on node modules:\n\n\t$ npm update\n\nTo run:\n\n\t$ node main.ts\n \nThe example used here finds fraudulent phishing URLs sent through text messages, for further investigation. DO NOT click on any of them unless you know what you are doing.\n\n\n\n## How to use the script\n\nTo run the script, use the following command:\n\n\t$ node main.ts \u003cquery\u003e \u003cregexp\u003e \u003cfilter\u003e\n\nReplace \u003cquery\u003e with the query you want to use for the search, \u003cregexp\u003e with the regular expression you want to match against the page content, and \u003cfilter\u003e with 'true' if you want to filter the search results or 'false' if you don't.\n\nFor example, to search for the query 'Votre colis a été envoyé. Veuillez le vérifier et le recevoir.', a common phishing text message in France, with Google's filtering of similar results disabled and using the regular expression 'http://[a-z]{5}.[a-z]{5}.[a-z]+', which captures the URLs used in this phishing operation, use the following command:\n\n\t$ node main.ts 'Votre colis a été envoyé. Veuillez le vérifier et le recevoir.' 'http://[a-z]{5}.[a-z]{5}.[a-z]+' false\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaximebories%2Fregexp-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaximebories%2Fregexp-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaximebories%2Fregexp-scraper/lists"}