{"id":15462713,"url":"https://github.com/devidw/google-untitled-spam-spider","last_synced_at":"2025-03-28T11:20:30.462Z","repository":{"id":44600461,"uuid":"455938383","full_name":"devidw/google-untitled-spam-spider","owner":"devidw","description":"A spam spider which is targeting 'Untitled' spam pages from the Google search results.","archived":false,"fork":false,"pushed_at":"2022-02-05T20:24:28.000Z","size":7,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-10-18T21:04:29.950Z","etag":null,"topics":["crawler","crawling","crawling-algorithm","crawling-python","crawling-sites","crawling-tool","google-untitled","python","python3","spam","spam-detection","spammer","untitled","untitled-spam"],"latest_commit_sha":null,"homepage":"https://david.wolf.gdn/i-crawled-105009-google-untitled-spam-pages-in-7-days-and-700504-more-linked-spam-pages/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devidw.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-02-05T17:32:45.000Z","updated_at":"2022-02-27T07:29:13.000Z","dependencies_parsed_at":"2022-09-03T05:11:35.126Z","dependency_job_id":null,"html_url":"https://github.com/devidw/google-untitled-spam-spider","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devidw%2Fgoogle-untitled-spam-spider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devidw%2Fgoogle-untitled-spam-spider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/devidw%2Fgoogle-untitled-spam-spider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devidw%2Fgoogle-untitled-spam-spider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devidw","download_url":"https://codeload.github.com/devidw/google-untitled-spam-spider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246017725,"owners_count":20710240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","crawling","crawling-algorithm","crawling-python","crawling-sites","crawling-tool","google-untitled","python","python3","spam","spam-detection","spammer","untitled","untitled-spam"],"created_at":"2024-10-02T00:03:46.618Z","updated_at":"2025-03-28T11:20:30.423Z","avatar_url":"https://github.com/devidw.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"= Google _'Untitled'_ Spam Spider\n\nA tiny web spider that starts at a given page and keeps crawling for as long as it finds links to similar spam pages.\n\nThis spider targets the 'Untitled' spam pages that appear in Google search results.\n\nI wrote https://david.wolf.gdn/posts/spam/google-untitled/[several articles] about those spam pages. 
In them I discuss the background of this spam network.\n\n[quote, David Wolf, 'https://david.wolf.gdn/i-crawled-105009-google-untitled-spam-pages-in-7-days-and-700504-other-linked-spam-pages/[david.wolf.gdn]']\nI crawled 105,009 Google 'Untitled' Spam Pages in 7 days and 700,504 other linked Spam Pages\n\n== Usage\n\n[source,python]\n----\nfrom google_spam_spider import GoogleSpamSpider\n\nspider = GoogleSpamSpider(\n    url='http://zone-casino.fr/2hephe/torch-functional-unfold.html',  # The URL to start crawling from\n    direct_spam_logs='direct_spam.log',  # The file to log direct spam to\n    external_spam_logs='external_spam.log',  # The file to log external spam to\n)\n----\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevidw%2Fgoogle-untitled-spam-spider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevidw%2Fgoogle-untitled-spam-spider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevidw%2Fgoogle-untitled-spam-spider/lists"}