Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/devidw/google-untitled-spam-spider
A spam spider which is targeting 'Untitled' spam pages from the Google search results.
- Host: GitHub
- URL: https://github.com/devidw/google-untitled-spam-spider
- Owner: devidw
- License: mit
- Created: 2022-02-05T17:32:45.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2022-02-05T20:24:28.000Z (almost 3 years ago)
- Last Synced: 2024-10-18T21:04:29.950Z (3 months ago)
- Topics: crawler, crawling, crawling-algorithm, crawling-python, crawling-sites, crawling-tool, google-untitled, python, python3, spam, spam-detection, spammer, untitled, untitled-spam
- Language: Python
- Homepage: https://david.wolf.gdn/i-crawled-105009-google-untitled-spam-pages-in-7-days-and-700504-more-linked-spam-pages/
- Size: 6.84 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.adoc
- License: LICENSE
README
= Google _'Untitled'_ Spam Spider
A tiny web spider that starts at a given URL and keeps crawling for as long as the pages it visits link to similar spam pages.
This spider targets the 'Untitled' spam pages that show up in Google search results.
I wrote https://david.wolf.gdn/posts/spam/google-untitled/[several articles] about these spam pages, in which I discuss the background of this spam network.
[quote, David Wolf, 'https://david.wolf.gdn/i-crawled-105009-google-untitled-spam-pages-in-7-days-and-700504-other-linked-spam-pages/[david.wolf.gdn]']
I crawled 105,009 Google 'Untitled' Spam Pages in 7 days and 700,504 other linked Spam Pages

== Usage

[source,python]
----
from google_spam_spider import GoogleSpamSpider

spider = GoogleSpamSpider(
    url='http://zone-casino.fr/2hephe/torch-functional-unfold.html',  # The URL to start crawling from
    direct_spam_logs='direct_spam.log',  # The file to log direct spam
    external_spam_logs='external_spam.log'  # The file to log external spam
)
----
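
The crawl behaviour described at the top of this README can be sketched roughly as follows. This is not the code behind `GoogleSpamSpider`; it is a minimal, offline illustration of a breadth-first crawl that only follows links judged to be spam. The names `LinkParser`, `crawl`, `fetch`, `is_spam` and `max_pages` are hypothetical, and the split into "direct" (same host as the start URL) and "external" (any other host) is an assumption about what the two log files distinguish.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, is_spam, max_pages=100):
    """Breadth-first crawl that only follows links judged to be spam.

    fetch(url) returns HTML as a string and is_spam(url) returns a bool.
    Both are hypothetical callables, injected so the sketch runs offline.
    Returns (direct, external): visited URLs on the start host and on
    other hosts. That split is an assumption, not the project's definition.
    """
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    direct, external = [], []
    visited = 0

    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        host = urlparse(url).netloc
        (direct if host == start_host else external).append(url)

        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen and is_spam(link):
                seen.add(link)
                queue.append(link)

    return direct, external


# Tiny in-memory "web" standing in for real HTTP requests.
pages = {
    "http://spam.example/a.html":
        '<a href="b.html">b</a> <a href="http://other.example/x.html">x</a>',
    "http://spam.example/b.html":
        '<a href="a.html">a</a> <a href="http://clean.example/">c</a>',
    "http://other.example/x.html": "",
}

direct, external = crawl(
    "http://spam.example/a.html",
    fetch=lambda url: pages.get(url, ""),
    is_spam=lambda url: url.endswith(".html"),  # placeholder heuristic
)
print(direct)
print(external)
```

Passing `fetch` and `is_spam` in as arguments keeps the loop independent of any particular HTTP client or spam heuristic. A real run would swap in a network fetch and append each URL to the corresponding log file instead of collecting lists in memory.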