https://github.com/sangaline/advanced-web-scraping-tutorial

The Zipru scraper developed in the Advanced Web Scraping Tutorial.

python scraper scrapy tutorial-code

# Advanced Web Scraping Tutorial Project

*This repository is a companion to the article [Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more](http://sangaline.com/post/advanced-web-scraping-tutorial).
Please refer to the article for further details.*

This is a [Scrapy](https://scrapy.org/) web scraper for the fictional Zipru torrent site.
It is designed to bypass four distinct anti-scraping mechanisms:

1. User agent filtering.
2. Obfuscated JavaScript redirects.
3. CAPTCHAs.
4. Header consistency checks.
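
As a rough illustration of the first and fourth items, the sketch below builds a single browser-like header set and maps it onto Scrapy's `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` settings, so that every request presents the same identity. The function names and the header values are hypothetical placeholders, not taken from this repository; in practice you would copy the values from your own browser.

```python
# Hypothetical value: replace with the user agent string of your own browser.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_browser_headers(user_agent=BROWSER_USER_AGENT):
    """Return one header set meant to be reused for every request in a session."""
    # Assumed header values; a consistency check may compare these against
    # what a real browser with the same user agent would send.
    return {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate",
    }


def scrapy_settings(user_agent=BROWSER_USER_AGENT):
    """Split the shared headers into the two Scrapy settings that carry them."""
    headers = build_browser_headers(user_agent)
    return {
        # Scrapy takes the user agent from its own setting.
        "USER_AGENT": headers.pop("User-Agent"),
        # The remaining headers are sent with every request by default.
        "DEFAULT_REQUEST_HEADERS": headers,
    }
```

The resulting dictionary could be assigned to a spider's `custom_settings` attribute, or its two entries copied into the project's `settings.py`. The article linked above describes the approach this project actually takes.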

The scraper is not actually functional because Zipru is not a real site.
The code, however, is otherwise complete and can easily be adapted to work on other sites.