
https://github.com/koverholt/scrapy-site-downloader


# scrapy-site-downloader

## Overview

Template project for downloading a site with Scrapy. It crawls a website,
restricted to a given domain and set of URL filters, and saves each scraped
page as an HTML file.
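The crawl-and-filter behaviour described above can be sketched without Scrapy. The snippet below is a hypothetical illustration, not the template's code. `should_follow` roughly mirrors how Scrapy's `allowed_domains` (a domain plus its subdomains) and a `LinkExtractor(allow=...)` regex decide which links to follow. `url_to_path` shows one possible way to turn a URL into a file path under `html/`. The example domain, the `/docs/` pattern, and both function names are assumptions.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical values: replace with the site you configure in crawler/spiders/site.py.
ALLOWED_DOMAINS = ["example.com"]
ALLOW_PATTERNS = [r"/docs/"]


def should_follow(url, allowed_domains=ALLOWED_DOMAINS, allow_patterns=ALLOW_PATTERNS):
    """Return True if url is on an allowed domain (or subdomain) and matches a filter."""
    host = urlparse(url).hostname or ""
    on_domain = any(host == d or host.endswith("." + d) for d in allowed_domains)
    # An empty pattern list means "follow every on-domain URL".
    matches = not allow_patterns or any(re.search(p, url) for p in allow_patterns)
    return on_domain and matches


def url_to_path(url, out_dir="html"):
    """Map a URL to a relative .html file path under out_dir (query string ignored)."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        # Directory-style URLs are saved as index.html inside that directory.
        path += "index.html"
    elif not path.endswith(".html"):
        path += ".html"
    return str(PurePosixPath(out_dir, path.lstrip("/")))
```

Because `url_to_path` ignores query strings, URLs such as `/docs/list?page=1` and `/docs/list?page=2` would map to the same file; a real spider may need a naming scheme that keeps them apart.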

## Steps to run

1. Clone this repository and `cd` into it
1. Install the dependencies:
   ```
   pip install -r requirements.txt
   ```
1. Edit `crawler/spiders/site.py` to point the spider at the site you want to
   crawl (its website, domain, and URL filters)
1. From the repository root (be sure not to run this from a subdirectory!),
   start the downloader:
   ```
   scrapy crawl site
   ```
1. Refer to the
[Scrapy documentation](https://docs.scrapy.org/en/latest/topics/practices.html)
for best practices and other configuration options
1. When the crawl finishes, the downloaded HTML files are in the `html`
   directory
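Scrapy reads crawl-politeness options from the project's settings module. The fragment below is a minimal sketch, assuming the standard Scrapy layout in which `crawler/settings.py` sits beside the `crawler/spiders/` package. The setting names are Scrapy's own, but the values are illustrative starting points, not taken from this template.

```python
# crawler/settings.py (assumed location): polite-crawling settings.
# Values are illustrative, not tuned recommendations.

# Respect the target site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Wait between requests (seconds) and cap parallel requests per domain.
DOWNLOAD_DELAY = 1.0
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Let AutoThrottle adjust the delay based on observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0

# Cache responses locally so re-running the crawl while you adjust filters
# serves pages from the cache instead of downloading them again.
HTTPCACHE_ENABLED = True
```

Any of these can also be overridden for a single run from the command line, e.g. `scrapy crawl site -s DOWNLOAD_DELAY=2`.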