https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider

# Scrapy
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) ![Python 3.5](https://img.shields.io/badge/python-3.5-blue.svg)

This repository contains a list of simple scrapers made with Scrapy:

- basicspider.py - A simple spider that scrapes data from a list of pages:
```
$ scrapy runspider basicspider.py -a file=list_pages.txt -o data.csv
```
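
Scrapy passes each `-a name=value` option to the spider's constructor as a keyword argument, so `-a file=list_pages.txt` arrives as `file="list_pages.txt"`. The format of `list_pages.txt` is not documented here; the sketch below assumes one URL per line and shows how such a file could be turned into a list suitable for `start_urls`. The helper name `load_urls` and the `#` comment convention are hypothetical and not taken from `basicspider.py`.

```python
from pathlib import Path


def load_urls(path):
    """Return the URLs listed in `path`, assuming one URL per line.

    Blank lines and lines starting with '#' are skipped (an assumed
    convention, not something basicspider.py is known to support).
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


# Inside a spider, the helper could be used like this:
#     def __init__(self, file=None, *args, **kwargs):
#         super().__init__(*args, **kwargs)
#         self.start_urls = load_urls(file) if file else []
```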

- inscriptspider.py - A simple spider that scrapes data from a list of pages and is launched from inside a Python script via CrawlerProcess:
```
$ scrapy runspider inscriptspider.py url1 url2 url3 ... urlx
```

- basicrawler.py - A simple crawler that scrapes data from a list of domains:
```
$ scrapy runspider basicrawler.py -a file=list_pages.txt -o data.csv
```
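
How this crawler differs from the spiders is not spelled out here; a common arrangement, assumed in this sketch, is that a crawler follows links but stays within the listed domains. Scrapy expresses that scope through a spider's `allowed_domains` attribute, which matches a domain and its subdomains. The code below mirrors that check in plain Python; `domain_of` and `in_scope` are hypothetical helpers, not part of Scrapy or of `basicrawler.py`.

```python
from urllib.parse import urlparse


def domain_of(entry):
    """Return the lowercase host of a URL or of a bare domain like 'example.com'."""
    # urlparse only fills in the hostname when a '//' authority marker is present.
    parsed = urlparse(entry if "://" in entry else "//" + entry)
    return (parsed.hostname or "").lower()


def in_scope(url, allowed_domains):
    """True if the host of `url` is an allowed domain or one of its subdomains."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Example: build the allowed list from mixed entries, then filter links.
allowed = [domain_of(e) for e in ["https://example.com/start", "example.org"]]
links = ["https://blog.example.com/post", "https://notexample.com/"]
followed = [link for link in links if in_scope(link, allowed)]
```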

- persistencespider.py - A simple spider that scrapes data from a list of pages and saves it to the Elasticsearch instance running at http://localhost:9200/:
```
$ scrapy runspider persistencespider.py -a file=list_pages.txt
```
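
How `persistencespider.py` talks to Elasticsearch is not shown here. One way to persist items, sketched under that caveat, is a Scrapy item pipeline (a plain class with a `process_item` method) that sends each item to Elasticsearch's document endpoint over HTTP. The index name `scrapy`, the class name, and the injectable `post` function are assumptions made for illustration, and the sketch uses only the standard library rather than an Elasticsearch client. The `_doc` endpoint is the form used by recent Elasticsearch versions.

```python
import json
import urllib.request


def http_post_json(url, payload):
    """POST `payload` as JSON to `url` and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


class ElasticsearchPipeline:
    """Hypothetical item pipeline that indexes every scraped item."""

    def __init__(self, base_url="http://localhost:9200", index="scrapy",
                 post=http_post_json):
        # POST /<index>/_doc lets Elasticsearch generate the document ID.
        self.endpoint = "{}/{}/_doc".format(base_url.rstrip("/"), index)
        self.post = post  # injectable so the pipeline can be tested offline

    def process_item(self, item, spider):
        self.post(self.endpoint, dict(item))
        return item  # hand the item on to any later pipeline
```

A pipeline like this is enabled by listing the class in Scrapy's `ITEM_PIPELINES` setting, for example through the spider's `custom_settings`.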

- persistencecrawler.py - A simple crawler that scrapes data from a list of domains and saves it to the Elasticsearch instance running at http://localhost:9200/:
```
$ scrapy runspider persistencecrawler.py -a file=list_pages.txt
```

## LICENCE
MIT