https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/anthonysigogne/scrapy
- Owner: AnthonySigogne
- License: mit
- Created: 2017-07-20T22:11:38.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-07-22T05:10:43.000Z (almost 9 years ago)
- Last Synced: 2025-02-28T12:07:40.691Z (over 1 year ago)
- Topics: crawler, elasticsearch, python, scrapy, spider
- Language: Python
- Homepage:
- Size: 7.81 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Scrapy
[License: MIT](https://opensource.org/licenses/MIT)
This repository contains a list of simple scrapers made with Scrapy:
- basicspider.py - A simple spider that scrapes data from a list of pages:
```
$ scrapy runspider basicspider.py -a file=list_pages.txt -o data.csv
```
- inscriptspider.py - A simple spider that scrapes data from a list of pages and is launched from inside a Python script via CrawlerProcess:
```
$ scrapy runspider inscriptspider.py url1 url2 url3 ... urlx
```
- basicrawler.py - A simple crawler that scrapes data from a list of domains:
```
$ scrapy runspider basicrawler.py -a file=list_pages.txt -o data.csv
```
- persistencespider.py - A simple spider that scrapes data from a list of pages and saves it in the Elasticsearch database running at http://localhost:9200/:
```
$ scrapy runspider persistencespider.py -a file=list_pages.txt
```
- persistencecrawler.py - A simple crawler that scrapes data from a list of domains and saves it in the Elasticsearch database running at http://localhost:9200/:
```
$ scrapy runspider persistencecrawler.py -a file=list_pages.txt
```
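The commands above pass the input list with `-a file=list_pages.txt`. The sketch below shows one way such a file could be turned into start URLs and allowed domains. It is not the code from this repository: the helper names (`read_entries`, `to_domain`, `crawl_targets`), the one-entry-per-line format, the `#` comment convention and the `http://` default scheme are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def read_entries(path):
    """Return the non-empty, non-comment lines of a text file, stripped."""
    with open(path, encoding="utf-8") as handle:
        stripped = (line.strip() for line in handle)
        # Assumption: one entry per line, lines starting with "#" are comments.
        return [line for line in stripped if line and not line.startswith("#")]


def to_domain(entry):
    """Reduce a URL or a bare host name to a lower-case host name."""
    # urlparse only fills in the network location when "//" is present,
    # so bare entries such as "example.com/path" get a "//" prefix first.
    parsed = urlparse(entry if "://" in entry else "//" + entry)
    return (parsed.hostname or "").lower()


def crawl_targets(entries):
    """Build (start_urls, allowed_domains) from a list of domains or URLs."""
    start_urls, allowed_domains = [], []
    for entry in entries:
        domain = to_domain(entry)
        if not domain:
            continue  # skip entries without a recognisable host
        # Assumption: entries without a scheme are fetched over plain HTTP.
        start_urls.append(entry if "://" in entry else "http://" + entry)
        if domain not in allowed_domains:
            allowed_domains.append(domain)
    return start_urls, allowed_domains
```

Scrapy hands `-a name=value` arguments to the spider's `__init__` as keyword arguments, so a spider could call these helpers there and assign the results to its `start_urls` and `allowed_domains` attributes.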
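The two persistence scripts save scraped data in Elasticsearch at http://localhost:9200/. A minimal sketch of how that could be done with a Scrapy item pipeline follows. It is an illustration under stated assumptions, not the repository's implementation: the class name `ElasticsearchPipeline`, the index name `pages` and the use of the standard library instead of an Elasticsearch client are hypothetical, and the `/<index>/_doc` endpoint is the form used by recent Elasticsearch versions (older releases expect a mapping type in the path).

```python
import json
import urllib.request


class ElasticsearchPipeline:
    """Item pipeline sketch: index every scraped item as one JSON document."""

    def __init__(self, base_url="http://localhost:9200", index="pages"):
        self.base_url = base_url.rstrip("/")
        self.index = index  # hypothetical index name

    def build_request(self, item):
        """Create the HTTP request that would index `item`."""
        body = json.dumps(dict(item)).encode("utf-8")
        return urllib.request.Request(
            url=f"{self.base_url}/{self.index}/_doc",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",  # POST lets Elasticsearch generate the document id
        )

    def process_item(self, item, spider):
        """Called by Scrapy for each scraped item; send it, then pass it on."""
        request = self.build_request(item)
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()  # drain the response; errors raise HTTPError
        return item
```

A pipeline like this is enabled through Scrapy's `ITEM_PIPELINES` setting. The blocking `urlopen` call keeps the sketch dependency-free but stalls the crawl while each request is in flight, so a real spider would more likely use an Elasticsearch client library or batch its writes.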
## LICENCE
MIT