https://github.com/miraculixx/simplescrapy
simple scrapy test
- Host: GitHub
- URL: https://github.com/miraculixx/simplescrapy
- Owner: miraculixx
- Created: 2016-05-17T23:39:39.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-05-18T16:22:10.000Z (over 8 years ago)
- Last Synced: 2024-10-29T00:28:05.423Z (2 months ago)
- Language: Python
- Size: 13.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# simplescrapy
simple scrapy test

this tests a scripted mass crawler
## setup
1. Scrapy
```
$ pip install -r requirements.txt
$ cd simple
# should show the test spider if all is ok
$ scrapy list
test
```

2. Local nginx
Set up a local nginx using this server directive:
```
server {
    listen localhost:5151;
    location / {
        return 200 '{ "status" : "success" }';
    }
}
```

## Test
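Before running the crawls, it can help to confirm that the local endpoint from the nginx setup answers as expected. The following is a minimal sketch, not part of the repository: `check_endpoint` is a hypothetical helper name, and it assumes the server listens on `http://localhost:5151/` and returns the JSON body from the server directive in the setup section.

```python
import json
import urllib.request


def check_endpoint(url="http://localhost:5151/", timeout=5.0):
    """Return True if `url` answers 200 with JSON whose "status" is "success"."""
    # Bypass any configured HTTP proxy, since the target is a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
            return (
                response.status == 200
                and isinstance(body, dict)
                and body.get("status") == "success"
            )
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors (URLError/HTTPError);
        # ValueError covers a body that is not valid JSON.
        return False
```

Calling `check_endpoint()` with no arguments probes the default address. A `False` result suggests checking the nginx configuration before launching any spiders.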
1. Run a spider test with a direct crawl
```
$ scrapy crawl test
```

should work just fine (no errors)
2. Run a test with say 100 spiders launched
```
$ scrapy runscript test -n 100
Started 60 crawlers
Started 70 crawlers
Started 80 crawlers
Started 90 crawlers
Started 99 crawlers
Starting actual crawl...
done.
```

3. Run a test with say 1500 spiders launched. This will start to fail,
see errors below

```
$ scrapy runscript test -n 1500
(...)
2016-05-18 01:55:26 [scrapy] ERROR: Error downloading : DNS lookup failed: address 'localhost' not found: [Errno 11] Resource temporarily unavailable.
DNSLookupError: DNS lookup failed: address 'localhost' not found: [Errno 11] Resource temporarily unavailable.
```
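
The log above does not say why the lookup fails. On Linux, errno 11 is `EAGAIN`, which can indicate that a per-process resource limit has been reached. One possible first step (an assumption, not a diagnosis from this repository) is to inspect the limits of the process that launches the crawlers. This is a minimal sketch using only the standard library on a Unix-like system; `describe_limits` and `report` are hypothetical helper names:

```python
import errno
import os
import resource


def describe_limits():
    """Return (soft, hard) limits that may be relevant when EAGAIN appears."""
    limits = {"open_files": resource.getrlimit(resource.RLIMIT_NOFILE)}
    # RLIMIT_NPROC is not defined on every platform, so look it up defensively.
    nproc = getattr(resource, "RLIMIT_NPROC", None)
    if nproc is not None:
        limits["processes_or_threads"] = resource.getrlimit(nproc)
    return limits


def report():
    """Print the EAGAIN message and the current limits."""
    print("EAGAIN (%d): %s" % (errno.EAGAIN, os.strerror(errno.EAGAIN)))
    for name, (soft, hard) in describe_limits().items():
        print("%s: soft=%s hard=%s" % (name, soft, hard))
```

Running `report()` in the same shell and user account as `scrapy runscript` shows the values in effect. Whether raising them changes the behaviour at 1500 spiders is untested here.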