Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/a5huynh/scrapyd-playground
Get started with scrapy and scrapyd
- Host: GitHub
- URL: https://github.com/a5huynh/scrapyd-playground
- Owner: a5huynh
- License: mit
- Created: 2014-11-19T22:41:33.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2015-03-03T01:45:44.000Z (almost 10 years ago)
- Last Synced: 2023-03-10T22:11:32.926Z (almost 2 years ago)
- Language: Python
- Homepage:
- Size: 116 KB
- Stars: 12
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## Scrapy Playground
I created this repository to quickly get started with scrapy and scrapyd. I've
included a Dockerfile to build a container with everything required to run a
scrapyd instance. Also included is the tutorial spider as an example of how to
deploy spiders, and as an added bonus, I created a spider that has image
pipeline capabilities!
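The repository's actual Dockerfile isn't reproduced here, but a minimal Dockerfile that runs a scrapyd instance generally looks something like this (base image and package versions are assumptions, not taken from this repo):

```
# Minimal scrapyd image (sketch; this repo's Dockerfile may differ)
FROM python:2.7

# Install scrapy and scrapyd from PyPI
RUN pip install scrapy scrapyd

# scrapyd listens on port 6800 by default; depending on the scrapyd version
# you may also need bind_address = 0.0.0.0 in scrapyd.conf for remote access
EXPOSE 6800

CMD ["scrapyd"]
```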
### Quickstart with [Fig.sh](http://fig.sh/)

To get up and running with the scrapy playground using fig:

```
fig up
```

This will build the Docker container and spin up two containers. One is a
persistent data container where spiders/logs/etc. will be stored. The second is
the scrapyd server. Skip to the `Deploying the tutorial project` section to
start scraping!
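The fig configuration itself isn't shown here; a `fig.yml` along these lines would produce the two containers described above (service names, the data image, and the volume path are assumptions, not necessarily what this repo uses):

```
# fig.yml (sketch) -- scrapyd server plus a persistent data container
scrapyd:
  build: .
  ports:
    - "6800:6800"
  volumes_from:
    - scrapyddata

# Data-only container that holds eggs/logs/items across restarts
scrapyddata:
  image: busybox
  volumes:
    - /var/lib/scrapyd
```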
### Running `scrapyd` in Docker w/o Fig

If you want to build your own container:

```
docker build -t scrapyd .
docker run -it -p 6800:6800 scrapyd
```

Otherwise, I have an automated build on Docker Hub that you can use:

```
docker run -it -p 6800:6800 a5huynh/scrapyd
```
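Either way, once the container is up you can confirm that the port is published and that scrapyd is answering (replace `[docker host]` with wherever your Docker daemon is running):

```
# Confirm the container is running and port 6800 is published
docker ps

# scrapyd serves its web console on port 6800
curl "http://[docker host]:6800/"
```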
### Deploying the tutorial project
First, make sure the IP address to the container is correct in the `scrapy.cfg`
file. Then you can deploy the spider to the scrapyd container using:

```
scrapyd-deploy docker -p test
```
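The `docker` argument above refers to a deploy target defined in `scrapy.cfg`. The entry generally looks something like this (the URL placeholder is yours to fill in, and the exact contents of this repo's `scrapy.cfg` may differ):

```
# scrapy.cfg (sketch) -- deploy target used by scrapyd-deploy
[deploy:docker]
url = http://<docker-host-ip>:6800/
```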
Make sure you're in the root directory of the tutorial project, where the
`scrapy.cfg` resides. You can call the project anything you want, just be sure
to correctly refer to the project name when scheduling things.

You can check the status of the project/spiders and logs on the scrapyd web GUI
located at http://[docker host]:6800.
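After deploying, you can also verify that scrapyd picked up the project and its spiders via the standard listing endpoints:

```
# Projects known to this scrapyd instance
curl "http://[docker host]:6800/listprojects.json"

# Spiders available in the freshly deployed project
curl "http://[docker host]:6800/listspiders.json?project=test"
```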
### Scheduling the tutorial spider

To schedule the basic tutorial spider, use the scrapyd API:

```
curl http://[docker host]:6800/schedule.json -d project=test -d spider=tutorial
```
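A successful call returns a small JSON payload containing a job id (the exact fields vary slightly between scrapyd versions), and the job can then be followed with `listjobs.json`:

```
# Example response: {"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"}
# List pending/running/finished jobs for the project
curl "http://[docker host]:6800/listjobs.json?project=test"
```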
### Scheduling the coverart spider
To schedule the more complex coverart spider:
```
curl http://[docker host]:6800/schedule.json -d project=test -d spider=coverart
```
Cover art images will be stored on the
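For context, image downloading in Scrapy is normally handled by the built-in `ImagesPipeline`; a rough sketch of the settings involved (the module path and storage directory here are assumptions, not this repo's actual configuration):

```
# settings.py (sketch) -- enable Scrapy's built-in image pipeline
ITEM_PIPELINES = {
    # Older Scrapy releases (circa 2014) used
    # 'scrapy.contrib.pipeline.images.ImagesPipeline'
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

# Directory where downloaded images are written (hypothetical path)
IMAGES_STORE = '/var/lib/scrapyd/images'
```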