Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/a5huynh/scrapyd-playground

Get started with scrapy and scrapyd
https://github.com/a5huynh/scrapyd-playground

Last synced: 3 months ago
JSON representation

Get started with scrapy and scrapyd

Host: GitHub
URL: https://github.com/a5huynh/scrapyd-playground
Owner: a5huynh
License: mit
Created: 2014-11-19T22:41:33.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2015-03-03T01:45:44.000Z (almost 10 years ago)
Last Synced: 2023-03-10T22:11:32.926Z (almost 2 years ago)
Language: Python
Homepage:
Size: 116 KB
Stars: 12
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

## Scrapy Playground

I created this repository to quickly get started with scrapy and scrapyd. I've
included a Dockerfile to build a container with everything required to run a
scrapyd instance.

Also included is the tutorial spider as an example of how to deploy spiders,
and as an added bonus, I created a spider that has image pipeline capabilities!

### Quickstart with [Fig.sh](http://fig.sh/)

To get up and running with the scrapy playground using fig

fig up

This will build the Docker container and spin up two containers. One is a
persistent data container where spiders/logs/etc. will be stored. The second is
the scrapyd server. Skip to the `Deploying the tutorial spider` section to
start scraping!

### Running `scrapyd` in Docker w/o Fig

If you want to build your own container,

docker build -t scrapyd .
docker run -it -p 6800:6800 scrapyd

Otherwise, I have an automated build on Docker Hub that you can use,

docker run -it -p 6800:6800 a5huynh/scrapyd

### Deploying the tutorial project

First, make sure the IP address to the container is correct in the `scrapy.cfg`
file. Then you can deploy the spider to the scrapyd container using,

scrapyd-deploy docker -p test

Make sure you're in the root directory of the tutorial project, where the
`scrapy.cfg` resides. You can call the project anything you want, just be sure
to correctly refer to the project name when scheduling things.

You can check the status of the project/spiders and logs on the scrapyd web GUI
located at http://[docker host]:6800

### Scheduling the tutorial spider

To schedule the basic tutorial spider, use the scrapyd API:

curl http://[docker host]:6800/schedule.json -d project=test -d spider=tutorial

### Scheduling the coverart spider

To schedule the more complex coverart spider:

curl http://[docker host]:6800/schedule.json -d project=test -d spider=coverart

Cover art images will be stored on the