{"id":13618169,"url":"https://github.com/EasyPi/docker-scrapyd","last_synced_at":"2025-04-14T10:31:09.446Z","repository":{"id":43959369,"uuid":"436567122","full_name":"EasyPi/docker-scrapyd","owner":"EasyPi","description":"🕷️ Scrapyd is an application for deploying and running Scrapy spiders.","archived":false,"fork":false,"pushed_at":"2025-04-12T03:18:04.000Z","size":59,"stargazers_count":83,"open_issues_count":0,"forks_count":22,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-12T04:25:01.366Z","etag":null,"topics":["docker","scrapy","scrapyd"],"latest_commit_sha":null,"homepage":"","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EasyPi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-12-09T10:04:38.000Z","updated_at":"2025-04-12T03:18:08.000Z","dependencies_parsed_at":"2024-01-27T15:27:26.145Z","dependency_job_id":"c8654a57-6b5b-49c3-b9c3-c1c302a6b396","html_url":"https://github.com/EasyPi/docker-scrapyd","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EasyPi%2Fdocker-scrapyd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EasyPi%2Fdocker-scrapyd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EasyPi%2Fdocker-scrapyd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EasyPi%2Fdocker-scrapyd/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/EasyPi","download_url":"https://codeload.github.com/EasyPi/docker-scrapyd/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248862598,"owners_count":21173837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","scrapy","scrapyd"],"created_at":"2024-08-01T20:01:55.621Z","updated_at":"2025-04-14T10:31:09.435Z","avatar_url":"https://github.com/EasyPi.png","language":"Dockerfile","funding_links":[],"categories":["HarmonyOS","Dockerfile"],"sub_categories":["Windows Manager"],"readme":"scrapyd\n=======\n\n[![](https://github.com/easypi/docker-scrapyd/actions/workflows/build.yaml/badge.svg)](https://github.com/EasyPi/docker-scrapyd)\n\n[![](http://dockeri.co/image/easypi/scrapyd)](https://hub.docker.com/r/easypi/scrapyd)\n\n[scrapy][1] is an open source and collaborative framework for extracting the\ndata you need from websites. In a fast, simple, yet extensible way.\n\n[scrapyd][2] is a service for running Scrapy spiders.  It allows you to deploy\nyour Scrapy projects and control their spiders using a HTTP JSON API.\n\n[scrapyd-client][3] is a client for scrapyd. 
It provides the scrapyd-deploy\nutility which allows you to deploy your project to a Scrapyd server.\n\n[scrapy-splash][4] provides Scrapy+JavaScript integration using Splash.\n\n[scrapyrt][5] allows you to easily add an HTTP API to your existing Scrapy project.\n\n[spidermon][6] is a framework to build monitors for Scrapy spiders.\n\n[scrapy-poet][7] is the web-poet Page Object pattern implementation for Scrapy.\n\n[scrapy-playwright][8] is a Scrapy Download Handler which performs requests using Playwright for Python.\n\nThis image is based on `debian:bookworm`, with the latest stable versions of 8 Python packages installed:\n\n- scrapy==2.12.0\n- scrapyd==1.5.0\n- scrapyd-client==2.0.2\n- scrapy-splash==0.11.1\n- scrapyrt==v0.16.0\n- spidermon==1.24.0\n- scrapy-poet==0.26.0\n- scrapy-playwright==v0.0.43\n\n```bash\n# fetch latest versions\necho \"scrapy scrapyd scrapyd-client scrapy-splash scrapyrt spidermon scrapy-poet scrapy-playwright\" |\n  xargs -n1 pip --disable-pip-version-check index versions 2\u003e/dev/null |\n    grep -v Available\n```\n\nPlease use this as the base image for your own project.\n\n:warning: Scrapy (since [2.0.0][9]) has dropped support for Python 2.7, which reached end-of-life on 2020-01-01.\n\n## docker-compose.yml\n\n```yaml\nversion: \"3.8\"\n\nservices:\n\n  scrapyd:\n    image: easypi/scrapyd\n    ports:\n      - \"6800:6800\"\n    volumes:\n      - ./data:/var/lib/scrapyd\n      - /usr/local/lib/python3.11/dist-packages\n    restart: unless-stopped\n\n  scrapy:\n    image: easypi/scrapyd\n    command: bash\n    volumes:\n      - .:/code\n    working_dir: /code\n    restart: unless-stopped\n\n  scrapyrt:\n    image: easypi/scrapyd\n    command: scrapyrt -i 0.0.0.0 -p 9080\n    ports:\n      - \"9080:9080\"\n    volumes:\n      - .:/code\n    working_dir: /code\n    restart: unless-stopped\n```\n\n## Run it as background-daemon for scrapyd\n\n```bash\n$ docker-compose up -d scrapyd\n$ docker-compose logs -f scrapyd\n$ docker cp 
scrapyd_scrapyd_1:/var/lib/scrapyd/items .\n$ tree items\n└── myproject\n    └── myspider\n        └── ad6153ee5b0711e68bc70242ac110005.jl\n```\n\n```bash\n$ mkvirtualenv -p python3 webbot\n$ pip install scrapy scrapyd-client\n\n$ scrapy startproject myproject\n$ cd myproject\n$ setvirtualenvproject\n\n$ scrapy genspider myspider mydomain.com\n$ scrapy edit myspider\n$ scrapy list\n\n$ vi scrapy.cfg\n$ scrapyd-client deploy\n$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider\n$ curl http://localhost:6800/daemonstatus.json\n$ firefox http://localhost:6800\n```\n\nFile: scrapy.cfg\n\n```ini\n[settings]\ndefault = myproject.settings\n\n[deploy]\nurl = http://localhost:6800/\nproject = myproject\n```\n\n## Run it as interactive-shell for scrapy\n\n```bash\n$ cat \u003e stackoverflow_spider.py \u003c\u003c _EOF_\nimport scrapy\n\nclass StackOverflowSpider(scrapy.Spider):\n    name = 'stackoverflow'\n    start_urls = ['http://stackoverflow.com/questions?sort=votes']\n\n    def parse(self, response):\n        for href in response.css('.question-summary h3 a::attr(href)'):\n            full_url = response.urljoin(href.extract())\n            yield scrapy.Request(full_url, callback=self.parse_question)\n\n    def parse_question(self, response):\n        yield {\n            'title': response.css('h1 a::text').extract()[0],\n            'votes': response.css('.question div[itemprop=\"upvoteCount\"]::text').extract()[0],\n            'body': response.css('.question .postcell').extract()[0],\n            'tags': response.css('.question .post-tag::text').extract(),\n            'link': response.url,\n        }\n_EOF_\n\n$ docker-compose run --rm scrapy\n\u003e\u003e\u003e scrapy runspider stackoverflow_spider.py -o top-stackoverflow-questions.jl\n\u003e\u003e\u003e cat top-stackoverflow-questions.jl\n\u003e\u003e\u003e exit\n```\n\n## Run it as realtime crawler for scrapyrt\n\n```bash\n$ git clone https://github.com/scrapy/quotesbot.git .\n$ 
docker-compose up -d scrapyrt\n$ curl -s 'http://localhost:9080/crawl.json?spider_name=toscrape-css\u0026callback=parse\u0026url=http://quotes.toscrape.com/\u0026max_requests=5' | jq -c '.items[]'\n```\n\n[1]: https://github.com/scrapy/scrapy\n[2]: https://github.com/scrapy/scrapyd\n[3]: https://github.com/scrapy/scrapyd-client\n[4]: https://github.com/scrapinghub/scrapy-splash\n[5]: https://github.com/scrapinghub/scrapyrt\n[6]: https://github.com/scrapinghub/spidermon\n[7]: https://github.com/scrapinghub/scrapy-poet\n[8]: https://github.com/scrapy-plugins/scrapy-playwright\n[9]: \u003chttps://docs.scrapy.org/en/latest/news.html#scrapy-2-0-0-2020-03-03\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEasyPi%2Fdocker-scrapyd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEasyPi%2Fdocker-scrapyd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEasyPi%2Fdocker-scrapyd/lists"}