https://github.com/tcurvelo/zyte-utils
Yet another bunch of cli utilities for Zyte's Scrapy Cloud.
- Host: GitHub
- URL: https://github.com/tcurvelo/zyte-utils
- Owner: tcurvelo
- Created: 2022-03-17T00:23:49.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-05-03T02:27:25.000Z (over 2 years ago)
- Last Synced: 2023-08-02T20:25:55.308Z (over 1 year ago)
- Topics: hacktoberfest, scrapy-cloud, zyte
- Language: Python
- Homepage:
- Size: 12.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Zyte Utils
Yet another bunch of cli utilities for Zyte's Scrapy Cloud.
## Scripts
### `stats-per-spider`
Collect a given _stat_ from all projects listed in `scrapinghub.yml`, and group them by spider:
❯ stats-per-spider "links/pages" \
--shub-file="my_awesome_project/scrapinghub.yml" \
--start 2022-01-01 --end 2022-02-01 \
--output="page_count_jan22.csv"
It can be especially useful for tracking custom stats over time, such as product categories, pagination requests, etc.
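The grouping step can be illustrated with a minimal sketch. It assumes each job is a plain dict with a `spider` name and a `stats` mapping, and that values are summed per spider; both the data shape and the helper name `stat_per_spider` are hypothetical and not taken from the tool's source.

```python
from collections import defaultdict


def stat_per_spider(jobs, stat_name):
    """Sum `stat_name` over all jobs, grouped by spider name.

    `jobs` is an iterable of dicts shaped like
    {"spider": str, "stats": {stat_name: number, ...}}.
    This shape is an assumption made for the sketch.
    """
    totals = defaultdict(int)
    for job in jobs:
        # Jobs that never recorded the stat contribute zero.
        totals[job["spider"]] += job.get("stats", {}).get(stat_name, 0)
    return dict(totals)


# Hypothetical job records, as they might look after fetching them
# from every project listed in scrapinghub.yml.
jobs = [
    {"spider": "foo.com", "stats": {"links/pages": 10}},
    {"spider": "bar.org", "stats": {"links/pages": 4}},
    {"spider": "foo.com", "stats": {"links/pages": 5}},
    {"spider": "bar.org", "stats": {}},
]

print(stat_per_spider(jobs, "links/pages"))
```

In this sketch, a spider that ran but never emitted the stat still contributes zero to its total rather than raising an error.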
### `requests-per-spider`
Does the same as above, but counts chargeable SPM/Crawlera responses.
❯ requests-per-spider --start 2022-01-01 --output="requests_feb.csv"
It counts every response whose status code is not in `[403, 407, 408, 429, 502, 503, 504, 999]`.
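The filtering rule above could be sketched as follows. The excluded status codes come from the list above; the stat key prefix and the helper name `chargeable_responses` are assumptions for illustration, since the keys the tool actually reads are not stated here.

```python
# Status codes excluded from the count, as listed above.
NON_CHARGEABLE = {403, 407, 408, 429, 502, 503, 504, 999}

# Assumed key prefix for per-status response counters (hypothetical choice;
# the tool may read a different, proxy-specific stat).
PREFIX = "downloader/response_status_count/"


def chargeable_responses(stats, prefix=PREFIX):
    """Sum the per-status counters whose status code is chargeable."""
    total = 0
    for key, count in stats.items():
        if not key.startswith(prefix):
            continue  # unrelated stat, e.g. item counts
        status = int(key[len(prefix):])
        if status not in NON_CHARGEABLE:
            total += count
    return total


# Hypothetical stats for a single job.
stats = {
    "downloader/response_status_count/200": 120,
    "downloader/response_status_count/301": 5,
    "downloader/response_status_count/429": 30,
    "downloader/response_status_count/503": 2,
    "item_scraped_count": 90,
}

print(chargeable_responses(stats))  # 125: only the 200 and 301 responses count
```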
### `spiders-health`
Compile the last run status for all spiders, from the projects listed in `scrapinghub.yml`.
❯ spiders-health --shub-file="my_awesome_project/scrapinghub.yml"
spider   project  id   last_run                   last_item_count  last_error_count
bar.org  1234     137  2022-04-30 11:34:50+00:00  18.0             -
baz.com  5678     47   2021-06-10 14:57:54+00:00  -                47.0
foo.com  1234     138  2022-04-30 05:56:46+00:00  -                1.0
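Selecting the last run per spider, as in the report above, might be sketched like this. The job dict fields (`project`, `spider`, `id`, `finished`) and the helper name `last_runs` are hypothetical; the sketch only shows the "keep the most recent job per spider" step, not how jobs are fetched.

```python
from datetime import datetime, timezone


def last_runs(jobs):
    """Return the most recent job for each (project, spider), sorted by spider."""
    latest = {}
    for job in jobs:
        key = (job["project"], job["spider"])
        # Keep the job with the latest finish time seen so far.
        if key not in latest or job["finished"] > latest[key]["finished"]:
            latest[key] = job
    return sorted(latest.values(), key=lambda j: j["spider"])


# Hypothetical job records; timestamps are timezone-aware UTC datetimes.
jobs = [
    {"project": 1234, "spider": "bar.org", "id": 136,
     "finished": datetime(2022, 4, 29, 11, 30, tzinfo=timezone.utc)},
    {"project": 1234, "spider": "foo.com", "id": 138,
     "finished": datetime(2022, 4, 30, 5, 56, tzinfo=timezone.utc)},
    {"project": 1234, "spider": "bar.org", "id": 137,
     "finished": datetime(2022, 4, 30, 11, 34, tzinfo=timezone.utc)},
]

for job in last_runs(jobs):
    print(job["spider"], job["project"], job["id"], job["finished"])
```

Keying on the `(project, spider)` pair keeps spiders with the same name in different projects separate.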