https://github.com/metalwarrior665/website-checker-workload
Creates a workload of actor to run checks on target website
https://github.com/metalwarrior665/website-checker-workload
Last synced: 2 months ago
JSON representation
Creates a workload of actor to run checks on target website
- Host: GitHub
- URL: https://github.com/metalwarrior665/website-checker-workload
- Owner: metalwarrior665
- Created: 2020-11-30T19:48:27.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-12-07T16:34:16.000Z (over 4 years ago)
- Last Synced: 2024-12-30T18:07:28.060Z (4 months ago)
- Language: JavaScript
- Size: 32.2 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Website checker workload
Creates reasonable workloads for analyzing any website with [Website Checker](https://apify.com/lukaskrivka/website-checker) and combines the resulting data. This is the easiest way to analyze any website for compute units usage and blocking.
This actor runs a Website Checker for each proxy group and for both browser/Puppeteer and Cheerio scraper. Those checks are run in parallel with reasonable default values and the output of all checkers in combined into a single output breakdown. This gives you quite a nice idea how difficult and costly will be scraping the site with different methods and can save precious time you would spend with manual checks.
## Input
| Field | Type | Default | Description |
| ----- | ---- | ------- | ----------- |
| website | String | https://apify.com | Website URL where you want to start checking |
| runBrowser | Boolean | true | Run the checker with browser |
| runCheerio | Boolean | true | Check with Cheerio |
| proxyGroups | Array | ['auto', 'BUYPROXIES84958'] | List of proxy groups you want to test. Can be also `auto` to run with all proxies |
| maxPagesPerCheck | Number | 200 | Max pages per each check |
| runInParallel | Boolean | true | What to scrape from each page, default is "posts" the other option is "comments" |## Output
The output is saved to the default Key-Value store as `OUTPUT` record. It is a combined output from all Website Checker runs with added spent compute units.For example for input consisting of
```json
"runBrowser": true,
"runCheerio": true,
"proxyGroups": ["auto", "BUYPROXIES84958"]
```The actor will run 4 checkers with all possible combinations:
```json
{
"puppeteer/auto": {
"computeUnits": 0.45,
"pagesPerComputeUnit": 444,
"timeouted": 0,
"failedToLoadOther": 9,
"accessDenied": 0,
"recaptcha": 0,
"distilCaptcha": 24,
"statusCodes": {
"200": 3,
"401": 2,
"403": 5,
"405": 24
},
"total": 43
},
"puppeteer/BUYPROXIES84958": {
"computeUnits": 0.45,
"pagesPerComputeUnit": 444,
"timeouted": 0,
"failedToLoadOther": 9,
"accessDenied": 0,
"recaptcha": 0,
"distilCaptcha": 24,
"statusCodes": {
"200": 3,
"401": 2,
"403": 5,
"405": 24
},
"total": 43
},
"cheerio/auto": {
"computeUnits": 0.05,
"pagesPerComputeUnit": 4000,
"timeouted": 0,
"failedToLoadOther": 9,
"accessDenied": 0,
"recaptcha": 0,
"distilCaptcha": 24,
"statusCodes": {
"200": 3,
"401": 2,
"403": 5,
"405": 24
},
"total": 43
},
"cheerio/BUYPROXIES84958": {
"computeUnits": 0.05,
"pagesPerComputeUnit": 4000,
"timeouted": 0,
"failedToLoadOther": 9,
"accessDenied": 0,
"recaptcha": 0,
"distilCaptcha": 24,
"statusCodes": {
"200": 3,
"401": 2,
"403": 5,
"405": 24
},
"total": 43
},
}
```