{"id":22828688,"url":"https://github.com/busterc/crwlr","last_synced_at":"2025-04-23T16:24:45.820Z","repository":{"id":44371171,"uuid":"143338191","full_name":"busterc/crwlr","owner":"busterc","description":"🕷a minimal puppeteer crawler api","archived":false,"fork":false,"pushed_at":"2020-06-03T08:48:08.000Z","size":3723,"stargazers_count":5,"open_issues_count":10,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-23T16:24:39.961Z","etag":null,"topics":["crawl","crawler","crawling","puppeteer","spider","walker"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/busterc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-08-02T19:44:44.000Z","updated_at":"2023-09-08T17:43:27.000Z","dependencies_parsed_at":"2022-08-30T13:31:33.631Z","dependency_job_id":null,"html_url":"https://github.com/busterc/crwlr","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/busterc%2Fcrwlr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/busterc%2Fcrwlr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/busterc%2Fcrwlr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/busterc%2Fcrwlr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/busterc","download_url":"https://codeload.github.com/busterc/crwlr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250468795,"owners_count":21435540,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawl","crawler","crawling","puppeteer","spider","walker"],"created_at":"2024-12-12T19:11:31.869Z","updated_at":"2025-04-23T16:24:45.798Z","avatar_url":"https://github.com/busterc.png","language":"JavaScript","readme":"# crwlr [![NPM version][npm-image]][npm-url] [![Build Status][travis-image]][travis-url] [![Dependency Status][daviddm-image]][daviddm-url] [![Coverage percentage][coveralls-image]][coveralls-url] [![Greenkeeper badge][greenkeeper-image]][greenkeeper-url]\n\n\u003e a minimal puppeteer crawler api\n\n## Huh?\n\n- crwlr:\n  - handles the boring boilerplate work of actually crawling a site\n- You provide:\n  - \u0026lt;String\u0026gt; `url` to start from\n  - \u0026lt;Puppeteer Browser\u0026gt; `browser` instance with your own `.launch(options)`\n  - `pageOptions` as you wish:\n    - \u0026lt;Object\u0026gt; `goto` to be provided as options to `page.goto(url, options)`\n    - \u0026lt;Function\u0026gt; `prepare(page)` binds event handlers and/or set properties for every new page\n    - \u0026lt;Function\u0026gt; `resolved(response, page)` fires after every `page.goto()` has resolved\n\n## Installation\n\n```sh\n$ npm install --save crwlr\n```\n\n## Usage\n\n### Basic Example - Without Any Options\n\n```js\n'use strict';\n\nconst puppeteer = require('puppeteer');\nconst crwlr = require('crwlr');\n\nconst site = 'https://buster.neocities.org/crwlr/';\n\n// *** Basic Example Without Any Options *** //\n(async () =\u003e {\n  const browser = await puppeteer.launch();\n  let crawledPages = await crwlr(browser, site);\n  console.log(crawledPages);\n})();\n/*\n[ 'https://buster.neocities.org/crwlr/',\n  'https://buster.neocities.org/crwlr/other.html',\n  'https://buster.neocities.org/crwlr/mixed-content.html',\n  'https://buster.neocities.org/crwlr/missing.html',\n  'https://buster.neocities.org/crwlr/dummy.pdf' ]\n*/\n```\n\n### Advanced Example - With Options\n\n```js\n'use strict';\n\nconst puppeteer = require('puppeteer');\nconst crwlr = require('crwlr');\n\nconst site = 'https://buster.neocities.org/crwlr/';\n\n// *** Advanced Example With Options *** //\n(async () =\u003e {\n  const browser = await puppeteer.launch({\n    headless: false\n  });\n\n  const pageOptions = {\n    prepare: page =\u003e {\n      page.on('request', request =\u003e {\n        if (request.url().match(/\\.js$/)) {\n          console.log(`${page.url()} =\u003e requested: ${request.url()}`);\n        }\n      });\n    },\n    goto: {\n      waitUntil: 'networkidle2'\n    },\n    resolved: (response, page) =\u003e {\n      console.log(`=\u003e resolved: ${response.status()} ${page.url()}`);\n    }\n  };\n\n  await crwlr(browser, site, pageOptions);\n})();\n/*\n=\u003e resolved: 200 https://buster.neocities.org/crwlr/\n=\u003e resolved: 200 https://buster.neocities.org/crwlr/other.html\nhttps://buster.neocities.org/crwlr/mixed-content.html =\u003e requested: https://mixed-script.badssl.com/nonsecure.js\n=\u003e resolved: 200 https://buster.neocities.org/crwlr/mixed-content.html\n=\u003e resolved: 404 https://buster.neocities.org/crwlr/missing.html\n=\u003e resolved: 200 https://buster.neocities.org/crwlr/dummy.pdf\n*/\n```\n\n## License\n\nISC © [Buster Collings]()\n\n[npm-image]: https://badge.fury.io/js/crwlr.svg\n[npm-url]: https://npmjs.org/package/crwlr\n[travis-image]: https://travis-ci.org/busterc/crwlr.svg?branch=master\n[travis-url]: https://travis-ci.org/busterc/crwlr\n[daviddm-image]: https://david-dm.org/busterc/crwlr.svg?theme=shields.io\n[daviddm-url]: https://david-dm.org/busterc/crwlr\n[coveralls-image]: https://coveralls.io/repos/busterc/crwlr/badge.svg\n[coveralls-url]: https://coveralls.io/r/busterc/crwlr\n[greenkeeper-image]: https://badges.greenkeeper.io/busterc/crwlr.svg\n[greenkeeper-url]: https://greenkeeper.io/\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbusterc%2Fcrwlr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbusterc%2Fcrwlr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbusterc%2Fcrwlr/lists"}