{"id":13447719,"url":"https://github.com/simplecrawler/simplecrawler","last_synced_at":"2025-12-30T09:36:53.848Z","repository":{"id":2040719,"uuid":"2977773","full_name":"simplecrawler/simplecrawler","owner":"simplecrawler","description":"Flexible event driven crawler for node.","archived":true,"fork":false,"pushed_at":"2021-03-07T07:25:38.000Z","size":1169,"stargazers_count":2137,"open_issues_count":66,"forks_count":352,"subscribers_count":78,"default_branch":"master","last_synced_at":"2025-09-22T18:59:10.757Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"uwej711/cmf-sandbox","license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simplecrawler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-12-14T05:05:31.000Z","updated_at":"2025-09-17T06:01:20.000Z","dependencies_parsed_at":"2022-08-24T14:06:34.648Z","dependency_job_id":null,"html_url":"https://github.com/simplecrawler/simplecrawler","commit_stats":null,"previous_names":["cgiffard/node-simplecrawler"],"tags_count":21,"template":false,"template_full_name":null,"purl":"pkg:github/simplecrawler/simplecrawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simplecrawler%2Fsimplecrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simplecrawler%2Fsimplecrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simplecrawler%2Fsimplecrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simplecrawler%2Fsimplecrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simplecra
wler","download_url":"https://codeload.github.com/simplecrawler/simplecrawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simplecrawler%2Fsimplecrawler/sbom","scorecard":{"id":272442,"data":{"date":"2025-08-11","repo":{"name":"github.com/simplecrawler/simplecrawler","commit":"7395d2111f1ddb62ab6478cabc540ceb0e1dc9ff"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":0,"reason":"Found 1/12 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"project is archived","details":["Warn: Repository is 
archived."],"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENCE:0","Info: FSF or OSI recognized license: BSD 2-Clause \"Simplified\" License: LICENCE:0"],"documentation":{"short":"Determines if the project has defined a 
license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 19 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":0,"reason":"43 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-6chw-6frg-f759","Warn: Project is vulnerable to: GHSA-v88g-cgmw-v5xw","Warn: Project is vulnerable to: GHSA-93q8-gq69-wqmw","Warn: Project is vulnerable to: GHSA-fwr7-v2mv-hh25","Warn: Project is vulnerable to: GHSA-v6h2-p8h4-qcjw","Warn: Project is vulnerable to: GHSA-grv7-fg5c-xmjg","Warn: Project is vulnerable to: GHSA-3xgq-45jj-v275","Warn: Project is 
vulnerable to: GHSA-gxpj-cx7g-858c","Warn: Project is vulnerable to: GHSA-2j2x-2gpw-g8fm","Warn: Project is vulnerable to: GHSA-4q6p-r6v2-jvc5","Warn: Project is vulnerable to: GHSA-ww39-953v-wcq6","Warn: Project is vulnerable to: GHSA-62gr-4qp9-h98f","Warn: Project is vulnerable to: GHSA-f52g-6jhx-586p","Warn: Project is vulnerable to: GHSA-2cf5-4w76-r9qv","Warn: Project is vulnerable to: GHSA-3cqr-58rm-57f8","Warn: Project is vulnerable to: GHSA-g9r4-xpmj-mj65","Warn: Project is vulnerable to: GHSA-q2c6-c6pm-g3gh","Warn: Project is vulnerable to: GHSA-765h-qjxv-5f44","Warn: Project is vulnerable to: GHSA-f2jv-r9rf-7988","Warn: Project is vulnerable to: GHSA-p6mc-m468-83gw","Warn: Project is vulnerable to: GHSA-29mw-wpgm-hmr9","Warn: Project is vulnerable to: GHSA-35jh-r3h4-6jhm","Warn: Project is vulnerable to: GHSA-6vfc-qv3f-vr6c","Warn: Project is vulnerable to: GHSA-5v2h-r2cx-5xgj","Warn: Project is vulnerable to: GHSA-rrrm-qjm4-v8hf","Warn: Project is vulnerable to: GHSA-f8q6-p94x-37v3","Warn: Project is vulnerable to: GHSA-vh95-rmgr-6w4m","Warn: Project is vulnerable to: GHSA-xvch-5gv4-984h","Warn: Project is vulnerable to: GHSA-g6ww-v8xp-vmwg","Warn: Project is vulnerable to: GHSA-c2qf-rxjj-qqgw","Warn: Project is vulnerable to: GHSA-mxhp-79qh-mcx6","Warn: Project is vulnerable to: GHSA-52f5-9888-hmc6","Warn: Project is vulnerable to: GHSA-cf4h-3jhx-xvhq","Warn: Project is vulnerable to: GHSA-3329-pjwv-fjpg","Warn: Project is vulnerable to: GHSA-p6j9-7xhc-rhwp","Warn: Project is vulnerable to: GHSA-89gv-h8wf-cg8r","Warn: Project is vulnerable to: GHSA-gcv8-gh4r-25x6","Warn: Project is vulnerable to: GHSA-gmv4-r438-p67f","Warn: Project is vulnerable to: GHSA-8h2f-7jc4-7m3m","Warn: Project is vulnerable to: GHSA-3vjf-82ff-p4r3","Warn: Project is vulnerable to: GHSA-g694-m8vq-gv9h","Warn: Project is vulnerable to: GHSA-c4w7-xm78-47vh","Warn: Project is vulnerable to: GHSA-p9pc-299p-vxgp"],"documentation":{"short":"Determines if the project has open, known 
unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-17T13:43:29.557Z","repository_id":2040719,"created_at":"2025-08-17T13:43:29.557Z","updated_at":"2025-08-17T13:43:29.557Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277065566,"owners_count":25754437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-26T02:00:09.010Z","response_time":78,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T05:01:25.174Z","updated_at":"2025-09-27T07:31:52.197Z","avatar_url":"https://github.com/simplecrawler.png","language":"JavaScript","funding_links":[],"categories":["JavaScript","Nodejs"],"sub_categories":[],"readme":"# Simple web crawler for node.js [UNMAINTAINED]\n\n**This project is unmaintained and active projects relying on it are advised to migrate to alternative solutions.**\n\n[![NPM version](https://img.shields.io/npm/v/simplecrawler.svg)](https://www.npmjs.com/package/simplecrawler)\n[![Linux Build Status](https://img.shields.io/travis/simplecrawler/simplecrawler/master.svg)](https://travis-ci.org/simplecrawler/simplecrawler)\n[![Windows Build 
Status](https://img.shields.io/appveyor/ci/fredrikekelund/simplecrawler.svg?label=Windows%20build)](https://ci.appveyor.com/project/fredrikekelund/simplecrawler/branch/master)\n[![Dependency Status](https://img.shields.io/david/simplecrawler/simplecrawler.svg)](https://david-dm.org/simplecrawler/simplecrawler)\n[![devDependency Status](https://img.shields.io/david/dev/simplecrawler/simplecrawler.svg)](https://david-dm.org/simplecrawler/simplecrawler?type=dev)\n[![Greenkeeper badge](https://badges.greenkeeper.io/simplecrawler/simplecrawler.svg)](https://greenkeeper.io/)\n\nsimplecrawler is designed to provide a basic, flexible and robust API for crawling websites. It was written to archive, analyse, and search some very large websites and has happily chewed through hundreds of thousands of pages and written tens of gigabytes to disk without issue.\n\n## What does simplecrawler do?\n\n* Provides a very simple event driven API using `EventEmitter`\n* Extremely configurable base for writing your own crawler\n* Provides some simple logic for auto-detecting linked resources - which you can replace or augment\n* Automatically respects any robots.txt rules\n* Has a flexible queue system which can be frozen to disk and defrosted\n* Provides basic statistics on network performance\n* Uses buffers for fetching and managing data, preserving binary data (except when discovering links)\n\n## Documentation\n\n- [Installation](#installation)\n- [Getting started](#getting-started)\n- [Events](#events)\n    - [A note about HTTP error conditions](#a-note-about-http-error-conditions)\n    - [Waiting for asynchronous event listeners](#waiting-for-asynchronous-event-listeners)\n- [Configuration](#configuration)\n- [Fetch conditions](#fetch-conditions)\n- [Download conditions](#download-conditions)\n- [The queue](#the-queue)\n    - [Manually adding to the queue](#manually-adding-to-the-queue)\n    - [Queue items](#queue-items)\n    - [Queue statistics and 
reporting](#queue-statistics-and-reporting)\n    - [Saving and reloading the queue (freeze/defrost)](#saving-and-reloading-the-queue-freezedefrost)\n- [Cookies](#cookies)\n    - [Cookie events](#cookie-events)\n- [Link Discovery](#link-discovery)\n- [FAQ/Troubleshooting](#faqtroubleshooting)\n- [Node Support Policy](#node-support-policy)\n- [Current Maintainers](#current-maintainers)\n- [Contributing](#contributing)\n- [Contributors](#contributors)\n- [License](#license)\n\n## Installation\n\n```sh\nnpm install --save simplecrawler\n```\n\n## Getting Started\n\nInitializing simplecrawler is a simple process. First, you require the module and instantiate it with a single argument. You then configure the properties you'd like (e.g. the request interval), register a few event listeners, and call the start method. Let's walk through the process!\n\nAfter requiring the crawler, we create a new instance of it. We supply the constructor with a URL that indicates which domain to crawl and which resource to fetch first.\n\n```js\nvar Crawler = require(\"simplecrawler\");\n\nvar crawler = new Crawler(\"http://www.example.com/\");\n```\n\nYou can initialize the crawler with or without the `new` operator. Being able to skip it comes in handy when you want to chain API calls.\n\n```js\nvar crawler = Crawler(\"http://www.example.com/\")\n    .on(\"fetchcomplete\", function () {\n        console.log(\"Fetched a resource!\")\n    });\n```\n\nBy default, the crawler will only fetch resources on the same domain as that in the URL passed to the constructor. But this can be changed through the \u003ccode\u003e\u003ca href=\"#Crawler+domainWhitelist\"\u003ecrawler.domainWhitelist\u003c/a\u003e\u003c/code\u003e property.\n\nNow, let's configure some more things before we start crawling. Of course, you probably want to ensure you don't take down your web server. 
Decrease the concurrency from five simultaneous requests - and increase the request interval from the default 250 ms like this:\n\n```js\ncrawler.interval = 10000; // Ten seconds\ncrawler.maxConcurrency = 3;\n```\n\nYou can also define a max depth for links to fetch:\n\n```js\ncrawler.maxDepth = 1; // Only first page is fetched (with linked CSS \u0026 images)\n// Or:\ncrawler.maxDepth = 2; // First page and discovered links from it are fetched\n// Or:\ncrawler.maxDepth = 3; // Etc.\n```\n\nFor a full list of configurable properties, see the [configuration section](#configuration).\n\nYou'll also need to set up event listeners for the [events](#events) you want to listen to. \u003ccode\u003ecrawler.fetchcomplete\u003c/code\u003e and \u003ccode\u003ecrawler.complete\u003c/code\u003e are good places to start.\n\n```js\ncrawler.on(\"fetchcomplete\", function(queueItem, responseBuffer, response) {\n    console.log(\"I just received %s (%d bytes)\", queueItem.url, responseBuffer.length);\n    console.log(\"It was a resource of type %s\", response.headers['content-type']);\n});\n```\n\nThen, when you're satisfied and ready to go, start the crawler! It'll run through its queue finding linked resources on the domain to download, until it can't find any more.\n\n```js\ncrawler.start();\n```\n\n## Events\n\nsimplecrawler's API is event driven, and there are plenty of events emitted during the different stages of the crawl.\n\n\u003ca name=\"Crawler+event_crawlstart\"\u003e\u003c/a\u003e\n\n#### \"crawlstart\"\nFired when the crawl starts. 
This event gives you the opportunity to\nadjust the crawler's configuration, since the crawl won't actually start\nuntil the next process tick.\n\n\u003ca name=\"Crawler+event_discoverycomplete\"\u003e\u003c/a\u003e\n\n#### \"discoverycomplete\" (queueItem, resources)\nFired when the discovery of linked resources has completed\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item that represents the document for the discovered resources |\n| resources | \u003ccode\u003eArray\u003c/code\u003e | An array of discovered and cleaned URLs |\n\n\u003ca name=\"Crawler+event_invaliddomain\"\u003e\u003c/a\u003e\n\n#### \"invaliddomain\" (queueItem)\nFired when a resource wasn't queued because of an invalid domain name\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item representing the URL with the invalid domain name |\n\n\u003ca name=\"Crawler+event_fetchdisallowed\"\u003e\u003c/a\u003e\n\n#### \"fetchdisallowed\" (queueItem)\nFired when a resource wasn't queued because it was disallowed by the\nsite's robots.txt rules\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item representing the disallowed URL |\n\n\u003ca name=\"Crawler+event_fetchconditionerror\"\u003e\u003c/a\u003e\n\n#### \"fetchconditionerror\" (queueItem, error)\nFired when a fetch condition returns an error\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item that was processed when the error was encountered |\n| error | \u003ccode\u003e\\*\u003c/code\u003e |  |\n\n\u003ca name=\"Crawler+event_fetchprevented\"\u003e\u003c/a\u003e\n\n#### \"fetchprevented\" (queueItem, fetchCondition)\nFired when a fetch condition prevented the queueing of a URL\n\n\n| Param 
| Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item that didn't pass the fetch conditions |\n| fetchCondition | \u003ccode\u003efunction\u003c/code\u003e | The first fetch condition that returned false |\n\n\u003ca name=\"Crawler+event_queueduplicate\"\u003e\u003c/a\u003e\n\n#### \"queueduplicate\" (queueItem)\nFired when a new queue item was rejected because another\nqueue item with the same URL was already in the queue\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item that was rejected |\n\n\u003ca name=\"Crawler+event_queueerror\"\u003e\u003c/a\u003e\n\n#### \"queueerror\" (error, queueItem)\nFired when an error was encountered while updating a queue item\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| error | \u003ccode\u003eError\u003c/code\u003e | The error that was returned by the queue |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item that the crawler tried to update when it encountered the error |\n\n\u003ca name=\"Crawler+event_queueadd\"\u003e\u003c/a\u003e\n\n#### \"queueadd\" (queueItem, referrer)\nFired when an item was added to the crawler's queue\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item that was added to the queue |\n| referrer | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item representing the resource where the new queue item was found |\n\n\u003ca name=\"Crawler+event_fetchtimeout\"\u003e\u003c/a\u003e\n\n#### \"fetchtimeout\" (queueItem, timeout)\nFired when a request times out\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request timed out |\n| timeout | 
\u003ccode\u003eNumber\u003c/code\u003e | The delay in milliseconds after which the request timed out |\n\n\u003ca name=\"Crawler+event_fetchclienterror\"\u003e\u003c/a\u003e\n\n#### \"fetchclienterror\" (queueItem, error)\nFired when a request encounters an unknown error\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request has errored |\n| error | \u003ccode\u003eObject\u003c/code\u003e | The error supplied to the `error` event on the request |\n\n\u003ca name=\"Crawler+event_fetchstart\"\u003e\u003c/a\u003e\n\n#### \"fetchstart\" (queueItem, requestOptions)\nFired just after a request has been initiated\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request has been initiated |\n| requestOptions | \u003ccode\u003eObject\u003c/code\u003e | The options generated for the HTTP request |\n\n\u003ca name=\"Crawler+event_cookieerror\"\u003e\u003c/a\u003e\n\n#### \"cookieerror\" (queueItem, error, cookie)\nFired when an error was encountered while trying to add a\ncookie to the cookie jar\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item representing the resource that returned the cookie |\n| error | \u003ccode\u003eError\u003c/code\u003e | The error that was encountered |\n| cookie | \u003ccode\u003eString\u003c/code\u003e | The Set-Cookie header value that was returned from the request |\n\n\u003ca name=\"Crawler+event_fetchheaders\"\u003e\u003c/a\u003e\n\n#### \"fetchheaders\" (queueItem, response)\nFired when the headers for a request have been received\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the headers have been received |\n| 
response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_downloadconditionerror\"\u003e\u003c/a\u003e\n\n#### \"downloadconditionerror\" (queueItem, error)\nFired when a download condition returns an error\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item that was processed when the error was encountered |\n| error | \u003ccode\u003e\\*\u003c/code\u003e |  |\n\n\u003ca name=\"Crawler+event_downloadprevented\"\u003e\u003c/a\u003e\n\n#### \"downloadprevented\" (queueItem, response)\nFired when the downloading of a resource was prevented\nby a download condition\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item representing the resource that was halfway fetched |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_notmodified\"\u003e\u003c/a\u003e\n\n#### \"notmodified\" (queueItem, response, cacheObject)\nFired when the crawler's cache was enabled and the server responded with a 304 Not Modified status for the request\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request returned a 304 status |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n| cacheObject | \u003ccode\u003eCacheObject\u003c/code\u003e | The CacheObject returned from the cache backend |\n\n\u003ca 
name=\"Crawler+event_fetchredirect\"\u003e\u003c/a\u003e\n\n#### \"fetchredirect\" (queueItem, redirectQueueItem, response)\nFired when the server returned a redirect HTTP status for the request\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request was redirected |\n| redirectQueueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for the redirect target resource |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_fetch404\"\u003e\u003c/a\u003e\n\n#### \"fetch404\" (queueItem, response)\nFired when the server returned a 404 Not Found status for the request\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request returned a 404 status |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_fetch410\"\u003e\u003c/a\u003e\n\n#### \"fetch410\" (queueItem, response)\nFired when the server returned a 410 Gone status for the request\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request returned a 410 status |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_fetcherror\"\u003e\u003c/a\u003e\n\n#### \"fetcherror\" (queueItem, response)\nFired when the server returned a status code above 400 that isn't 
404 or 410\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request failed |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_fetchcomplete\"\u003e\u003c/a\u003e\n\n#### \"fetchcomplete\" (queueItem, responseBody, response)\nFired when the request has completed\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request has completed |\n| responseBody | \u003ccode\u003eString\u003c/code\u003e \\| \u003ccode\u003eBuffer\u003c/code\u003e | If [decodeResponses](#Crawler+decodeResponses) is true, this will be the decoded HTTP response. Otherwise it will be the raw response buffer. |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_gziperror\"\u003e\u003c/a\u003e\n\n#### \"gziperror\" (queueItem, responseBody, response)\nFired when an error was encountered while unzipping the response data\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the unzipping failed |\n| responseBody | \u003ccode\u003eString\u003c/code\u003e \\| \u003ccode\u003eBuffer\u003c/code\u003e | If [decodeResponses](#Crawler+decodeResponses) is true, this will be the decoded HTTP response. Otherwise it will be the raw response buffer. 
|\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_fetchdataerror\"\u003e\u003c/a\u003e\n\n#### \"fetchdataerror\" (queueItem, response)\nFired when a resource couldn't be downloaded because it exceeded the maximum allowed size\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item for which the request failed |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The [http.IncomingMessage](https://nodejs.org/api/http.html#http_class_http_incomingmessage) for the request's response |\n\n\u003ca name=\"Crawler+event_robotstxterror\"\u003e\u003c/a\u003e\n\n#### \"robotstxterror\" (error)\nFired when an error was encountered while retrieving a robots.txt file\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| error | \u003ccode\u003eError\u003c/code\u003e | The error returned from [getRobotsTxt](#Crawler+getRobotsTxt) |\n\n\u003ca name=\"Crawler+event_complete\"\u003e\u003c/a\u003e\n\n#### \"complete\"\nFired when the crawl has completed - all resources in the queue have been dealt with\n\n\n### A note about HTTP error conditions\n\nBy default, simplecrawler does not download the response body when it encounters an HTTP error status in the response. If you need this information, you can listen to simplecrawler's error events, and through node's native `data` event (`response.on(\"data\", function(chunk) {...})`) you can save the information yourself.\n\n### Waiting for asynchronous event listeners\n\nSometimes, you might want simplecrawler to wait for you while you perform some asynchronous tasks in an event listener, instead of having it race off and fire the `complete` event, ending your crawl prematurely. 
This is useful, for example, when you're doing your own link discovery with an asynchronous library method.\n\nsimplecrawler provides a `wait` method you can call at any time. It is available via `this` from inside listeners, and on the crawler object itself. It returns a callback function.\n\nOnce you've called this method, simplecrawler will not fire the `complete` event until either you execute the callback it returns, or a timeout is reached (configured in `crawler.listenerTTL`, 10000 ms by default).\n\n#### Example asynchronous event listener\n\n```js\ncrawler.on(\"fetchcomplete\", function(queueItem, data, res) {\n    // `continue` is a reserved word in JavaScript, so store the callback under another name\n    var resume = this.wait();\n\n    doSomeDiscovery(data, function(foundURLs) {\n        foundURLs.forEach(function(url) {\n            crawler.queueURL(url, queueItem);\n        });\n\n        resume();\n    });\n});\n```\n\n## Configuration\n\nsimplecrawler is highly configurable and there's a long list of settings you can change to adapt it to your specific needs.\n\n\u003ca name=\"Crawler+initialURL\"\u003e\u003c/a\u003e\n\n#### crawler.initialURL : \u003ccode\u003eString\u003c/code\u003e\nControls which URL to request first\n\n\u003ca name=\"Crawler+host\"\u003e\u003c/a\u003e\n\n#### crawler.host : \u003ccode\u003eString\u003c/code\u003e\nDetermines what hostname the crawler should limit requests to (so long as\n[filterByDomain](#Crawler+filterByDomain) is true)\n\n\u003ca name=\"Crawler+interval\"\u003e\u003c/a\u003e\n\n#### crawler.interval : \u003ccode\u003eNumber\u003c/code\u003e\nDetermines the interval at which new requests are spawned by the crawler,\nas long as the number of open requests is under the\n[maxConcurrency](#Crawler+maxConcurrency) cap.\n\n\u003ca name=\"Crawler+maxConcurrency\"\u003e\u003c/a\u003e\n\n#### crawler.maxConcurrency : \u003ccode\u003eNumber\u003c/code\u003e\nMaximum request concurrency. 
If necessary, simplecrawler will increase\nnode's http agent maxSockets value to match this setting.\n\n\u003ca name=\"Crawler+timeout\"\u003e\u003c/a\u003e\n\n#### crawler.timeout : \u003ccode\u003eNumber\u003c/code\u003e\nMaximum time (in milliseconds) we'll wait for headers\n\n\u003ca name=\"Crawler+listenerTTL\"\u003e\u003c/a\u003e\n\n#### crawler.listenerTTL : \u003ccode\u003eNumber\u003c/code\u003e\nMaximum time (in milliseconds) we'll wait for async listeners\n\n\u003ca name=\"Crawler+userAgent\"\u003e\u003c/a\u003e\n\n#### crawler.userAgent : \u003ccode\u003eString\u003c/code\u003e\nCrawler's user agent string\n\n**Default**: \u003ccode\u003e\u0026quot;Node/simplecrawler \u0026lt;version\u0026gt; (https://github.com/simplecrawler/simplecrawler)\u0026quot;\u003c/code\u003e  \n\u003ca name=\"Crawler+queue\"\u003e\u003c/a\u003e\n\n#### crawler.queue : [\u003ccode\u003eFetchQueue\u003c/code\u003e](#FetchQueue)\nQueue for requests. The crawler can use any implementation so long as it\nuses the same interface. The default queue is simply backed by an array.\n\n\u003ca name=\"Crawler+respectRobotsTxt\"\u003e\u003c/a\u003e\n\n#### crawler.respectRobotsTxt : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether the crawler respects the robots.txt rules of any domain.\nThis is done both with regard to the robots.txt file, and `\u003cmeta\u003e` tags\nthat specify a `nofollow` value for robots. 
The latter only applies if\nthe default [discoverResources](#Crawler+discoverResources) method is used, though.\n\n\u003ca name=\"Crawler+allowInitialDomainChange\"\u003e\u003c/a\u003e\n\n#### crawler.allowInitialDomainChange : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether the crawler is allowed to change the\n[host](#Crawler+host) setting if the first response is a redirect to\nanother domain.\n\n\u003ca name=\"Crawler+decompressResponses\"\u003e\u003c/a\u003e\n\n#### crawler.decompressResponses : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether HTTP responses are automatically decompressed based on\ntheir Content-Encoding header. If true, it will also assign the\nappropriate Accept-Encoding header to requests.\n\n\u003ca name=\"Crawler+decodeResponses\"\u003e\u003c/a\u003e\n\n#### crawler.decodeResponses : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether HTTP responses are automatically character converted to\nstandard JavaScript strings using the [iconv-lite](https://www.npmjs.com/package/iconv-lite)\nmodule before being emitted in the [fetchcomplete](#Crawler+event_fetchcomplete) event.\nThe character encoding is interpreted first from the Content-Type header,\nand second from any `\u003cmeta charset=\"xxx\" /\u003e` tags.\n\n\u003ca name=\"Crawler+filterByDomain\"\u003e\u003c/a\u003e\n\n#### crawler.filterByDomain : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether the crawler fetches only URLs where the hostname\nmatches [host](#Crawler+host). 
Unless you want to be crawling the entire\ninternet, I would recommend leaving this on!\n\n\u003ca name=\"Crawler+scanSubdomains\"\u003e\u003c/a\u003e\n\n#### crawler.scanSubdomains : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether URLs that point to a subdomain of [host](#Crawler+host)\nshould also be fetched.\n\n\u003ca name=\"Crawler+ignoreWWWDomain\"\u003e\u003c/a\u003e\n\n#### crawler.ignoreWWWDomain : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether to treat the www subdomain as the same domain as\n[host](#Crawler+host). So if [http://example.com/example](http://example.com/example) has\nalready been fetched, [http://www.example.com/example](http://www.example.com/example) won't be\nfetched either.\n\n\u003ca name=\"Crawler+stripWWWDomain\"\u003e\u003c/a\u003e\n\n#### crawler.stripWWWDomain : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether to strip the www subdomain entirely from URLs at queue\nitem construction time.\n\n\u003ca name=\"Crawler+cache\"\u003e\u003c/a\u003e\n\n#### crawler.cache : \u003ccode\u003eSimpleCache\u003c/code\u003e\nInternal cache store. Must implement `SimpleCache` interface. 
You can\nsave the site to disk using the built in file system cache like this:\n\n```js\ncrawler.cache = new Crawler.cache('pathToCacheDirectory');\n```\n\n\u003ca name=\"Crawler+useProxy\"\u003e\u003c/a\u003e\n\n#### crawler.useProxy : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether an HTTP proxy should be used for requests\n\n\u003ca name=\"Crawler+proxyHostname\"\u003e\u003c/a\u003e\n\n#### crawler.proxyHostname : \u003ccode\u003eString\u003c/code\u003e\nIf [useProxy](#Crawler+useProxy) is true, this setting controls what hostname\nto use for the proxy\n\n\u003ca name=\"Crawler+proxyPort\"\u003e\u003c/a\u003e\n\n#### crawler.proxyPort : \u003ccode\u003eNumber\u003c/code\u003e\nIf [useProxy](#Crawler+useProxy) is true, this setting controls what port to\nuse for the proxy\n\n\u003ca name=\"Crawler+proxyUser\"\u003e\u003c/a\u003e\n\n#### crawler.proxyUser : \u003ccode\u003eString\u003c/code\u003e\nIf [useProxy](#Crawler+useProxy) is true, this setting controls what username\nto use for the proxy\n\n\u003ca name=\"Crawler+proxyPass\"\u003e\u003c/a\u003e\n\n#### crawler.proxyPass : \u003ccode\u003eString\u003c/code\u003e\nIf [useProxy](#Crawler+useProxy) is true, this setting controls what password\nto use for the proxy\n\n\u003ca name=\"Crawler+needsAuth\"\u003e\u003c/a\u003e\n\n#### crawler.needsAuth : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether to use HTTP Basic Auth\n\n\u003ca name=\"Crawler+authUser\"\u003e\u003c/a\u003e\n\n#### crawler.authUser : \u003ccode\u003eString\u003c/code\u003e\nIf [needsAuth](#Crawler+needsAuth) is true, this setting controls what username\nto send with HTTP Basic Auth\n\n\u003ca name=\"Crawler+authPass\"\u003e\u003c/a\u003e\n\n#### crawler.authPass : \u003ccode\u003eString\u003c/code\u003e\nIf [needsAuth](#Crawler+needsAuth) is true, this setting controls what password\nto send with HTTP Basic Auth\n\n\u003ca name=\"Crawler+acceptCookies\"\u003e\u003c/a\u003e\n\n#### crawler.acceptCookies : 
\u003ccode\u003eBoolean\u003c/code\u003e\nControls whether to save and send cookies or not\n\n\u003ca name=\"Crawler+cookies\"\u003e\u003c/a\u003e\n\n#### crawler.cookies : [\u003ccode\u003eCookieJar\u003c/code\u003e](#CookieJar)\nThe module used to store cookies\n\n\u003ca name=\"Crawler+customHeaders\"\u003e\u003c/a\u003e\n\n#### crawler.customHeaders : \u003ccode\u003eObject\u003c/code\u003e\nControls what headers (besides the default ones) to include with every\nrequest.\n\n\u003ca name=\"Crawler+domainWhitelist\"\u003e\u003c/a\u003e\n\n#### crawler.domainWhitelist : \u003ccode\u003eArray\u003c/code\u003e\nControls what domains the crawler is allowed to fetch from, regardless of\n[host](#Crawler+host) or [filterByDomain](#Crawler+filterByDomain) settings.\n\n\u003ca name=\"Crawler+allowedProtocols\"\u003e\u003c/a\u003e\n\n#### crawler.allowedProtocols : \u003ccode\u003eArray.\u0026lt;RegExp\u0026gt;\u003c/code\u003e\nControls what protocols the crawler is allowed to fetch from\n\n\u003ca name=\"Crawler+maxResourceSize\"\u003e\u003c/a\u003e\n\n#### crawler.maxResourceSize : \u003ccode\u003eNumber\u003c/code\u003e\nControls the maximum allowed size in bytes of resources to be fetched\n\n**Default**: \u003ccode\u003e16777216\u003c/code\u003e  \n\u003ca name=\"Crawler+supportedMimeTypes\"\u003e\u003c/a\u003e\n\n#### crawler.supportedMimeTypes : \u003ccode\u003eArray.\u0026lt;(RegExp\\|string)\u0026gt;\u003c/code\u003e\nControls what mimetypes the crawler will scan for new resources. 
If\n[downloadUnsupported](#Crawler+downloadUnsupported) is false, this setting will also\nrestrict what resources are downloaded.\n\n\u003ca name=\"Crawler+downloadUnsupported\"\u003e\u003c/a\u003e\n\n#### crawler.downloadUnsupported : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether to download resources with unsupported mimetypes (as\nspecified by [supportedMimeTypes](#Crawler+supportedMimeTypes))\n\n\u003ca name=\"Crawler+urlEncoding\"\u003e\u003c/a\u003e\n\n#### crawler.urlEncoding : \u003ccode\u003eString\u003c/code\u003e\nControls what URL encoding to use. Can be either \"unicode\" or \"iso8859\"\n\n\u003ca name=\"Crawler+stripQuerystring\"\u003e\u003c/a\u003e\n\n#### crawler.stripQuerystring : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether to strip query string parameters from URLs at queue\nitem construction time.\n\n\u003ca name=\"Crawler+sortQueryParameters\"\u003e\u003c/a\u003e\n\n#### crawler.sortQueryParameters : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether to sort query string parameters in URLs at queue\nitem construction time.\n\n\u003ca name=\"Crawler+discoverRegex\"\u003e\u003c/a\u003e\n\n#### crawler.discoverRegex : \u003ccode\u003eArray.\u0026lt;(RegExp\\|function())\u0026gt;\u003c/code\u003e\nCollection of regular expressions and functions that are applied in the\ndefault [discoverResources](#Crawler+discoverResources) method.\n\n\u003ca name=\"Crawler+parseHTMLComments\"\u003e\u003c/a\u003e\n\n#### crawler.parseHTMLComments : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether the default [discoverResources](#Crawler+discoverResources) should\nscan for new resources inside of HTML comments.\n\n\u003ca name=\"Crawler+parseScriptTags\"\u003e\u003c/a\u003e\n\n#### crawler.parseScriptTags : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether the default [discoverResources](#Crawler+discoverResources) should\nscan for new resources inside of `\u003cscript\u003e` tags.\n\n\u003ca 
name=\"Crawler+maxDepth\"\u003e\u003c/a\u003e\n\n#### crawler.maxDepth : \u003ccode\u003eNumber\u003c/code\u003e\nControls the max depth of resources that the crawler fetches. 0 means\nthat the crawler won't restrict requests based on depth. The initial\nresource, as well as manually queued resources, are at depth 1. From\nthere, every discovered resource adds 1 to its referrer's depth.\n\n\u003ca name=\"Crawler+ignoreInvalidSSL\"\u003e\u003c/a\u003e\n\n#### crawler.ignoreInvalidSSL : \u003ccode\u003eBoolean\u003c/code\u003e\nControls whether to proceed anyway when the crawler encounters an invalid\nSSL certificate.\n\n\u003ca name=\"Crawler+httpAgent\"\u003e\u003c/a\u003e\n\n#### crawler.httpAgent : \u003ccode\u003eHTTPAgent\u003c/code\u003e\nControls what HTTP agent to use. This is useful if you want to configure\ne.g. a SOCKS client.\n\n\u003ca name=\"Crawler+httpsAgent\"\u003e\u003c/a\u003e\n\n#### crawler.httpsAgent : \u003ccode\u003eHTTPAgent\u003c/code\u003e\nControls what HTTPS agent to use. This is useful if you want to configure\ne.g. a SOCKS client.\n\n\n## Fetch conditions\n\nsimplecrawler has a concept called fetch conditions that offers a flexible API for filtering discovered resources before they're put in the queue. A fetch condition is a function that takes a queue item candidate and evaluates (synchronously or asynchronously) whether it should be added to the queue or not. *Please note: with the next major release, all fetch conditions will be asynchronous.*\n\nYou may add as many fetch conditions as you like, and remove them at runtime. simplecrawler will evaluate every fetch condition in parallel until one is encountered that returns a falsy value. 
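A minimal synchronous condition might look like this (the skip-PDFs rule and the `skipPDFs` name are just illustrations):

```javascript
// Hypothetical fetch condition: returning a falsy value for a queue item
// candidate prevents it from being queued.
function skipPDFs(queueItem, referrerQueueItem) {
    return !/\.pdf$/i.test(queueItem.path);
}

// On a real crawler instance (not run here):
// var conditionID = crawler.addFetchCondition(skipPDFs);
// ...and later, to remove it again:
// crawler.removeFetchCondition(conditionID);

console.log(skipPDFs({ path: "/report.pdf" }));  // false
console.log(skipPDFs({ path: "/index.html" })); // true
```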
If that happens, the resource in question will not be fetched.\n\nThis API is complemented by [download conditions](#download-conditions) that determine whether a resource's body data should be downloaded.\n\n\u003ca name=\"Crawler+addFetchCondition\"\u003e\u003c/a\u003e\n\n#### crawler.addFetchCondition(callback) ⇒ \u003ccode\u003eNumber\u003c/code\u003e\nAdds a callback to the fetch conditions array. simplecrawler will evaluate\nall fetch conditions for every discovered URL, and if any of the fetch\nconditions returns a falsy value, the URL won't be queued.\n\n**Returns**: \u003ccode\u003eNumber\u003c/code\u003e - The index of the fetch condition in the fetch conditions array. This can later be used to remove the fetch condition.  \n\n| Param | Type | Description |\n| --- | --- | --- |\n| callback | [\u003ccode\u003eaddFetchConditionCallback\u003c/code\u003e](#Crawler..addFetchConditionCallback) | Function to be called after resource discovery that's able to prevent queueing of resource |\n\n\u003ca name=\"Crawler..addFetchConditionCallback\"\u003e\u003c/a\u003e\n\n#### Crawler~addFetchConditionCallback : \u003ccode\u003efunction\u003c/code\u003e\nEvaluated for every discovered URL to determine whether to put it in the\nqueue.\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The resource to be queued (or not) |\n| referrerQueueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The resource where `queueItem` was discovered |\n| callback | \u003ccode\u003efunction\u003c/code\u003e |  |\n\n\u003ca name=\"Crawler+removeFetchCondition\"\u003e\u003c/a\u003e\n\n#### crawler.removeFetchCondition(id) ⇒ \u003ccode\u003eBoolean\u003c/code\u003e\nRemoves a fetch condition from the fetch conditions array.\n\n**Returns**: \u003ccode\u003eBoolean\u003c/code\u003e - If the removal was successful, the method will return true. Otherwise, it will throw an error.  
\n\n| Param | Type | Description |\n| --- | --- | --- |\n| id | \u003ccode\u003eNumber\u003c/code\u003e \\| \u003ccode\u003efunction\u003c/code\u003e | The numeric ID of the fetch condition, or a reference to the fetch condition itself. This was returned from [addFetchCondition](#Crawler+addFetchCondition) |\n\n\n## Download conditions\n\nWhile fetch conditions let you determine which resources to put in the queue, download conditions offer the same kind of flexible API for determining which resources' data to download. Download conditions support both a synchronous and an asynchronous API, but *with the next major release, all download conditions will be asynchronous.*\n\nDownload conditions are evaluated after the headers of a resource have been downloaded, if that resource returned an HTTP status between 200 and 299. This lets you inspect the content-type and content-length headers, along with all other properties on the queue item, before deciding if you want this resource's data or not.\n\n\u003ca name=\"Crawler+addDownloadCondition\"\u003e\u003c/a\u003e\n\n#### crawler.addDownloadCondition(callback) ⇒ \u003ccode\u003eNumber\u003c/code\u003e\nAdds a callback to the download conditions array. simplecrawler will evaluate\nall download conditions for every fetched resource after the headers of that\nresource have been received. If any of the download conditions returns a\nfalsy value, the resource data won't be downloaded.\n\n**Returns**: \u003ccode\u003eNumber\u003c/code\u003e - The index of the download condition in the download conditions array. This can later be used to remove the download condition.  
\n\n| Param | Type | Description |\n| --- | --- | --- |\n| callback | [\u003ccode\u003eaddDownloadConditionCallback\u003c/code\u003e](#Crawler..addDownloadConditionCallback) | Function to be called when the headers of the resource represented by the queue item have been downloaded |\n\n\u003ca name=\"Crawler..addDownloadConditionCallback\"\u003e\u003c/a\u003e\n\n#### Crawler~addDownloadConditionCallback : \u003ccode\u003efunction\u003c/code\u003e\nEvaluated for every fetched resource after its headers have been received to\ndetermine whether to fetch the resource body.\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| queueItem | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The resource to be downloaded (or not) |\n| response | \u003ccode\u003ehttp.IncomingMessage\u003c/code\u003e | The response object as returned by node's `http` API |\n| callback | \u003ccode\u003efunction\u003c/code\u003e |  |\n\n\u003ca name=\"Crawler+removeDownloadCondition\"\u003e\u003c/a\u003e\n\n#### crawler.removeDownloadCondition(id) ⇒ \u003ccode\u003eBoolean\u003c/code\u003e\nRemoves a download condition from the download conditions array.\n\n**Returns**: \u003ccode\u003eBoolean\u003c/code\u003e - If the removal was successful, the method will return true. Otherwise, it will throw an error.  \n\n| Param | Type | Description |\n| --- | --- | --- |\n| id | \u003ccode\u003eNumber\u003c/code\u003e \\| \u003ccode\u003efunction\u003c/code\u003e | The numeric ID of the download condition, or a reference to the download condition itself. The ID was returned from [addDownloadCondition](#Crawler+addDownloadCondition) |\n\n\n## The queue\n\nLike any other web crawler, simplecrawler has a queue. It can be directly accessed through \u003ccode\u003e\u003ca href=\"#Crawler+queue\"\u003ecrawler.queue\u003c/a\u003e\u003c/code\u003e and implements an asynchronous interface for accessing queue items and statistics. 
There are several methods for interacting with the queue, the simplest being \u003ccode\u003e\u003ca href=\"#FetchQueue+get\"\u003ecrawler.queue.get\u003c/a\u003e\u003c/code\u003e, which lets you get a queue item at a specific index in the queue.\n\n\u003ca name=\"FetchQueue+get\"\u003e\u003c/a\u003e\n\n#### fetchQueue.get(index, callback)\nGet a queue item by index\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| index | \u003ccode\u003eNumber\u003c/code\u003e | The index of the queue item in the queue |\n| callback | \u003ccode\u003efunction\u003c/code\u003e | Gets two parameters, `error` and `queueItem`. If the operation was successful, `error` will be `null`. |\n\n\n*All queue methods are in reality synchronous by default, but simplecrawler is built to be able to use different queues that implement the same interface, and those implementations can be asynchronous - which means they could e.g. be backed by a database.*\n\n### Manually adding to the queue\n\nTo add items to the queue, use \u003ccode\u003e\u003ca href=\"#Crawler+queueURL\"\u003ecrawler.queueURL\u003c/a\u003e\u003c/code\u003e.\n\n\u003ca name=\"Crawler+queueURL\"\u003e\u003c/a\u003e\n\n#### crawler.queueURL(url, [referrer], [force]) ⇒ \u003ccode\u003eBoolean\u003c/code\u003e\nQueues a URL for fetching after cleaning, validating and constructing a queue\nitem from it. If you're queueing a URL manually, use this method rather than\n[Crawler#queue#add](Crawler#queue#add)\n\n**Returns**: \u003ccode\u003eBoolean\u003c/code\u003e - The return value used to indicate whether the URL passed all fetch conditions and robots.txt rules. With the advent of async fetch conditions, the return value will no longer take fetch conditions into account.  
\n**Emits**: [\u003ccode\u003einvaliddomain\u003c/code\u003e](#Crawler+event_invaliddomain), [\u003ccode\u003efetchdisallowed\u003c/code\u003e](#Crawler+event_fetchdisallowed), [\u003ccode\u003efetchconditionerror\u003c/code\u003e](#Crawler+event_fetchconditionerror), [\u003ccode\u003efetchprevented\u003c/code\u003e](#Crawler+event_fetchprevented), [\u003ccode\u003equeueduplicate\u003c/code\u003e](#Crawler+event_queueduplicate), [\u003ccode\u003equeueerror\u003c/code\u003e](#Crawler+event_queueerror), [\u003ccode\u003equeueadd\u003c/code\u003e](#Crawler+event_queueadd)  \n\n| Param | Type | Description |\n| --- | --- | --- |\n| url | \u003ccode\u003eString\u003c/code\u003e | An absolute or relative URL. If relative, [processURL](#Crawler+processURL) will make it absolute to the referrer queue item. |\n| [referrer] | [\u003ccode\u003eQueueItem\u003c/code\u003e](#QueueItem) | The queue item representing the resource where this URL was discovered. |\n| [force] | \u003ccode\u003eBoolean\u003c/code\u003e | If true, the URL will be queued regardless of whether it already exists in the queue or not. |\n\n\n### Queue items\n\nWhen working with simplecrawler, you'll constantly be handed queue items, so it helps to know what's inside them. 
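As a quick illustration, here's roughly what a fetched queue item looks like (every value below is made up for the example):

```javascript
// Illustrative queue item; the values are invented, but the property
// names match the documented QueueItem shape.
var queueItem = {
    id: 42,
    url: "http://example.com/archive/index.html?page=2",
    protocol: "http",
    host: "example.com",
    port: 80,
    path: "/archive/index.html?page=2", // includes the query string
    uriPath: "/archive/index.html",     // excludes the query string
    depth: 2,
    referrer: "http://example.com/",
    fetched: true,
    status: "downloaded",
    stateData: {
        code: 200,
        contentType: "text/html; charset=utf-8"
    }
};

console.log(queueItem.uriPath, queueItem.stateData.code);
```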
Here's the formal documentation of the properties that they contain.\n\n\u003ca name=\"QueueItem\"\u003e\u003c/a\u003e\n\n#### QueueItem : \u003ccode\u003eObject\u003c/code\u003e\nQueueItems represent resources in the queue that have been fetched, or will be eventually.\n\n**Properties**\n\n| Name | Type | Description |\n| --- | --- | --- |\n| id | \u003ccode\u003eNumber\u003c/code\u003e | A unique ID assigned by the queue when the queue item is added |\n| url | \u003ccode\u003eString\u003c/code\u003e | The complete, canonical URL of the resource |\n| protocol | \u003ccode\u003eString\u003c/code\u003e | The protocol of the resource (http, https) |\n| host | \u003ccode\u003eString\u003c/code\u003e | The full domain/hostname of the resource |\n| port | \u003ccode\u003eNumber\u003c/code\u003e | The port of the resource |\n| path | \u003ccode\u003eString\u003c/code\u003e | The URL path, including the query string |\n| uriPath | \u003ccode\u003eString\u003c/code\u003e | The URL path, excluding the query string |\n| depth | \u003ccode\u003eNumber\u003c/code\u003e | How many steps simplecrawler has taken from the initial page (which is depth 1) to this resource. |\n| referrer | \u003ccode\u003eString\u003c/code\u003e | The URL of the resource where the URL of this queue item was discovered |\n| fetched | \u003ccode\u003eBoolean\u003c/code\u003e | Has the request for this item been completed? You can monitor this as requests are processed. |\n| status | \u003ccode\u003e\u0026#x27;queued\u0026#x27;\u003c/code\u003e \\| \u003ccode\u003e\u0026#x27;spooled\u0026#x27;\u003c/code\u003e \\| \u003ccode\u003e\u0026#x27;headers\u0026#x27;\u003c/code\u003e \\| \u003ccode\u003e\u0026#x27;downloaded\u0026#x27;\u003c/code\u003e \\| \u003ccode\u003e\u0026#x27;redirected\u0026#x27;\u003c/code\u003e \\| \u003ccode\u003e\u0026#x27;notfound\u0026#x27;\u003c/code\u003e \\| \u003ccode\u003e\u0026#x27;failed\u0026#x27;\u003c/code\u003e | The internal status of the item. 
|\n| stateData | \u003ccode\u003eObject\u003c/code\u003e | An object containing state data and other information about the request. |\n| stateData.requestLatency | \u003ccode\u003eNumber\u003c/code\u003e | The time (in ms) taken for headers to be received after the request was made. |\n| stateData.requestTime | \u003ccode\u003eNumber\u003c/code\u003e | The total time (in ms) taken for the request (including download time.) |\n| stateData.downloadTime | \u003ccode\u003eNumber\u003c/code\u003e | The total time (in ms) taken for the resource to be downloaded. |\n| stateData.contentLength | \u003ccode\u003eNumber\u003c/code\u003e | The length (in bytes) of the returned content. Calculated based on the `content-length` header. |\n| stateData.contentType | \u003ccode\u003eString\u003c/code\u003e | The MIME type of the content. |\n| stateData.code | \u003ccode\u003eNumber\u003c/code\u003e | The HTTP status code returned for the request. Note that this code is `600` if an error occurred in the client and a fetch operation could not take place successfully. |\n| stateData.headers | \u003ccode\u003eObject\u003c/code\u003e | An object containing the header information returned by the server. This is the object node returns as part of the `response` object. |\n| stateData.actualDataSize | \u003ccode\u003eNumber\u003c/code\u003e | The length (in bytes) of the returned content. Calculated based on what is actually received, not the `content-length` header. |\n| stateData.sentIncorrectSize | \u003ccode\u003eBoolean\u003c/code\u003e | True if the data length returned by the server did not match what we were told to expect by the `content-length` header. |\n\n\n### Queue statistics and reporting\n\nFirst of all, the queue can provide some basic statistics about the network performance of your crawl so far. This is done live, so don't check it 30 times a second. 
You can test the following properties:\n\n* `requestTime`\n* `requestLatency`\n* `downloadTime`\n* `contentLength`\n* `actualDataSize`\n\nYou can get the maximum, minimum, and average values for each with the \u003ccode\u003e\u003ca href=\"#FetchQueue+max\"\u003ecrawler.queue.max\u003c/a\u003e\u003c/code\u003e, \u003ccode\u003e\u003ca href=\"#FetchQueue+min\"\u003ecrawler.queue.min\u003c/a\u003e\u003c/code\u003e, and \u003ccode\u003e\u003ca href=\"#FetchQueue+avg\"\u003ecrawler.queue.avg\u003c/a\u003e\u003c/code\u003e functions respectively.\n\n\u003ca name=\"FetchQueue+max\"\u003e\u003c/a\u003e\n\n#### fetchQueue.max(statisticName, callback)\nGets the maximum value of a stateData property from all the items in the\nqueue. This means you can e.g. get the maximum request time, download size\netc.\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| statisticName | \u003ccode\u003eString\u003c/code\u003e | Can be any of the strings in [_allowedStatistics](#FetchQueue._allowedStatistics) |\n| callback | \u003ccode\u003efunction\u003c/code\u003e | Gets two parameters, `error` and `max`. If the operation was successful, `error` will be `null`. |\n\n\u003ca name=\"FetchQueue+min\"\u003e\u003c/a\u003e\n\n#### fetchQueue.min(statisticName, callback)\nGets the minimum value of a stateData property from all the items in the\nqueue. This means you can e.g. get the minimum request time, download size\netc.\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| statisticName | \u003ccode\u003eString\u003c/code\u003e | Can be any of the strings in [_allowedStatistics](#FetchQueue._allowedStatistics) |\n| callback | \u003ccode\u003efunction\u003c/code\u003e | Gets two parameters, `error` and `min`. If the operation was successful, `error` will be `null`. |\n\n\u003ca name=\"FetchQueue+avg\"\u003e\u003c/a\u003e\n\n#### fetchQueue.avg(statisticName, callback)\nGets the average value of a stateData property from all the items in the\nqueue. This means you can e.g. 
get the average request time, download size\netc.\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| statisticName | \u003ccode\u003eString\u003c/code\u003e | Can be any of the strings in [_allowedStatistics](#FetchQueue._allowedStatistics) |\n| callback | \u003ccode\u003efunction\u003c/code\u003e | Gets two parameters, `error` and `avg`. If the operation was successful, `error` will be `null`. |\n\n\nFor general filtering or counting of queue items, there are two methods: \u003ccode\u003e\u003ca href=\"#FetchQueue+filterItems\"\u003ecrawler.queue.filterItems\u003c/a\u003e\u003c/code\u003e and \u003ccode\u003e\u003ca href=\"#FetchQueue+countItems\"\u003ecrawler.queue.countItems\u003c/a\u003e\u003c/code\u003e. Both take an object comparator and a callback.\n\n\u003ca name=\"FetchQueue+filterItems\"\u003e\u003c/a\u003e\n\n#### fetchQueue.filterItems(comparator, callback)\nFilters and returns the items in the queue that match a selector\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| comparator | \u003ccode\u003eObject\u003c/code\u003e | Comparator object used to filter items. Queue items that are returned need to match all the properties of this object. |\n| callback | \u003ccode\u003efunction\u003c/code\u003e | Gets two parameters, `error` and `items`. If the operation was successful, `error` will be `null` and `items` will be an array of QueueItems. |\n\n\u003ca name=\"FetchQueue+countItems\"\u003e\u003c/a\u003e\n\n#### fetchQueue.countItems(comparator, callback)\nCounts the items in the queue that match a selector\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| comparator | \u003ccode\u003eObject\u003c/code\u003e | Comparator object used to filter items. Queue items that are counted need to match all the properties of this object. |\n| callback | \u003ccode\u003efunction\u003c/code\u003e | Gets two parameters, `error` and `count`. If the operation was successful, `error` will be `null` and `count` will be the number of matching queue items. |\n\n\nThe object comparator can also contain other objects, so you may filter queue items based on properties in their `stateData` object as well.\n\n```js\ncrawler.queue.filterItems({\n    stateData: { code: 301 }\n}, function(error, items) {\n    console.log(\"These items returned a 301 HTTP status\", items);\n});\n```\n\n### Saving and reloading the queue (freeze/defrost)\n\nIt can be convenient to be able to save the crawl progress and later be able to reload it if your application fails or you need to abort the crawl for some reason. The `crawler.queue.freeze` and `crawler.queue.defrost` methods will let you do this.\n\n**A word of warning** - they are not CPU friendly as they rely on `JSON.parse` and `JSON.stringify`. Use them only when you need to save the queue - don't call them after every request or your application's performance will be incredibly poor - they block like *crazy*. That said, using them when your crawler commences and stops is perfectly reasonable.\n\nNote that the methods themselves are asynchronous, so if you are going to exit the process after you do the freezing, make sure you wait for the callback - otherwise you'll get an empty file.\n\n\u003ca name=\"FetchQueue+freeze\"\u003e\u003c/a\u003e\n\n#### fetchQueue.freeze(filename, callback)\nWrites the queue to disk in a JSON file. This file can later be imported\nusing [defrost](#FetchQueue+defrost)\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| filename | \u003ccode\u003eString\u003c/code\u003e | Filename passed directly to [fs.writeFile](https://nodejs.org/api/fs.html#fs_fs_writefile_file_data_options_callback) |\n| callback | \u003ccode\u003efunction\u003c/code\u003e | Gets a single `error` parameter. If the operation was successful, this parameter will be `null`. 
|\n\n\u003ca name=\"FetchQueue+defrost\"\u003e\u003c/a\u003e\n\n#### fetchQueue.defrost(filename, callback)\nImport the queue from a frozen JSON file on disk.\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| filename | \u003ccode\u003eString\u003c/code\u003e | Filename passed directly to [fs.readFile](https://nodejs.org/api/fs.html#fs_fs_readfile_file_options_callback) |\n| callback | \u003ccode\u003efunction\u003c/code\u003e | Gets a single `error` parameter. If the operation was successful, this parameter will be `null`. |\n\n\n## Cookies\n\nsimplecrawler has an internal cookie jar, which collects and resends cookies automatically and by default. If you want to turn this off, set the \u003ccode\u003e\u003ca href=\"#Crawler+acceptCookies\"\u003ecrawler.acceptCookies\u003c/a\u003e\u003c/code\u003e option to `false`. The cookie jar is accessible via \u003ccode\u003e\u003ca href=\"#Crawler+cookies\"\u003ecrawler.cookies\u003c/a\u003e\u003c/code\u003e, and is an event emitter itself.\n\n### Cookie events\n\n\u003ca name=\"CookieJar+event_addcookie\"\u003e\u003c/a\u003e\n\n#### \"addcookie\" (cookie)\nFired when a cookie has been added to the jar\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| cookie | [\u003ccode\u003eCookie\u003c/code\u003e](#Cookie) | The cookie that has been added |\n\n\u003ca name=\"CookieJar+event_removecookie\"\u003e\u003c/a\u003e\n\n#### \"removecookie\" (cookie)\nFired when one or multiple cookie have been removed from the jar\n\n\n| Param | Type | Description |\n| --- | --- | --- |\n| cookie | [\u003ccode\u003eArray.\u0026lt;Cookie\u0026gt;\u003c/code\u003e](#Cookie) | The cookies that have been removed |\n\n\n## Link Discovery\n\nsimplecrawler's discovery function is made to be replaceable — you can easily write your own that discovers only the links you're interested in.\n\nThe method must accept a buffer and a [`queueItem`](#queue-items), and return the resources that are to be added to the queue.\n\nIt is quite 
common to pair simplecrawler with a module like [cheerio](https://npmjs.com/package/cheerio) that can correctly parse HTML and provide a DOM-like API for querying — or even a whole headless browser, like PhantomJS.\n\nThe example below demonstrates how one might achieve basic HTML-correct discovery of only link tags using cheerio.\n\n```js\ncrawler.discoverResources = function(buffer, queueItem) {\n    var $ = cheerio.load(buffer.toString(\"utf8\"));\n\n    return $(\"a[href]\").map(function () {\n        return $(this).attr(\"href\");\n    }).get();\n};\n```\n\n## FAQ/Troubleshooting\n\nThere are a couple of questions that pop up more often than others in the issue tracker. If you're having trouble with simplecrawler, please have a look at the list below before submitting an issue.\n\n- **Q: Why does simplecrawler discover so many invalid URLs?**\n\n    A: simplecrawler's built-in discovery method is purposefully naive - it's a brute force approach intended to find everything: URLs in comments, binary files, scripts, image EXIF data, inside CSS documents, and more — useful for archiving and use cases where it's better to have false positives than fail to discover a resource.\n\n    It's definitely not a solution for every case, though — if you're writing a link checker or validator, you don't want erroneous 404s throwing errors. Therefore, simplecrawler allows you to tune discovery in a few key ways:\n\n    - You can either add to (or remove from) the \u003ccode\u003e\u003ca href=\"#Crawler+discoverRegex\"\u003ecrawler.discoverRegex\u003c/a\u003e\u003c/code\u003e array, tweaking the search patterns to meet your requirements; or\n    - Swap out the `discoverResources` method. 
Parsing HTML pages is beyond the scope of simplecrawler, but it is very common to combine simplecrawler with a module like [cheerio](https://npmjs.com/package/cheerio) for more sophisticated resource discovery.\n\n    Further documentation is available in the [link discovery](#link-discovery) section.\n\n- **Q: Why did simplecrawler complete without fetching any resources?**\n\n    A: When this happens, it is usually because the initial request was redirected to a different domain that wasn't in the \u003ccode\u003e\u003ca href=\"#Crawler+domainWhitelist\"\u003ecrawler.domainWhitelist\u003c/a\u003e\u003c/code\u003e.\n\n- **Q: How do I crawl a site that requires a login?**\n\n    A: Logging in to a site is usually fairly simple and most login procedures look alike. We've included an example that covers a lot of situations, but sadly, there's no one true solution for how to deal with logins, so there's no guarantee that this code works right off the bat.\n\n    What we do here is:\n    1. fetch the login page,\n    2. store the session cookie assigned to us by the server,\n    3. extract any CSRF tokens or similar parameters required when logging in,\n    4. submit the login credentials.\n\n    ```js\n    var Crawler = require(\"simplecrawler\"),\n        url = require(\"url\"),\n        cheerio = require(\"cheerio\"),\n        request = require(\"request\");\n\n    var initialURL = \"https://example.com/\";\n\n    var crawler = new Crawler(initialURL);\n\n    request(\"https://example.com/login\", {\n        // The jar option isn't necessary for simplecrawler integration, but it's\n        // the easiest way to have request remember the session cookie between this\n        // request and the next\n        jar: true\n    }, function (error, response, body) {\n        // Start by saving the cookies. 
We'll likely be assigned a session cookie\n        // straight off the bat, and then the server will remember the fact that\n        // this session is logged in as user \"iamauser\" after we've successfully\n        // logged in\n        crawler.cookies.addFromHeaders(response.headers[\"set-cookie\"]);\n\n        // We want to get the names and values of all relevant inputs on the page,\n        // so that any CSRF tokens or similar things are included in the POST\n        // request\n        var $ = cheerio.load(body),\n            formDefaults = {},\n            // You should adapt these selectors so that they target the\n            // appropriate form and inputs\n            formAction = $(\"#login\").attr(\"action\"),\n            loginInputs = $(\"input\");\n\n        // We loop over the input elements and extract their names and values so\n        // that we can include them in the login POST request\n        loginInputs.each(function(i, input) {\n            var inputName = $(input).attr(\"name\"),\n                inputValue = $(input).val();\n\n            formDefaults[inputName] = inputValue;\n        });\n\n        // Time for the login request!\n        request.post(url.resolve(initialURL, formAction), {\n            // We can't be sure that all of the input fields have a correct default\n            // value. Maybe the user has to tick a checkbox or something similar in\n            // order to log in. 
This is something you have to find out manually\n            // by logging in to the site in your browser and inspecting in the\n            // network panel of your favorite dev tools what parameters are included\n            // in the request.\n            form: Object.assign(formDefaults, {\n                username: \"iamauser\",\n                password: \"supersecretpw\"\n            }),\n            // We want to include the saved cookies from the last request in this\n            // one as well\n            jar: true\n        }, function (error, response, body) {\n            // That should do it! We're now ready to start the crawler\n            crawler.start();\n        });\n    });\n\n    crawler.on(\"fetchcomplete\", function (queueItem, responseBuffer, response) {\n        console.log(\"Fetched\", queueItem.url, responseBuffer.toString());\n    });\n    ```\n\n- **Q: What does it mean that events are asynchronous?**\n\n    A: One of the core concepts of node.js is its asynchronous nature. I/O operations (like network requests) take place outside of the main thread (which is where your code is executed). This is what makes node fast: it can continue executing code while there are multiple HTTP requests in flight, for example. But to be able to get back the result of the HTTP request, we need to register a function that will be called when the result is ready. This is what *asynchronous* means in node - the fact that code can continue executing while I/O operations are in progress - and it's the same concept as with AJAX requests in the browser.\n\n- **Q: Promises are nice, can I use them with simplecrawler?**\n\n    A: No, not really. Promises are meant as a replacement for callbacks, but simplecrawler is event-driven, not callback-driven. 
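That said, nothing stops you from wrapping a one-off signal like the `complete` event in a Promise yourself. A minimal sketch (`crawlOnce` is a hypothetical helper, not part of simplecrawler's API, and assumes an already configured `crawler` instance):\n\n    ```js\n    // Hypothetical helper: starts the crawl and resolves once the\n    // \"complete\" event fires.\n    function crawlOnce(crawler) {\n        return new Promise(function (resolve) {\n            crawler.on(\"complete\", function () {\n                resolve(crawler);\n            });\n            crawler.start();\n        });\n    }\n    ```\n\n    For anything beyond such one-shot wrappers, though, events remain the natural fit. 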
Using callbacks to any greater extent in simplecrawler wouldn't make much sense, since you normally need to react more than once to what happens during a crawl.\n\n- **Q: Something's happening and I don't see the output I'm expecting!**\n\n    A: Before filing an issue, check to see that you're not just missing something by logging *all* crawler events with the code below:\n\n    ```js\n    var originalEmit = crawler.emit;\n    crawler.emit = function(evtName, queueItem) {\n        crawler.queue.countItems({ fetched: true }, function(err, completeCount) {\n            if (err) {\n                throw err;\n            }\n\n            crawler.queue.getLength(function(err, length) {\n                if (err) {\n                    throw err;\n                }\n\n                console.log(\"fetched %d of %d — %d open requests, %d open listeners\",\n                    completeCount,\n                    length,\n                    crawler._openRequests.length,\n                    crawler._openListeners);\n            });\n        });\n\n        console.log(evtName, queueItem ? queueItem.url ? 
queueItem.url : queueItem : null);\n        originalEmit.apply(crawler, arguments);\n    };\n    ```\n\n    If you don't see what you need after inserting that code block, and you still need help, please attach the output of all fired events to your issue.\n\n## Node Support Policy\n\nSimplecrawler will officially support stable and LTS versions of Node which are currently supported by the Node Foundation.\n\nCurrently supported versions:\n\n- 8.x\n- 10.x\n- 12.x\n\n## Current Maintainers\n\n* [Christopher Giffard](https://github.com/cgiffard)\n* [Fredrik Ekelund](https://github.com/fredrikekelund)\n* [Konstantin Bläsi](https://github.com/konstantinblaesi)\n* [XhmikosR](https://github.com/XhmikosR)\n\n## Contributing\n\nPlease see the [contributor guidelines](https://github.com/simplecrawler/simplecrawler/blob/master/CONTRIBUTING.md) before submitting a pull request to ensure that your contribution can be accepted quickly and easily!\n\n## Contributors\n\nsimplecrawler has benefited from the kind efforts of dozens of contributors, to whom we are incredibly grateful. 
We originally listed their individual contributions but it became pretty unwieldy - the [full list can be found here](https://github.com/simplecrawler/simplecrawler/graphs/contributors).\n\n## License\n\nCopyright (c) 2017, Christopher Giffard.\n\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without modification,\nare permitted provided that the following conditions are met:\n\n* Redistributions of source code must retain the above copyright notice, this\n  list of conditions and the following disclaimer.\n* Redistributions in binary form must reproduce the above copyright notice, this\n  list of conditions and the following disclaimer in the documentation and/or\n  other materials provided with the distribution.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND\nANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED\nWARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\nDISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR\nANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES\n(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;\nLOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON\nANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS\nSOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimplecrawler%2Fsimplecrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimplecrawler%2Fsimplecrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimplecrawler%2Fsimplecrawler/lists"}