{"id":18710766,"url":"https://github.com/apify/actor-legacy-phantomjs-crawler","last_synced_at":"2026-02-19T22:33:08.848Z","repository":{"id":44156926,"uuid":"175380998","full_name":"apify/actor-legacy-phantomjs-crawler","owner":"apify","description":"The actor implements the legacy Apify Crawler product. It uses PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of JavaScript code.","archived":false,"fork":false,"pushed_at":"2023-04-14T06:45:47.000Z","size":1042,"stargazers_count":9,"open_issues_count":15,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-01-13T16:41:27.312Z","etag":null,"topics":["apify","headless-browsers","phantomjs","web-crawler","web-scraping"],"latest_commit_sha":null,"homepage":"https://apify.com/apify/legacy-phantomjs-crawler","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apify.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-03-13T08:46:46.000Z","updated_at":"2025-09-02T16:18:22.000Z","dependencies_parsed_at":"2025-04-12T11:46:51.517Z","dependency_job_id":null,"html_url":"https://github.com/apify/actor-legacy-phantomjs-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/apify/actor-legacy-phantomjs-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-legacy-phantomjs-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-legacy-p
hantomjs-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-legacy-phantomjs-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-legacy-phantomjs-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apify","download_url":"https://codeload.github.com/apify/actor-legacy-phantomjs-crawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-legacy-phantomjs-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29635613,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T22:32:43.237Z","status":"ssl_error","status_checked_at":"2026-02-19T22:32:38.330Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apify","headless-browsers","phantomjs","web-crawler","web-scraping"],"created_at":"2024-11-07T12:35:36.188Z","updated_at":"2026-02-19T22:33:08.817Z","avatar_url":"https://github.com/apify.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Legacy PhantomJS Crawler\n\nThis actor implements the legacy Apify Crawler product.\nIt uses the [PhantomJS](http://phantomjs.org/) headless browser to recursively\ncrawl websites and extract 
data from them using front-end JavaScript code.\n\nNote that PhantomJS is no longer being developed by the community\nand might be easily detected and blocked by target websites.\nTherefore, for new projects, we recommend that you use the Web Scraper\n([`apify/web-scraper`](https://apify.com/apify/web-scraper)) actor,\nwhich provides similar functionality, but is based on the modern headless Chrome browser.\n\nFor more details on how to migrate your crawlers to this actor,\nplease read this \u003ca href=\"https://blog.apify.com/apify-crawler-to-be-replaced-by-apify-actors-c67df1366e00\"\u003eblog post\u003c/a\u003e.\n\n## Compatibility with legacy Apify Crawler\n\nApify Crawler used to be a core product of Apify, but in April 2019 it was deprecated in favor of the more general\n[Apify Actors](https://apify.com/actors) product.\nThis actor serves as a replacement of the legacy product and provides equivalent interface and functionality,\nin order to enable users to seamlessly migrate their crawlers.\nNote that there are several differences between this actor and legacy Apify Crawler:\n\n- The **Cookies persistence** setting of **Over all crawler runs**\n  is only supported when running the actor as a [task](https://docs.apify.com/tasks).\n  When you run the actor directly and use this setting,\n  the actor will fail and print an error to the log.\n- In **Page function**, the `context` object passed to the function has slightly different properties:\n  - The `stats` object contains only a subset of the original statistics.\n  See `context` details in [Page function](#page-function) section.\n  - The `actExecutionId` and `actId` properties are not defined and were replaced by `actorRunId` and `actorTaskId`, respectively.\n- The **Finish webhook URL** and **Finish webhook data** fields are still supported.\n  However, the POST payload passed to the webhook has a different format.\n  See [Finish webhook](#finish-webhook) below for details.\n- The actor supports legacy 
**proxy settings** fields `proxyType`, `proxyGroups` and `customProxies`,\n  but their values are not checked. If these settings are invalid,\n  the actor will start normally and might crawl pages with invalid proxy settings,\n  most likely producing invalid results.\n  It is recommended to use the new **Proxy configuration** (`proxyConfiguration`)\n  field instead, which is correctly validated before the actor is started.\n  Beware that **Custom proxies** in the new **Proxy configuration** no longer support SOCKS5 proxies,\n  and they only accept HTTP proxies. If you need SOCKS5,\n  please contact [support@apify.com](mailto:support@apify.com)  \n- The **Test URL** feature is not supported.\n- The crawling results are stored into an Apify dataset instead of the specialized\n  storage for crawling results used by the old Crawler.\n  The dataset supports most of the legacy API query parameters\n  in order to emulate the same results format. However, there might be some small\n  incompatibilities. For details, see [Crawling results](#crawling-results).\n\n## Overview\n\nThis actor provides a web crawler for developers that enables the scraping of data from\nany website using the primary programming language of the web, JavaScript.\n\nIn order to extract structured data from a website, you only need two things. First, tell the crawler which pages it\nshould visit (see \u003ca href=\"#start-urls\"\u003eStart URLs\u003c/a\u003e and \u003ca href=\"#pseudo-urls\"\u003ePseudo-URLs\u003c/a\u003e) and second, define\na JavaScript code that will be executed on every web page visited in order to extract the data from it\n(see \u003ca href=\"#page-function\"\u003ePage function\u003c/a\u003e).\nThe crawler is a full-featured web browser which loads and interprets JavaScript and the code you provide is simply\nexecuted in the context of the pages it visits. 
This means that writing your data-extraction code is very similar\nto writing JavaScript code in front-end development, you can even use any client-side libraries such as\n\u003ca href=\"http://jquery.com\" target=\"_blank\" rel=\"noopener\"\u003ejQuery\u003c/a\u003e or\n\u003ca href=\"http://underscorejs.org\" target=\"_blank\" rel=\"noopener\"\u003eUnderscore.js\u003c/a\u003e.\n\nImagine the crawler as a guy sitting in front of a web browser. Let's call him Bob. Bob opens a start URL and waits\nfor the page to load, executes your JavaScript code using a developer console, writes down the result and then\nright-clicks all links on the web page to open them in new browser tabs.\nAfter that, Bob closes the current tab, goes to the next tab and repeats the same action again.\nBob is pretty smart and skips pages that he has already visited.\nWhen there are no more pages, he is done. And this is where the magic happens.\nBob would need about a month to click through a few hundred pages.\nApify can do it in a few seconds and makes fewer mistakes than Bob.\n\nMore formally, the crawler repeats the following steps:\n\n\u003col\u003e\n    \u003cli\u003eAdd each of the \u003ca href=\"#start-urls\"\u003eStart URLs\u003c/a\u003e to the crawling queue.\u003c/li\u003e\n    \u003cli\u003eFetch the first URL from the queue and load it in the virtual browser.\u003c/li\u003e\n    \u003cli\u003eExecute \u003ca href=\"#page-function\"\u003ePage function\u003c/a\u003e on the loaded page and save its results.\u003c/li\u003e\n    \u003cli\u003eFind all links from the page using \u003ca href=\"#clickable-elements\"\u003eClickable elements\u003c/a\u003e CSS selector.\n        If a link matches any of the \u003ca href=\"#pseudo-urls\"\u003ePseudo-URLs\u003c/a\u003e and has not yet been enqueued, add it to the queue.\u003c/li\u003e\n    \u003cli\u003eIf there are more items in the queue, go to step 2, otherwise finish.\u003c/li\u003e\n\u003c/ol\u003e\n\nThis process is depicted in the 
following diagram.\nNote that blue elements represent settings or operations that can be affected by crawler settings.\nThese settings are described in detail in the following sections.\n\n\u003ccenter\u003e\n    \u003ca href=\"https://raw.githubusercontent.com/apifytech/actor-legacy-phantomjs-crawler/master/img/crawler-activity-diagram.001.png\" target=\"_blank\" rel=\"noopener\"\u003e\u003cimg\n        src=\"https://raw.githubusercontent.com/apifytech/actor-legacy-phantomjs-crawler/master/img/crawler-activity-diagram.001.png\" alt=\"Web crawler activity diagram\"\n        class=\"img-responsive\"/\u003e\u003c/a\u003e\n\u003c/center\u003e\n\nNote that each crawler configuration setting can also be set using the API. The corresponding property names and types are\ndescribed in the [Input schema](https://apify.com/apify/web-scraper?section=input-schema) section.\n\n## Start URLs\n\nThe **Start URLs** (`startUrls`) field represents the list of URLs of the first pages that the crawler will open.\nOptionally, each URL can be associated with a custom label that can be referenced from\nyour JavaScript code to determine which page is currently open\n(see \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e for details).\nEach URL must start with either the `http://` or the `https://` protocol prefix.\n\nNote that it is possible to instruct the crawler to load a URL using an HTTP POST request\nsimply by suffixing it with a `[POST]` marker, optionally followed by\nPOST data (e.g. `http://www.example.com[POST]\u003cwbr\u003ekey1=value1\u0026key2=value2`).\nBy default, POST requests are sent with\nthe `Content-Type: application/x-www-form-urlencoded` header.\n\nMaximum label length is 100 characters and maximum URL length is 2000 characters.\n\n## Pseudo-URLs\n\nThe **Pseudo-URLs** (`crawlPurls`) field specifies which pages will be visited by the crawler using\nthe so-called \u003ci\u003epseudo-URLs\u003c/i\u003e (PURL)\nformat. 
PURL is simply a URL with special directives enclosed in `[]` brackets.\nCurrently, the only supported directive is `[regexp]`, which defines\na JavaScript-style regular expression to match against the URL.\n\nFor example, a PURL `http://www.example.com/pages/[(\\w|-)*]` will match all of the\nfollowing URLs:\n\n- `http://www.example.com/pages/`\n- `http://www.example.com/pages/my-awesome-page`\n- `http://www.example.com/pages/something`\n\nIf either `[` or `]` is part of the normal query string,\nit must be encoded as `[\\x5B]` or `[\\x5D]`, respectively. For example,\nthe following PURL:\n\n```\nhttp://www.example.com/search?do[\\x5B]load[\\x5D]=1\n```\n\nwill match the URL:\n\n```\nhttp://www.example.com/search?do[load]=1\n```\n\nOptionally, each PURL can be associated with a custom label that can be referenced from\nyour JavaScript code to determine which page is currently open\n(see \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e for details).\n\nNote that you don't need to use this setting at all,\nbecause you can completely control which pages the crawler will access, either using the\n\u003ca href=\"#intercept-request-function\"\u003eIntercept request function\u003c/a\u003e\nor by calling `context.enqueuePage()` inside the \u003ca href=\"#page-function\"\u003ePage function\u003c/a\u003e.\n\nMaximum label length is 100 characters\nand maximum PURL length is 1000 characters.\n\n## Clickable elements\n\nThe **Clickable elements** (`clickableElementsSelector`) field contains a CSS selector used to find links to other web pages.\nOn each page, the crawler clicks all DOM elements matching this selector\nand then monitors whether the page generates a navigation request.\nIf a navigation request is detected, the crawler checks whether it matches\n\u003ca href=\"#pseudo-urls\"\u003ePseudo-URLs\u003c/a\u003e,\ninvokes \u003ca href=\"#intercept-request-function\"\u003eIntercept request function\u003c/a\u003e,\ncancels the request and then continues 
clicking the next matching elements.\nBy default, new crawlers are created with a safe CSS selector:\n\n```\na:not([rel=nofollow])\n```\n\nIn order to reach more pages, you might want to use a wider CSS selector, such as:\n\n```\na:not([rel=nofollow]), input, button, [onclick]:not([rel=nofollow])\n```\n\n\nBe careful - clicking certain DOM elements can cause\n\u003cb\u003eunexpected and potentially harmful side effects\u003c/b\u003e.\nFor example, by clicking buttons, you might submit forms, flag comments, etc.\nIn principle, the safest option is to narrow the CSS selector to as few elements as possible,\nwhich also makes the crawler run much faster.\n\nLeave this field empty if you do not want the crawler to click any elements and only open\n\u003ca href=\"#start-urls\"\u003eStart URLs\u003c/a\u003e\nor pages enqueued using \u003ccode\u003eenqueuePage()\u003c/code\u003e.\n\n\n## Page function\n\nThe **Page function** (`pageFunction`) field contains\na user-provided JavaScript function that is executed in the context of every web page loaded by\nthe crawler.\nPage function is typically used to extract some data from the page, but it can also be used\nto perform some non-trivial\noperation on the page, e.g. handle AJAX-based pagination.\n\n\u003cb\u003eIMPORTANT:\u003c/b\u003e This actor uses the \u003ca href=\"http://phantomjs.org/\" target=\"_blank\" rel=\"noopener\"\u003ePhantomJS\u003c/a\u003e\nheadless web browser, which only supports JavaScript ES5.1 standard\n(read more in a \u003ca href=\"https://ariya.io/2014/08/phantomjs-2-and-javascript-goodies\" target=\"_blank\" rel=\"noopener\"\u003eblog post about PhantomJS 2.0\u003c/a\u003e).\n\nThe basic page function with no effect has the following signature:\n\n```javascript\nfunction pageFunction(context) {\n    return null;\n}\n```\n\nThe function can return an arbitrary JavaScript object (including array, string, number, etc.) 
that can be stringified to JSON;\nthis value will be saved in the crawling results, as the \u003ccode\u003epageFunctionResult\u003c/code\u003e\nfield of the \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e corresponding to the web page\non which the \u003ccode\u003epageFunction\u003c/code\u003e was executed.\nThe crawling results are stored in the default [dataset](https://docs.apify.com/storage#dataset)\nassociated with the actor run, from where they can be downloaded\nin a computer-friendly form (JSON, JSONL, XML or RSS format),\nas well as in a human-friendly tabular form (HTML or CSV format).\nIf the \u003ccode\u003epageFunction\u003c/code\u003e's return value is an array,\nits elements can be displayed as separate rows in such a table,\nto make the results more readable.\n\nThe function accepts a single argument called \u003ccode\u003econtext\u003c/code\u003e,\nwhich is an object with the following properties and functions:\n\n\u003ctable class=\"table table-bordered\"\u003e\n    \u003cthead\u003e\n    \u003ctr\u003e\n        \u003cth\u003eName\u003c/th\u003e\n        \u003cth\u003eDescription\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003c/thead\u003e\n    \u003ctbody\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-request\"\u003e\u003ccode\u003erequest\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eAn object holding all the available information about the currently loaded web page.\n            See \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e for details.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-jQuery\"\u003e\u003ccode\u003ejQuery\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eA jQuery object, only available if the\n            \u003cstrong\u003eInject jQuery\u003c/strong\u003e\n            setting is\n            enabled.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd 
id=\"context-underscoreJs\"\u003e\u003ccode\u003eunderscoreJs\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eThe Underscore.js' \u003ccode\u003e_\u003c/code\u003e object, only available if the\n            \u003cstrong\u003eInject Underscore.js\u003c/strong\u003e\n            setting is enabled.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-skipLinks\"\u003e\u003ccode\u003eskipLinks()\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eIf called, the crawler will not follow any links from the current page and will\n            continue with the next page from the queue.\n            This is useful to speed up the crawl by avoiding unnecessary paths.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-skipOutput\"\u003e\u003ccode\u003eskipOutput()\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eIf called, no information about the current page will be saved to the results,\n            including the page function result itself.\n            This is useful to reduce the size of the results by skipping unimportant pages.\n            Note that if the page function throws an exception, the \u003ccode\u003eskipOutput()\u003c/code\u003e\n            call is ignored and the page is outputted anyway, so that the user has a chance\n            to determine whether there was an error\n            (see \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e's \u003ccode\u003eerrorInfo\u003c/code\u003e\n            field).\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-willFinishLater\"\u003e\u003ccode\u003ewillFinishLater()\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eTells the crawler that the page function will continue performing some background\n            operation even after it returns. 
This is useful\n            when you want to fetch results from an asynchronous operation,\n            e.g. an XHR request or a click on some DOM element.\n            If you use the \u003ccode\u003ewillFinishLater()\u003c/code\u003e function, make sure you also invoke \u003ccode\u003efinish()\u003c/code\u003e\n            or the crawler will wait infinitely for the result and eventually timeout\n            after the period specified in\n            \u003cstrong\u003ePage function timeout\u003c/strong\u003e.\n            Note that the normal return value of the page function is ignored.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-finish\"\u003e\u003ccode\u003efinish(result)\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eTells the crawler that the page function finished its background operation.\n            The \u003ccode\u003eresult\u003c/code\u003e parameter receives the result of the page function - this is\n            a replacement\n            for the normal return value of the page function that was ignored (see \u003ccode\u003ewillFinishLater()\u003c/code\u003e above).\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-saveSnapshot\"\u003e\u003ccode\u003esaveSnapshot() \u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eCaptures a screenshot of the web page and saves its DOM to an HTML file,\n            which are both then displayed in the user's crawling console.\n            This is especially useful for debugging your page function.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-enqueuePage\"\u003e\u003ccode\u003eenqueuePage(request)\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003e\n            \u003cp\u003e\n            Adds a new page request to the crawling queue, regardless of whether it matches\n            any of the \u003ca href=\"#pseudo-urls\"\u003ePseudo-URLs\u003c/a\u003e.\n  
          The \u003ccode\u003erequest\u003c/code\u003e argument is an instance of the \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e,\n            but only the following properties are taken into account:\n            \u003ccode\u003eurl\u003c/code\u003e, \u003ccode\u003euniqueKey\u003c/code\u003e, \u003ccode\u003elabel\u003c/code\u003e,\n            \u003ccode\u003emethod\u003c/code\u003e, \u003ccode\u003epostData\u003c/code\u003e, \u003ccode\u003econtentType\u003c/code\u003e,\n            \u003ccode\u003equeuePosition\u003c/code\u003e and \u003ccode\u003einterceptRequestData\u003c/code\u003e; all other properties\n            will be ignored. The \u003ccode\u003eurl\u003c/code\u003e property is mandatory.\n            \u003c/p\u003e\n            \u003cp\u003e\n            Note that the manually enqueued page is subject to the same processing\n            as any other page found by the crawler. For example,\n            the \u003ca href=\"#intercept-request-function\"\u003eIntercept request function\u003c/a\u003e function\n            will be called for the new request, and the page will be checked to see whether it has\n            already been visited by the crawler and skipped if so.\n            \u003c/p\u003e\n            For backwards compatibility, the function also supports the following signature:\n            \u003ccode\u003eenqueuePage(url, method, postData, contentType)\u003c/code\u003e.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-saveCookies\"\u003e\u003ccode\u003esaveCookies([cookies]) \u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eSaves the current cookies of the current PhantomJS browser to the actor task's\n        \u003ca href=\"#cookies\"\u003eInitial cookies\u003c/a\u003e setting.\n        All subsequently started PhantomJS processes will use these cookies.\n        For example, this is useful for storing a login.\n        Optionally, you can pass an array of 
cookies to set to the browser before saving (in\n        \u003ca href=\"http://phantomjs.org/api/phantom/property/cookies.html\" target=\"_blank\" rel=\"noopener\"\u003ePhantomJS format\u003c/a\u003e).\n        Note that by passing an empty array you can unset all cookies.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-customData\"\u003e\u003ccode\u003ecustomData\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eCustom user data from crawler settings provided via \u003ccode\u003ecustomData\u003c/code\u003e input field.\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-stats\"\u003e\u003ccode\u003estats\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eAn object containing a snapshot of statistics from the current crawl.\n            It contains the following fields:\n            \u003ccode\u003epagesCrawled\u003c/code\u003e, \u003ccode\u003epagesOutputted\u003c/code\u003e\n            and \u003ccode\u003epagesInQueue\u003c/code\u003e.\n            Note that the statistics are collected \u003cb\u003ebefore\u003c/b\u003e\n            the current page has been crawled.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-actorRunId\"\u003e\u003ccode\u003eactorRunId\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eString containing ID of this actor run. It might be used to control\n            the actor using the \u003ca href=\"https://docs.apify.com/api/v2\"\u003eAPI\u003c/a\u003e,\n            e.g. 
to stop it or fetch its results.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd id=\"context-actorRunId\"\u003e\u003ccode\u003eactorTaskId\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eString containing ID of the actor task, or \u003ccode\u003enull\u003c/code\u003e if actor is run directly.\n            The ID might be used to control\n            the task using the \u003ca href=\"https://docs.apify.com/api/v2\"\u003eAPI\u003c/a\u003e.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\nNote that any changes made to the \u003ccode\u003econtext\u003c/code\u003e parameter will be ignored.\nWhen implementing the page function, it is the user's responsibility to not break normal\npage scripts that might affect the operation of the crawler.\n\n### Waiting for dynamic content\n\nSome web pages do not load all their content immediately, but only fetch it in the background\nusing AJAX,\nwhile \u003ccode\u003epageFunction\u003c/code\u003e might be executed before the content has actually been\nloaded.\nYou can wait for dynamic content to load using the following code:\n\n```javascript\nfunction pageFunction(context) {\n    var $ = context.jQuery;\n    var startedAt = Date.now();\n\n    var extractData = function() {\n        // timeout after 10 seconds\n        if( Date.now() - startedAt \u003e 10000 ) {\n            context.finish(\"Timed out before #my_element was loaded\");\n            return;\n        }\n\n        // if my element still hasn't been loaded, wait a little more\n        if( $('#my_element').length === 0 ) {\n            setTimeout(extractData, 500);\n            return;\n        }\n\n        // refresh page screenshot and HTML for debugging\n        context.saveSnapshot();\n\n        // save a result\n        context.finish({\n            value: $('#my_element').text()\n        });\n    };\n\n    // tell the crawler that pageFunction will finish asynchronously\n    
context.willFinishLater();\n\n    extractData();\n}\n```\n\n## Intercept request function\n\nThe **Intercept request function** (`interceptRequest`) field contains\na user-provided JavaScript function that is called whenever\na new URL is about to be added to the crawling queue,\nwhich happens at the following times:\n\n- At the start of crawling for all \u003ca href=\"#start-urls\"\u003eStart URLs.\u003c/a\u003e\n- When the crawler looks for links to new pages by clicking elements\n  matching the \u003ca href=\"#clickable-elements\"\u003eClickable elements\u003c/a\u003e\n  CSS selector and detects a page navigation request, i.e. a link (GET)\n  or a form submission (POST) that would normally cause the browser to navigate to a new web page.\n- Whenever a loaded page tries to navigate to another page, e.g. by setting \u003ccode\u003ewindow.location\u003c/code\u003e in JavaScript.\n- When user code invokes \u003ccode\u003eenqueuePage()\u003c/code\u003e inside of \u003ca href=\"#page-function\"\u003ePage function\u003c/a\u003e.\n\nThe intercept request function allows you to affect on a low level\nhow new pages are enqueued by the crawler.\nFor example, it can be used to ensure that the request is added to the crawling queue even\nif it doesn't match\nany of the \u003ca href=\"#pseudo-urls\"\u003ePseudo-URLs\u003c/a\u003e,\nor to change the way the crawler determines whether the page has already been visited or not.\nSimilarly to the \u003ca href=\"#page-function\"\u003ePage function\u003c/a\u003e,\nthis function is executed in the context of the originating web page (or in the context\nof \u003ccode\u003eabout:blank\u003c/code\u003e page for \u003ca href=\"#start-urls\"\u003eStart URLs\u003c/a\u003e).\n\n\u003cb\u003eIMPORTANT:\u003c/b\u003e This actor is using \u003ca href=\"http://phantomjs.org/\" target=\"_blank\" rel=\"noopener\"\u003ePhantomJS\u003c/a\u003e\nheadless web browser, which only supports the JavaScript ES5.1 standard\n(read more in \u003ca 
href=\"https://ariya.io/2014/08/phantomjs-2-and-javascript-goodies\" target=\"_blank\" rel=\"noopener\"\u003eblog post about PhantomJS 2.0\u003c/a\u003e).\n\nThe basic intercept request function with no effect has the following signature:\n\n```javascript\nfunction interceptRequest(context, newRequest) {\n    return newRequest;\n}\n```\n\nThe \u003ccode\u003econtext\u003c/code\u003e is an object with the following properties:\n\n\u003ctable class=\"table table-bordered table-condensed\"\u003e\n    \u003ctbody\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e\u003ccode\u003erequest\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eAn object holding all the available information about the currently loaded web page.\n            See \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e for details.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e\u003ccode\u003ejQuery\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eA \u003ca href=\"http://api.jquery.com/jQuery/\" target=\"_blank\" rel=\"noopener\"\u003ejQuery\u003c/a\u003e object, only\n            available if the\n            \u003cstrong\u003eInject jQuery\u003c/strong\u003e\n            setting is\n            enabled.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e\u003ccode\u003eunderscoreJs\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eAn \u003ca href=\"http://underscorejs.org/\" target=\"_blank\" rel=\"noopener\"\u003eUnderscore.js\u003c/a\u003e object, only\n            available if the\n            \u003cstrong\u003eInject Underscore.js\u003c/strong\u003e\n            setting is enabled.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e\u003ccode\u003eclickedElement\u003c/code\u003e\u003c/td\u003e\n        \u003ctd\u003eA reference to the DOM object whose clicking initiated the current navigation\n            request.\n            The value is 
\u003ccode\u003enull\u003c/code\u003e if the navigation request was initiated by other\n            means,\n            e.g. using some background JavaScript action.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\nBeware that in rare situations when the page redirects in its JavaScript before it was\ncompletely loaded\nby the crawler, the \u003ccode\u003ejQuery\u003c/code\u003e and \u003ccode\u003eunderscoreJs\u003c/code\u003e objects will be undefined.\nThe \u003ccode\u003enewRequest\u003c/code\u003e parameter contains a \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e\ncorresponding to the new page.\n\nThe way the crawler handles the new page navigation request depends\non the return value of the \u003ccode\u003einterceptRequest\u003c/code\u003e function in the following way:\n\n\u003cul\u003e\n    \u003cli\u003eIf function returns the \u003ccode\u003enewRequest\u003c/code\u003e object unchanged,\n        the default crawler behaviour will apply.\n    \u003c/li\u003e\n    \u003cli\u003eIf function returns the \u003ccode\u003enewRequest\u003c/code\u003e object altered, the crawler\n        behavior will be modified, e.g. 
it will enqueue a page that would normally be skipped.\n        The following fields can be altered:\n        \u003ccode\u003ewillLoad\u003c/code\u003e, \u003ccode\u003eurl\u003c/code\u003e, \u003ccode\u003emethod\u003c/code\u003e, \u003ccode\u003epostData\u003c/code\u003e,\n        \u003ccode\u003econtentType\u003c/code\u003e,\n        \u003ccode\u003euniqueKey\u003c/code\u003e, \u003ccode\u003elabel\u003c/code\u003e, \u003ccode\u003einterceptRequestData\u003c/code\u003e\n        and \u003ccode\u003equeuePosition\u003c/code\u003e\n        (see \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e for details).\n    \u003c/li\u003e\n    \u003cli\u003eIf function returns \u003ccode\u003enull\u003c/code\u003e, the request will be dropped and a new page will not\n        be enqueued.\n    \u003c/li\u003e\n    \u003cli\u003eIf function throws an exception, the default crawler behaviour will apply\n        and the error will be logged to the Request object's \u003ccode\u003eerrorInfo\u003c/code\u003e field.\n        Note that this is the only way a user can catch and debug such an exception.\n    \u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003e\n    Note that any changes made to the \u003ccode\u003econtext\u003c/code\u003e parameter will be ignored\n    (unlike the \u003ccode\u003enewRequest\u003c/code\u003e parameter).\n    When implementing the function, it is the user's responsibility not to break normal page\n    scripts that might affect the operation of the crawler. 
You have been warned.\n    Also note that the function does not resolve HTTP redirects: it only reports the originally\n    requested URL, but does not open it to find out which URL it eventually redirects to.\n\u003c/p\u003e\n\n\n## Proxy configuration\n\nThe **Proxy configuration** (`proxyConfiguration`) option enables you to set\nproxies that will be used by the crawler in order to prevent its detection by target websites.\nYou can use [Apify Proxy](https://apify.com/proxy)\nas well as custom HTTP or SOCKS5 proxy servers.\n\nThe following table lists the available options of the proxy configuration setting:\n\n\u003ctable class=\"table table-bordered table-condensed\"\u003e\n    \u003ctbody\u003e\n    \u003ctr\u003e\n        \u003cth\u003e\u003cb\u003eNone\u003c/b\u003e\u003c/th\u003e\n        \u003ctd\u003e\n            The crawler will not use any proxies.\n            All web pages will be loaded directly from IP addresses of Apify servers running on Amazon Web Services.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003cth\u003e\u003cb\u003eApify\u0026nbsp;Proxy\u0026nbsp;(automatic)\u003c/b\u003e\u003c/th\u003e\n        \u003ctd\u003e\n            The crawler will load all web pages using \u003ca href=\"https://apify.com/proxy\"\u003eApify Proxy\u003c/a\u003e\n            in the automatic mode. 
In this mode, the proxy uses all proxy groups\n            that are available to the user, and for each new web page it automatically selects the proxy\n            that hasn't been used in the longest time for the specific hostname,\n            in order to reduce the chance of detection by the website.\n            You can view the list of available proxy groups\n            on the \u003ca href=\"https://my.apify.com/proxy\" target=\"_blank\" rel=\"noopener\"\u003eProxy\u003c/a\u003e page in the app.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003cth\u003e\u003cb\u003eApify\u0026nbsp;Proxy\u0026nbsp;(selected\u0026nbsp;groups)\u003c/b\u003e\u003c/th\u003e\n        \u003ctd\u003e\n            The crawler will load all web pages using \u003ca href=\"https://apify.com/proxy\"\u003eApify Proxy\u003c/a\u003e\n            with specific groups of target proxy servers.\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003cth\u003e\u003cb\u003eCustom\u0026nbsp;proxies\u003c/b\u003e\u003c/th\u003e\n        \u003ctd\u003e\n            \u003cp\u003e\n            The crawler will use a custom list of proxy servers.\n            The proxies must be specified in the \u003ccode\u003escheme://user:password@host:port\u003c/code\u003e format;\n            multiple proxies should be separated by a space or new line.\n            The URL scheme can be either \u003ccode\u003ehttp\u003c/code\u003e or \u003ccode\u003esocks5\u003c/code\u003e.\n            The user and password may be omitted, but the port must always be present.\n            \u003c/p\u003e\n            \u003cp\u003e\n                Example:\n            \u003c/p\u003e\n            \u003cpre\u003e\u003ccode class=\"language-none\"\u003ehttp://bob:password@proxy1.example.com:8000\nhttp://bob:password@proxy2.example.com:8000\u003c/code\u003e\u003c/pre\u003e\n        \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\nNote that the 
proxy server used to fetch a specific page\nis stored in the \u003ccode\u003eproxy\u003c/code\u003e field of the \u003ca href=\"#request-object\"\u003eRequest object\u003c/a\u003e.\nNote that for security reasons, the usernames and passwords are redacted from the proxy URL.\n\nThe proxy configuration can be set programmatically when calling the actor using the API\nby setting the `proxyConfiguration` field.\nIt accepts a JSON object with the following structure:\n\n```javascript\n{\n    // Indicates whether to use Apify Proxy or not.\n    \"useApifyProxy\": Boolean,\n\n    // Array of Apify Proxy groups, only used if \"useApifyProxy\" is true.\n    // If missing or null, Apify Proxy will use the automatic mode.\n    \"apifyProxyGroups\": String[],\n\n    // Array of custom proxy URLs, in \"scheme://user:password@host:port\" format.\n    // If missing or null, custom proxies are not used.\n    \"proxyUrls\": String[],\n}\n```\n\n## Finish webhook\n\nThe **Finish webhook URL** (`finishWebhookUrl`)\nfield specifies a custom HTTP endpoint that receives a notification after a run of the actor ends,\nregardless of its status, i.e. whether it finished, failed, was aborted, etc.\nYou can specify a custom string that will be sent with the webhook\nusing the **Finish webhook data** (`finishWebhookData`) field,\nin order to help you identify the actor run.\n\nThe provided endpoint is sent an HTTP POST request with the `Content-Type: application/json; charset=utf-8` header,\nand its payload contains a JSON object with the following structure:\n\n```\n{\n    \"actorId\": String,   // ID of the actor\n    \"runId\": String,     // ID of the actor run\n    \"taskId\": String,    // ID of the actor task. 
It might be null if the actor is started directly without a task.\n    \"datasetId\": String, // ID of the dataset with the crawling results\n    \"data\": String       // Custom data provided in \"finishWebhookData\" field\n}\n```\n\nYou can use the `actorId` and `runId` fields to query the actor run status using\nthe [Get run](https://docs.apify.com/api/v2#/reference/actors/run-object/get-run) API endpoint.\nThe `datasetId` field can be used to download the crawling results\nusing the [Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection/get-items)\nAPI endpoint; see [Crawling results](#crawling-results) below for more details.\n\nNote that the **Finish webhook URL** and **Finish webhook data** are provided merely for backwards compatibility\nwith the legacy Apify Crawler product, and the calls are performed using the Apify platform's standard webhook facility.\nFor details, see [Webhooks](https://docs.apify.com/webhooks) in the documentation.\n\nTo test your webhook endpoint, please create a new empty task for this actor,\nset the **Finish webhook URL** and run the task.\n\n\n## Cookies\n\nThe **Initial cookies** (`cookies`) option enables you to specify\na JSON array with cookies that will be used by the crawler on start.\nYou can export the cookies from your own web browser,\nfor example using the \u003ca href=\"http://www.editthiscookie.com/\" target=\"_blank\" rel=\"noopener\"\u003eEditThisCookie\u003c/a\u003e plugin.\nThis setting is typically used to start crawling when logged in to certain websites.\nThe array might be null or empty, in which case the crawler will start with no cookies.\n\nNote that if the \u003ca href=\"#cookies-persistence-option\"\u003eCookies persistence\u003c/a\u003e\nsetting is \u003cb\u003eOver all crawler runs\u003c/b\u003e and the actor is started from within a\n\u003ca href=\"https://docs.apify.com/tasks\"\u003etask\u003c/a\u003e,\nthe cookies array on the task will be overwritten\nwith fresh cookies 
from the crawler whenever it successfully finishes.\n\n\u003cb\u003eSECURITY NOTE:\u003c/b\u003e You should never share cookies or an exported crawler configuration containing cookies\nwith untrusted parties, because they might use them to authenticate themselves to various websites with your credentials.\n\nExample of the **Initial cookies** setting:\n\n```json\n[\n  {\n    \"domain\": \".example.com\",\n    \"expires\": \"Thu, 01 Jun 2017 16:14:38 GMT\",\n    \"expiry\": 1496333678,\n    \"httponly\": true,\n    \"name\": \"NAME\",\n    \"path\": \"/\",\n    \"secure\": false,\n    \"value\": \"Some value\"\n  },\n  {\n    \"domain\": \".example.com\",\n    \"expires\": \"Thu, 01 Jun 2017 16:14:37 GMT\",\n    \"expiry\": 1496333677,\n    \"httponly\": true,\n    \"name\": \"OTHER_NAME\",\n    \"path\": \"/\",\n    \"secure\": false,\n    \"value\": \"Some other value\"\n  }\n]\n```\n\nThe **Cookies persistence** (`cookiesPersistence`) option\nindicates how the crawler saves and reuses cookies.\nWhen you start the crawler, the first PhantomJS process will\nuse the cookies defined by the \u003ca href=\"#cookies\"\u003ecookies\u003c/a\u003e setting.\nSubsequent PhantomJS processes will use cookies as follows:\n\n\u003ctable class=\"table table-bordered\"\u003e\n    \u003ctbody\u003e\n        \u003ctr\u003e\n            \u003ctd style=\"width: 30%\"\u003e\u003cb\u003ePer single crawling process only\u003c/b\u003e\u003cbr\u003e\u003ccode\u003e\"PER_PROCESS\"\u003c/code\u003e\u003c/td\u003e\n            \u003ctd style=\"width: 70%\"\u003e\n                Cookies are only maintained separately by each PhantomJS crawling process\n                for the lifetime of that process. 
The cookies are not shared between crawling processes.\n                This means that whenever the crawler rotates its IP address, it will start\n                again with cookies defined by the \u003ca href=\"#cookies\"\u003ecookies\u003c/a\u003e setting.\n                Use this setting for maximum privacy and to avoid detection of the crawler.\n                This is the \u003cb\u003edefault\u003c/b\u003e option.\n            \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003e\u003cb\u003ePer full crawler run\u003c/b\u003e\u003cbr\u003e\u003ccode\u003e\"PER_CRAWLER_RUN\"\u003c/code\u003e\u003c/td\u003e\n            \u003ctd\u003e\n                Indicates that cookies collected at the start of the crawl by the first PhantomJS process\n                are reused by other PhantomJS processes, even when switching to a new IP address.\n                This might be necessary to maintain a login performed at the beginning of your crawl,\n                but it might help the server to detect the crawler.\n                Note that cookies are only collected at the beginning of the crawl by the initial\n                PhantomJS process. Cookies set by subsequent PhantomJS processes are only valid for the duration of that\n                process and are not reused by other processes. 
This is necessary to enable crawl parallelization.\n            \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003e\u003cb\u003eOver all crawler runs\u003c/b\u003e\u003cbr\u003e\u003ccode\u003e\"OVER_CRAWLER_RUNS\"\u003c/code\u003e\u003c/td\u003e\n            \u003ctd\u003e\n                This setting is similar to \u003cb\u003ePer full crawler run\u003c/b\u003e;\n                the only difference is that if the actor finishes with the \u003ccode\u003eSUCCEEDED\u003c/code\u003e status,\n                its current cookies are automatically saved\n                to the \u003ca href=\"#cookies\"\u003ecookies\u003c/a\u003e setting of the actor task\n                used to start the actor,\n                so that a new crawler run starts where the previous run left off.\n                This is useful to keep login cookies fresh and avoid their expiration.\n            \u003c/td\u003e\n        \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\n\n## Crawling results\n\nThe crawling results are stored in the default dataset associated with the actor run,\nfrom where you can export them to formats such as JSON, XML, CSV or Excel.\nFor each web page visited, the crawler pushes a single\n[Request object](#request-object) with all the details about the page into the dataset.\n\nTo download the results, call the\n[Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection)\nAPI endpoint:\n\n```\nhttps://api.apify.com/v2/datasets/[DATASET_ID]/items?format=json\n```\n\nwhere `[DATASET_ID]` is the ID of the actor run's dataset,\nwhich you can find in the Run object returned when starting the actor.\nThe response looks as follows:\n\n```json\n[{\n  \"loadedUrl\": \"https://www.example.com/\",\n  \"requestedAt\": \"2019-04-02T21:27:33.674Z\",\n  \"type\": \"StartUrl\",\n  \"label\": \"START\",\n  \"pageFunctionResult\": [\n    {\n      
\"product\": \"Samsung Galaxy\",\n      \"price\": 499\n    }\n  ],\n  ...\n},\n...\n]\n```\n\nNote that in the full results of the legacy Crawler product results, each `Request`\nobject contained a field called `errorInfo`, even if it was empty.\nIn the dataset, this field is skipped if it's empty.\n\nTo download the data in simplified format known in the legacy Apify Crawler\nproduct, add the `simplified=1` query parameter:\n\n```\nhttps://api.apify.com/v2/datasets/[DATASET_ID]/items?format=json\u0026simplified=1\n```\n\nThe response will look like this:\n\n```javascript\n[{\n  \"url\": \"https://www.example.com/iphone-x\",\n  \"product\": \"iPhone X\",\n  \"price\": 699\n},\n{\n  \"url\": \"https://www.example.com/samsung-galaxy\",\n  \"product\": \"Samsung Galaxy\",\n  \"price\": 499\n},\n{\n  \"url\": \"https://www.example.com/nokia\",\n  \"errorInfo\": \"The page was not found\",\n}\n]\n```\n\nTo get the results in other formats, set `format` query parameter to `xml`, `xlsx`, `csv`, `html`, etc.\nFor full details, see the [Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection)\nendpoint in API reference.\n\nTo skip the records containing the `errorInfo` field from the results,\nadd the `skipFailedPages=1` query parameter. This will ensure the results have a fixed structure, which is especially useful for tabular formats such as CSV or Excel.\n\n\n## Request object\n\nThe `Request` object contains all the available information about\nevery single web page the crawler encounters\n(both visited and not visited). 
This object comes into play\nin both \u003ca href=\"#page-function\"\u003ePage function\u003c/a\u003e\nand \u003ca href=\"#intercept-request-function\"\u003eIntercept request function\u003c/a\u003e,\nand crawling results are actually just an array of these objects.\n\nThe object has the following structure:\n\n```javascript\n{\n  // A string with a unique identifier of the Request object.\n  // It is generated from the uniqueKey, therefore two pages from various crawls\n  // with the same uniqueKey will also have the same ID.\n  id: String,\n\n  // The URL that was specified in the web page's navigation request,\n  // possibly updated by the 'interceptRequest' function\n  url: String,\n\n  // The final URL reported by the browser after the page was opened\n  // (will be different from 'url' if there was a redirect)\n  loadedUrl: String,\n\n  // Date and time of the original web page's navigation request\n  requestedAt: Date,\n  // Date and time when the page load was initiated in the web browser, or null if it wasn't\n  loadingStartedAt: Date,\n  // Date and time when the page was actually loaded, or null if it wasn't\n  loadingFinishedAt: Date,\n\n  // HTTP status and headers of the loaded page.\n  // If there were any redirects, the status and headers correspond to the final response, not the intermediate responses.\n  responseStatus: Number,\n  responseHeaders: Object,\n\n  // If the page could not be loaded for any reason (e.g. a timeout), this field contains a best guess of\n  // the code of the error. The value is either one of the codes from QNetworkReply::NetworkError codes\n  // or value 999 for an unknown error. 
This field is used internally to retry failed page loads.\n  // Note that the field is only informative and might not be set for all types of errors,\n  // always use errorInfo to determine whether the page was processed successfully.\n  loadErrorCode: Number,\n\n  // Date and time when the page function started and finished\n  pageFunctionStartedAt: Date,\n  pageFunctionFinishedAt: Date,\n\n  // An arbitrary string that uniquely identifies the web page in the crawling queue.\n  // It is used by the crawler to determine whether a page has already been visited.\n  // If two or more pages have the same uniqueKey, then the crawler only visits the first one.\n  //\n  // By default, uniqueKey is generated from the 'url' property as follows:\n  //  * hostname and protocol is converted to lower-case\n  //  * trailing slash is removed\n  //  * common tracking parameters starting with 'utm_' are removed\n  //  * query parameters are sorted alphabetically\n  //  * whitespaces around all components of the URL are trimmed\n  //  * if the 'considerUrlFragment' setting is disabled, the URL fragment is removed completely\n  //\n  // If you prefer different generation of uniqueKey, you can override it in the 'interceptRequest'\n  // or 'context.enqueuePage' functions.\n  uniqueKey: String,\n\n  // Describes the type of the request. 
It can be either one of the following values:\n  // 'InitialAboutBlank', 'StartUrl', 'SingleUrl', 'ActorRequest', 'OnUrlChanged', 'UserEnqueued', 'FoundLink'\n  // or in case the request originates from PhantomJS' onNavigationRequested() it can be one of the following values:\n  // 'Undefined', 'LinkClicked', 'FormSubmitted', 'BackOrForward', 'Reload', 'FormResubmitted', 'Other'\n  type: String,\n\n  // Boolean value indicating whether the page was opened in a main frame or a child frame\n  isMainFrame: Boolean,\n\n  // HTTP POST payload\n  postData: String,\n\n  // Content-Type HTTP header of the POST request\n  contentType: String,\n\n  // Contains \"GET\" or \"POST\"\n  method: String,\n\n  // Indicates whether the page will be loaded by the crawler or not\n  willLoad: Boolean,\n\n  // Indicates the label specified in startUrls or crawlPurls config settings where URL/PURL corresponds\n  // to this page request. If more URLs/PURLs are matching, this field contains the FIRST NON-EMPTY\n  // label in order in which the labels appear in startUrls and crawlPurls arrays.\n  // Note that labels are not mandatory, so the field might be null.\n  label: String,\n\n  // ID of the Request object from whose page this Request was first initiated, or null.\n  referrerId: String,\n\n  // Contains the Request object corresponding to 'referrerId'.\n  // This value is only available in pageFunction and interceptRequest functions\n  // and can be used to access properties and page function results of the page linking to the current page.\n  // Note that the referrer Request object DOES NOT recursively define the 'referrer' property.\n  referrer: Object,\n\n  // How many links away from start URLs was this page found\n  depth: Number,\n\n  // If any error occurred while loading or processing the web page,\n  // this field contains a non-empty string with a description of the error.\n  // The field is used for all kinds of errors, such as page load errors, the page function or\n  // 
intercept request function exceptions, timeouts, internal crawler errors etc.\n  // If there is no error, the field is a false-ish value (empty string, null or undefined).\n  errorInfo: String,\n\n  // Results of the user-provided 'pageFunction'\n  pageFunctionResult: Anything,\n\n  // A field that might be used by 'interceptRequest' function to save custom data related to this page request\n  interceptRequestData: Anything,\n\n  // Total size of all resources downloaded during this request\n  downloadedBytes: Number,\n\n  // Indicates the position where the request will be placed in the crawling queue.\n  // Can either be 'LAST' to put the request to the end of the queue (default behavior)\n  // or 'FIRST' to put it before any other requests.\n  queuePosition: String,\n\n  // Custom proxy used by the crawler, or null if custom proxies were not used.\n  // For security reasons, the username and password are redacted from the URL.\n  proxy: String\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapify%2Factor-legacy-phantomjs-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapify%2Factor-legacy-phantomjs-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapify%2Factor-legacy-phantomjs-crawler/lists"}