{"id":19665510,"url":"https://github.com/zytedata/zyte-smartproxy-headless-proxy","last_synced_at":"2025-04-28T22:31:11.970Z","repository":{"id":37929981,"uuid":"158817714","full_name":"zytedata/zyte-smartproxy-headless-proxy","owner":"zytedata","description":"A complimentary proxy to help to use SPM with headless browsers","archived":false,"fork":false,"pushed_at":"2023-05-29T07:38:33.000Z","size":386,"stargazers_count":108,"open_issues_count":46,"forks_count":37,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-05T11:34:10.266Z","etag":null,"topics":["crawler","proxy","scraping"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zytedata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-23T10:32:18.000Z","updated_at":"2025-02-08T12:16:31.000Z","dependencies_parsed_at":"2024-11-11T16:34:45.118Z","dependency_job_id":null,"html_url":"https://github.com/zytedata/zyte-smartproxy-headless-proxy","commit_stats":null,"previous_names":["scrapinghub/crawlera-headless-proxy"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zytedata%2Fzyte-smartproxy-headless-proxy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zytedata%2Fzyte-smartproxy-headless-proxy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zytedata%2Fzyte-smartproxy-headless-proxy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/zytedata%2Fzyte-smartproxy-headless-proxy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zytedata","download_url":"https://codeload.github.com/zytedata/zyte-smartproxy-headless-proxy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251397577,"owners_count":21583034,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","proxy","scraping"],"created_at":"2024-11-11T16:23:09.390Z","updated_at":"2025-04-28T22:31:11.261Z","avatar_url":"https://github.com/zytedata.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Crawlera Headless Proxy\n\n[![Build Status](https://travis-ci.org/scrapinghub/crawlera-headless-proxy.svg?branch=master)](https://travis-ci.org/scrapinghub/crawlera-headless-proxy)\n[![Go Report Card](https://goreportcard.com/badge/github.com/scrapinghub/crawlera-headless-proxy)](https://goreportcard.com/report/github.com/scrapinghub/crawlera-headless-proxy)\n\nCrawlera Headless proxy is a proxy which main intent\nis to help users with headless browsers to use\n[Crawlera](https://scrapinghub.com/crawlera). 
This\nincludes different implementations of headless browsers\nsuch as [Splash](https://scrapinghub.com/splash),\nheadless [Chrome](https://google.com/chrome/), and\n[Firefox](https://www.mozilla.org/en-US/firefox/).\nAlso, this proxy should help users of such frameworks\nas [Selenium](https://www.seleniumhq.org/) and\n[Puppeteer](https://github.com/GoogleChrome/puppeteer) to use Crawlera\nwithout a need to build [Squid](http://www.squid-cache.org/) chains or\ninstall [Polipo](https://www.irif.fr/~jch/software/polipo/).\n\nThe biggest problem with headless browsers is their configuration:\n\n1. Crawlera uses the proxy authentication protocol described in\n   [RFC 7235](https://tools.ietf.org/html/rfc7235#section-4.3), but it is\n   rather hard to configure such authentication in headless browsers. The\n   most popular way of bypassing this problem is to use Polipo, which has,\n   unfortunately, been unsupported for a long time.\n2. Crawlera uses\n   [X-Headers as configuration](https://doc.scrapinghub.com/crawlera.html#request-headers).\n   To use this API with headless browsers, users have to install plugins or\n   extensions in their browsers and configure them to propagate such headers\n   to Crawlera.\n3. Also, it is rather hard to maintain best practices for using\n   these headers. For example,\n   [support of Browser Profiles](https://doc.scrapinghub.com/crawlera.html#x-crawlera-profile)\n   requires a minimal possible set of headers: it is\n   recommended to remove the `Accept` header by default, which is rather hard\n   to do using the headless browsers' APIs.\n4. Crawlera works best with browsers only under some preconditions which\n   users have to repeat every time: session usage, some recommended headers\n   like `Referer`, etc.\n\nCrawlera Headless Proxy is intended to help users avoid such\nproblems. You should generally think about it as a proxy which should\nbe accessible by your headless browser or Selenium grid. 
This proxy\nforwards your requests to Crawlera, supplying the API key and injecting\nheaders into the requests. Basically, you have to do a bare minimum:\n\n1. Get a Crawlera API key.\n2. Run this proxy on your local machine or any machine accessible by\n   the headless browser, configuring it with a configuration file, command line\n   parameters or environment variables.\n3. Propagate the TLS certificate of this proxy to your browsers or\n   operating system vault.\n4. Access this proxy as a local proxy, plain, without any authentication.\n\n\n## Installation\n\nCheck out the [Using Headless Browsers with Zyte Smart Proxy Manager](https://docs.zyte.com/smart-proxy-manager/headless.html) manual.\n\n### Install binaries\n\nPrebuilt binaries are available on the Releases page. Please download\nthe one for your operating system and CPU architecture.\n\n### Install from sources\n\n#### Install prerequisites\n\nYou need to have a distribution of the Go programming language, git, bash\nand make installed. 
We use Go \u003e= 1.11 so please be sure that you have\na recent enough version.\n\nTo install them on Ubuntu/Debian, please execute the following command:\n\n```console\n$ sudo apt install -y bash make git golang-go\n```\n\nIf you have Ubuntu older than 18.10, please install Go with the snap package:\n\n```console\n$ sudo snap install --classic go\n```\n\nTo install them on OS X with [Homebrew](https://brew.sh/),\nplease execute the following command:\n\n```console\n$ brew install go make git\n```\n\n\n#### Install from Homebrew\n\nIf you use [Homebrew](https://brew.sh), you can use it to install headless\nproxy:\n\n```console\n$ curl -L https://raw.githubusercontent.com/zytedata/zyte-smartproxy-headless-proxy/master/crawlera-headless-proxy.rb \u003e crawlera-headless-proxy.rb \u0026\u0026 brew install --HEAD crawlera-headless-proxy.rb\n```\n\n\n#### Build binary\n\n```console\n$ git clone https://github.com/zytedata/zyte-smartproxy-headless-proxy.git\n$ cd zyte-smartproxy-headless-proxy\n```\n\nThe next step is to execute make:\n\n```console\n$ make\n```\n\nThis will build the binary `crawlera-headless-proxy`. If you are interested\nin compiling for other OS/CPU architectures, please cross-compile:\n\n```console\n$ make crosscompile\n```\n\nYou'll find a set of compiled binaries in the `./ccbuilds` directory after\nthe process is finished.\n\n\n### Docker container\n\nTo download the prebuilt container image, please do the following:\n\n```console\n$ docker pull zytedata/zyte-smartproxy-headless-proxy\n```\n\nIf you want to build this image locally, please do it with make (also,\nbe sure that [docker is installed](https://docs.docker.com/install/)).\n\n```console\n$ make docker\n```\n\nThis will build an image with tag `crawlera-headless-proxy`. It can\nbe configured by environment variables or command flags. 
Default\nconfiguration file path within a container is `/config.toml`.\n\nIf you want to have a smaller image (but build time will grow a lot),\nyou can build it with `docker-slim` make target.\n\n```console\n$ make docker-slim\n```\n\n\n## Usage\n\n### Help output\n\n```console\n$ crawlera-headless-proxy --help\nusage: crawlera-headless-proxy [\u003cflags\u003e]\n\nLocal proxy for Crawlera to be used with headless browsers.\n\nFlags:\n      --help                 Show context-sensitive help (also try --help-long and --help-man).\n  -d, --debug                Run in debug mode.\n  -b, --bind-ip=BIND-IP      IP to bind to. Default is 127.0.0.1.\n  -m, --proxy-api-ip=PROXY-API-IP\n                             IP to bind proxy API to. Default is the bind-ip value.\n  -p, --bind-port=BIND-PORT  Port to bind to. Default is 3128.\n  -w, --proxy-api-port=PROXY-API-PORT\n                             Port to bind proxy api to. Default is 3130.\n  -c, --config=CONFIG        Path to configuration file.\n  -l, --tls-ca-certificate=TLS-CA-CERTIFICATE\n                             Path to TLS CA certificate file.\n  -r, --tls-private-key=TLS-PRIVATE-KEY\n                             Path to TLS private key.\n  -t, --no-auto-sessions     Disable automatic session management.\n  -n, --concurrent-connections=CONCURRENT-CONNECTIONS\n                             Number of concurrent connections.\n  -a, --api-key=API-KEY      API key to Crawlera.\n  -u, --crawlera-host=CRAWLERA-HOST\n                             Hostname of Crawlera. Default is proxy.crawlera.com.\n  -o, --crawlera-port=CRAWLERA-PORT\n                             Port of Crawlera. Default is 8010.\n  -v, --dont-verify-crawlera-cert\n                             Do not verify Crawlera own certificate.\n  -x, --xheader=XHEADER ...  
Crawlera X-Headers.\n  -k, --adblock-list=ADBLOCK-LIST ...\n                             A list to requests to filter out (ADBlock compatible).\n  -z, --direct-access-hostpath-regexps=DIRECT-ACCESS-HOSTPATH-REGEXPS ...\n                             A list of regexps for hostpath for direct access, bypassing Crawlera.\n  -e, --direct-access-except-hostpath-regexps=DIRECT-ACCESS-EXCEPT-HOSTPATH-REGEXPS ...\n                             A list of regexps for hostpath for direct access. No effect without `-z`. Not required.\n      --version              Show application version.\n```\n\nDocker example:\n```console\n$ docker run --name crawlera-headless-proxy -p 3128:3128 zytedata/zyte-smartproxy-headless-proxy --help\n```\n\n### Configuration\n\nDefaults are sensible. If you run this tool without any configuration,\nit will start HTTP/HTTPS proxy on `localhost:3128`. The only thing you\nusually need to do is to propagate API key.\n\n```console\n$ crawlera-headless-proxy -a myapikey\n```\n\nThis will start local HTTP/HTTPS proxy on `localhost:3128` and will proxy all\nrequests to `proxy.crawlera.com:8010` with API key `myapikey`.\n\nAlso, it is possible to configure this tool using environment variables.\nHere is the complete table of configuration options and corresponding\nenvironment variables.\n\n| *Description*                                                                    | *Environment variable*                 | *Comandline parameter*                          | *Parameter in configuration file*       | *Default value*      |\n|----------------------------------------------------------------------------------|----------------------------------------|-------------------------------------------------|-----------------------------------------|----------------------|\n| Run in debug/verbose mode.                                                       
| `CRAWLERA_HEADLESS_DEBUG`              | `-d`, `--debug`                                 | `debug`                                 | `false`              |\n| Which IP this tool should listen on (0.0.0.0 for all interfaces).                | `CRAWLERA_HEADLESS_BINDIP`             | `-b`, `--bind-ip`                               | `bind_ip`                               | `127.0.0.1`          |\n| Which port this tool should listen.                                              | `CRAWLERA_HEADLESS_BINDPORT`           | `-p`, `--bind-port`                             | `bind_port`                             | 3128                 |\n| Path to the configuration file.                                                  | `CRAWLERA_HEADLESS_CONFIG`             | `-c`, `--config`                                | -                                       |                      |\n| API key of Crawlera.                                                             | `CRAWLERA_HEADLESS_APIKEY`             | `-a`, `--api-key`                               | `api_key`                               |                      |\n| Hostname of Crawlera.                                                            | `CRAWLERA_HEADLESS_CHOST`              | `-u`, `--crawlera-host`                         | `crawlera_host`                         | `proxy.crawlera.com` |\n| Port of Crawlera.                                                                | `CRAWLERA_HEADLESS_CPORT`              | `-o`, `--crawlera-port`                         | `crawlera_port`                         | 8010                 |\n| Do not verify Crawlera own TLS certificate.                                      | `CRAWLERA_HEADLESS_DONTVERIFY`         | `-v`, `--dont-verify-crawlera-cert`             | `dont_verify_crawlera_cert`             | `false`              |\n| Path to own TLS CA certificate.                                                  
| `CRAWLERA_HEADLESS_TLSCACERTPATH`      | `-l`, `--tls-ca-certificate`                    | `tls_ca_certificate`                    | \u003cembedded\u003e           |\n| Path to own TLS private key.                                                     | `CRAWLERA_HEADLESS_TLSPRIVATEKEYPATH`  | `-r`, `--tls-private-key`                       | `tls_private_key`                       | \u003cembedded\u003e           |\n| Disable automatic session management.                                            | `CRAWLERA_HEADLESS_NOAUTOSESSIONS`     | `-t`, `--no-auto-sessions`                      | `no_auto_sessions`                      | `false`              |\n| Maximal amount of concurrent connections to process.                             | `CRAWLERA_HEADLESS_CONCURRENCY`        | `-n`, `--concurrent-connections`                | `concurrent_connections`                | 0                    |\n| Additional Crawlera X-Headers.                                                   | `CRAWLERA_HEADLESS_XHEADERS`           | `-x`, `--xheader`                               | Section `xheaders`                      |                      |\n| Adblock-compatible filter lists.                                                 | `CRAWLERA_HEADLESS_ADBLOCKLISTS`       | `-k`, `--adblock-list`                          | `adblock_lists`                         |                      |\n| Regular expressions for hostpath URL part for direct access, bypassing Crawlera. | `CRAWLERA_HEADLESS_DIRECTACCESS`       | `-z`, `--direct-access-hostpath-regexps`        | `direct_access_hostpath_regexps`        |                      |\n| Exceptions to DirectAccess. Always proxied irrespective of direct access regex.  | `CRAWLERA_HEADLESS_DIRECTACCESS_EXCEPT`| `-e`, `--direct-access-except-hostpath-regexps` | `direct_access_except_hostpath_regexps` |                      |\n| Which IP should proxy API listen on (default is `bind-ip` value).                
| `CRAWLERA_HEADLESS_PROXYAPIIP`         | `-m`, `--proxy-api-ip`                          | `proxy_api_ip`                          | \u003csame as `bind_ip`\u003e  |\n| Which port proxy API should listen on.                                           | `CRAWLERA_HEADLESS_PROXYAPIPORT`       | `-w`, `--proxy-api-port`                        | `proxy_api_port`                        | 3130                 |\n\nA value of 0 for concurrent connections means unlimited. An embedded TLS key/certificate\nmeans that the headless proxy uses the ones from the repository.\n\nThe configuration file is written in\n[TOML](https://github.com/toml-lang/toml). If you haven't heard about\nTOML, please consider it as a hardened INI configuration file. Every\noption goes into the (unnamed) top-level section; X-Headers go into their\nown section. Let's express the following command line in the configuration\nfile:\n\n```console\n$ crawlera-headless-proxy -b 0.0.0.0 -p 3129 -u proxy.crawlera.com -o 8010 -x profile=desktop -x cookies=disable\n```\n\nThe configuration file will look like:\n\n```toml\nbind_ip = \"0.0.0.0\"\nbind_port = 3129\ncrawlera_host = \"proxy.crawlera.com\"\ncrawlera_port = 8010\n\n[xheaders]\nprofile = \"desktop\"\ncookies = \"disable\"\n```\n\nYou can combine command line flags, environment variables, and a\nconfiguration file. This tool resolves these options in the\nfollowing order (1 has the highest priority, 4 the lowest):\n\n1. Environment variables\n2. Commandline flags\n3. Configuration file\n4. 
Defaults\n\nDocker example:\n```console\n$ docker run --name crawlera-headless-proxy -p 3128:3128 zytedata/zyte-smartproxy-headless-proxy -a $APIKEY -d -x profile=pass -x cookies=disable -x no-bancheck=1 --direct-access-hostpath-regexps=\".*?\\.(?:txt|json|css|less|js|mjs|cjs|gif|ico|jpe?g|svg|png|webp|mkv|mp4|mpe?g|webm|eot|ttf|woff2?)$\" --adblock-list=\"https://easylist.to/easylist/easylist.txt\" --adblock-list=\"https://easylist.to/easylist/easyprivacy.txt\"\n```\n\n\n## Concurrency\n\nThe `--concurrent-connections` option limits the maximal number of\nconcurrent connections. This is required because, by default, Crawlera\nlimits the number of concurrent connections based on the billing\nplan of the user. If the user exceeds this limit, Crawlera returns\na response with status code 429. This can be rather irritating, so\nthere is an internal limiter which is more friendly to browsers. You\nneed to set the number of concurrent connections for your plan, and\ncrawlera-headless-proxy will throttle your requests before they go\nto Crawlera. It won't send 429 back; it just holds excess requests.\n\n\n## Automatic session management\n\nCrawlera allows using sessions, and sessions are natural if we are\ntalking about browsers. A session binds a certain IP to a session ID so\nthat all requests go through the same IP, just as they do in ordinary\nbrowsing. This can slow down your crawl but increase\nits quality for some websites.\n\nThe current implementation of automatic session management assumes\nthat only one browser is used to access this proxy. There\nis no clear and simple way to distinguish between browsers accessing\nthis proxy concurrently.\n\nThe basic behavior is as follows:\n\n1. If the session is not created yet, it is created on the first request.\n2. Until the session is known, all other requests are on hold.\n3. After the session ID is known, other requests start to use that session.\n4. 
If the session breaks, all requests are put on hold until a\n   new session is created.\n5. All requests which failed because of a broken session are\n   retried with the new one. If a new session is not ready yet, they\n   wait until it is.\n\nSuch retries are done only once because they might potentially block\nthe browser for a long time. All retries are also done with a 30-second\ntimeout.\n\n\n## Adblock list support\n\ncrawlera-headless-proxy supports preventive filtering by\nadblock-compatible filter lists like EasyList. If you start the tool\nwith such lists, they are going to be downloaded and requests to\ntrackers/advertising platforms will be filtered. This will save you a\nlot of throughput and requests passed to Crawlera.\n\nIf you do not pass any list, such filtering won't\nbe enabled. The lists we recommend are\n[EasyList](https://easylist.to/easylist/easylist.txt)\n(please do not forget to add region-specific lists),\n[EasyPrivacy](https://easylist.to/easylist/easyprivacy.txt) and\n[Disconnect](https://s3.amazonaws.com/lists.disconnect.me/simple_malware.txt).\n\n\n## Direct access\n\nSometimes you want to save capacity and execute requests bypassing\nCrawlera. These requests can include some static assets, text files or\nanything else where Crawlera is not necessary.\n\nYou can specify a list of regular expressions which match the host + path\nparts of the URL for direct access from headless proxy, ignoring Crawlera.\n\n\n## TLS keys\n\nSince crawlera-headless-proxy has to inject X-Headers into requests,\nit works with your browser only over HTTP/1.1. Unfortunately, there is no\nclear way to hijack HTTP/2 connections. Also, since it is effectively a\nMITM proxy, you need to use its own TLS certificate. 
This is hardcoded\ninto the binary so you have to download it and apply it to your system.\nPlease consult with manuals of your operating system how to do that.\n\nLink to certificate is\nIts SHA256 checksum is `100c7dd015814e7b8df16fc9e8689129682841d50f9a1b5a8a804a1eaf36322d`.\n\nIf you want to have your own certificate, please generate it. The\nsimplest way to do that is to execute the following command:\n\n```console\n$ openssl req -x509 -newkey rsa:4096 -keyout private-key.pem -out ca.crt -days 3650 -nodes\n```\n\nThis command will generate TLS private key `private-key.pem` and\nself-signed certificate `ca.crt`.\n\n\n## Proxy API\n\ncrawlera-headless-proxy has its own HTTP Rest API which is bind to\nanother port. Right now only one endpoint is supported.\n\n### `GET /stats`\n\nThis endpoint returns various statistics on the current work of proxy.\n\nExample:\n\n```json\n{\n  \"requests_number\": 423,\n  \"crawlera_requests\": 426,\n  \"crawlera_errors\": 0,\n  \"all_errors\": 6,\n  \"adblocked_requests\": 0,\n  \"sessions_created\": 4,\n  \"clients_connected\": 1,\n  \"clients_serving\": 1,\n  \"traffic\": 6326557,\n  \"overall_times\": {\n    \"average\": 0.37859728122037895,\n    \"minimal\": 0.016320158,\n    \"maxmimal\": 6.96558913,\n    \"median\": 0.1117137805,\n    \"standard_deviation\": 1.001460285777158,\n    \"percentiles\": {\n      \"10\": 0.05237131,\n      \"20\": 0.071472272,\n      \"30\": 0.088965026,\n      \"40\": 0.101607119,\n      \"50\": 0.1117137805,\n      \"60\": 0.125672599,\n      \"70\": 0.137716451,\n      \"75\": 0.146478028,\n      \"80\": 0.154273865,\n      \"85\": 0.162262952,\n      \"90\": 0.180582867,\n      \"95\": 3.514414853,\n      \"99\": 3.729193071\n    }\n  },\n  \"crawlera_times\": {\n    \"average\": 0.30196985308000035,\n    \"minimal\": 3.1394e-05,\n    \"maxmimal\": 3.750836014,\n    \"median\": 0.080731409,\n    \"standard_deviation\": 0.8410949224993787,\n    \"percentiles\": {\n      \"10\": 0.036049358,\n 
     \"20\": 0.049492537,\n      \"30\": 0.062905696,\n      \"40\": 0.072465399,\n      \"50\": 0.080731409,\n      \"60\": 0.0885455695,\n      \"70\": 0.09895164,\n      \"75\": 0.103160955,\n      \"80\": 0.110743335,\n      \"85\": 0.118884673,\n      \"90\": 0.129430856,\n      \"95\": 3.494953838,\n      \"99\": 3.694614379\n    }\n  },\n  \"traffic_times\": {\n    \"average\": 15099.18138424821,\n    \"minimal\": 336,\n    \"maxmimal\": 516239,\n    \"median\": 10383,\n    \"standard_deviation\": 31541.145341694657,\n    \"percentiles\": {\n      \"10\": 7511,\n      \"20\": 8441,\n      \"30\": 9230,\n      \"40\": 9833,\n      \"50\": 10383,\n      \"60\": 10889,\n      \"70\": 11398,\n      \"75\": 11853,\n      \"80\": 12327,\n      \"85\": 13121,\n      \"90\": 15153,\n      \"95\": 44791,\n      \"99\": 73846\n    }\n  },\n  \"uptime\": 123\n}\n```\n\nHere is the description of these stats:\n\n* `requests_number` - a number of requests managed by headless proxy.\n     This includes all possible requests, not only those which were\n     sent to Crawlera.\n* `crawlera_requests` - a number of requests which were sent to Crawlera.\n     This also includes retries on session restoration etc.\n* `sessions_created` - how many sessions were created by headless\n     proxy so far.\n* `clients_connected` - how many clients (requests) are connected to\n     the headless proxy at this moment.\n* `clients_serving` - how many clients (requests) are doing requests\n     to Crawlera now.\n* `traffic` - an amount of traffic sent to clients in bytes.\n     This metric does include headers and body sizes.\n* `crawlera_errors` - a number of responses where `X-Crawlera-Error`\n     header is set.\n* `all_errors` - a number of responses with errors (canceled,\n     timeouts and crawlera_errors).\n* `adblocked_requests` - a number of requests which were\n     blocked by Adblock lists.\n*_`times` describes different time series (overall response time,\n     time spent in 
Crawlera), etc., and provides the average (mean), min and\n     max values, standard deviation and a histogram of percentiles.\n     Time series are windowed, tracking only the latest 3000 values.\n\nPlease note that `requests_number` and `crawlera_requests`\nusually differ. This is because headless proxy filters adblock requests\nand also retries requests when recreating sessions, which implies additional Crawlera\nrequests. So, depending on the netloc, the proportion of these numbers can\ndiffer.\n\nAlso, `clients_serving \u003c= clients_connected` because of rate limiting. You\nmay consider `clients_serving` as the requests which have passed the rate limiter.\n\n\n## Crawlera X-Headers\n\nCrawlera is configured using special headers, which are usually\ncalled x-headers (they have an `X-` prefix in their name). You can find a\n[full list](https://doc.scrapinghub.com/crawlera.html#request-headers)\nof them in the documentation.\n\nThere are 2 different ways of providing these headers to the headless proxy:\n\n1. Use the full name\n2. Use the short version\n\nFor example, suppose a user wants to use the desktop browser profile. The corresponding\nheader is `X-Crawlera-Profile`. So, the user can add the following lines to the\nconfiguration:\n\n```toml\n[xheaders]\nx-crawlera-profile = \"desktop\"\n```\n\nor pass it via command line:\n\n```console\n$ crawlera-headless-proxy ... -x x-crawlera-profile=desktop\n```\n\nBut there is no need to repeat the `X-Crawlera-` prefix every time; you can omit it:\n\n```toml\n[xheaders]\nprofile = \"desktop\"\n```\n\nor pass it via command line:\n\n```console\n$ crawlera-headless-proxy ... 
-x profile=desktop\n```\n\n\n# Examples\n\n## curl\n\n```console\n$ crawlera-headless-proxy -p 3128 -a \"$MYAPIKEY\" -x profile=desktop\n$ curl -x localhost:3128 -sLI https://scrapinghub.com\n```\n\n## Selenium (Python)\n\n```python\nfrom selenium import webdriver\n\nCRAWLERA_HEADLESS_PROXY = \"localhost:3128\"\n\nprofile = webdriver.DesiredCapabilities.FIREFOX.copy()\nprofile[\"proxy\"] = {\n    \"httpProxy\": CRAWLERA_HEADLESS_PROXY,\n    \"ftpProxy\": CRAWLERA_HEADLESS_PROXY,\n    \"sslProxy\": CRAWLERA_HEADLESS_PROXY,\n    \"noProxy\": None,\n    \"proxyType\": \"MANUAL\",\n    \"class\": \"org.openqa.selenium.Proxy\",\n    \"autodetect\": False\n}\n\ndriver = webdriver.Remote(\"http://localhost:4444/wd/hub\", profile)\ndriver.get(\"https://scrapinghub.com\")\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzytedata%2Fzyte-smartproxy-headless-proxy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzytedata%2Fzyte-smartproxy-headless-proxy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzytedata%2Fzyte-smartproxy-headless-proxy/lists"}