{"id":25824984,"url":"https://github.com/spider-rs/headless-browser","last_synced_at":"2025-04-09T13:03:35.936Z","repository":{"id":45736002,"uuid":"514306494","full_name":"spider-rs/headless-browser","owner":"spider-rs","description":"Headless browser - reliable high performance setup for cloud providers like Fargate, CloudRun, and more.","archived":false,"fork":false,"pushed_at":"2025-03-01T01:09:52.000Z","size":299,"stargazers_count":36,"open_issues_count":0,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-01T03:04:02.059Z","etag":null,"topics":["cdp","chrome","chrome-api","chrome-server","devtools-protocol","headless"],"latest_commit_sha":null,"homepage":"https://spider.cloud/guides/scaling-headless-chrome","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spider-rs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-15T14:51:20.000Z","updated_at":"2025-03-20T06:45:16.000Z","dependencies_parsed_at":"2024-07-13T13:46:10.269Z","dependency_job_id":"9b4edaba-3b89-4510-bb0d-f16fbadaabbb","html_url":"https://github.com/spider-rs/headless-browser","commit_stats":null,"previous_names":["spider-rs/chrome-server","a11ywatch/chrome","spider-rs/chrome","spider-rs/headless-browser"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fheadless-browser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fheadless-browser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fheadless-browser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fheadless-browser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spider-rs","download_url":"https://codeload.github.com/spider-rs/headless-browser/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247657275,"owners_count":20974344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdp","chrome","chrome-api","chrome-server","devtools-protocol","headless"],"created_at":"2025-02-28T13:26:20.364Z","updated_at":"2025-04-09T13:03:35.912Z","avatar_url":"https://github.com/spider-rs.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# headless-browser\n\nThe `headless-browser` crate offers a scalable solution for managing headless Chrome instances with integrated proxy and server support. This system is designed to optimize resource utilization and enhance performance for large-scale web automation tasks.\n\n## Installation\n\nMake sure you have [Rust](https://www.rust-lang.org/learn/get-started) installed before proceeding.\n\n```sh\ncargo install headless_browser\n```\n\n## Usage\n\n1. Launches the most recent browser instance, supporting remote proxy connections and a server for `json/version` rewrites to facilitate networking.\n2. Allows manual and automatic spawning and termination of multiple headless instances, including error handling functionalities.\n3. Provides access to Chrome DevTools Protocol (CDP) WebSocket connections and status, caching values for improved performance.\n\nBy default, the instance binds Chrome to `0.0.0.0` when initialized via the API.\n\nUse the `REMOTE_ADDRESS` environment variable to specify the desired address for the Chrome instance, whether local or networked.\n\nThe application passes load balancer health checks on port `6000`, providing the status of the Chrome container.\n\nTo run Chrome on a load balancer, a companion application is required, which is a primary function of the server.\n\nThe default port configuration includes `9223` for Chrome and `9222` for the TCP proxy, in response to `0.0.0.0` not being exposed in recent versions like `HeadlessChrome/131.0.6778.139` and newer.\n\nFor web scraping, we suggest using our [Headless Shell Dockerfile](./docker/Dockerfile.headless_shell_playwright) or downloading the [chrome-headless-shell](https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.23/linux64/chrome-headless-shell-linux64.zip). The headless-shell offers significantly faster performance compared to the standard headless mode. Additionally, our Docker image is remarkably compact for the chrome-headless-shell, thanks to a multi-stage build process that installs only the essential components, resulting in a size of just 300 MB compared to Playwright's traditional 1.2 GB default.\n\n---\n\n## API\n\n1. POST: `fork` to start a new chrome instance or use `fork/$port` with the port to startup the instance ex: `curl --location --request POST 'http://localhost:6000/fork/9223'`.\n2. POST: `shutdown/$PID` to shutdown the instance. ex: `curl --location --request POST 'http://localhost:6000/shutdown/77057'`\n3. POST: `/json/version` get the json info of the chrome instance to connect to web sockets ex: `curl --location --request POST 'http://localhost:6000/json/version'`.\n\n### Curl Examples\n\n`fork`\n\n```sh\ncurl --location --request POST 'http://localhost:6000/fork'\n```\n\n`shutdown`\n\n```sh\ncurl --location --request POST 'http://localhost:6000/shutdown'\n```\n\n`/json/version`\n\n```sh\ncurl --location --request GET 'http://localhost:6000/json/version' \\\n--header 'Content-Type: application/json'\n\n# example output\n{\n   \"Browser\": \"HeadlessChrome/131.0.6778.139\",\n   \"Protocol-Version\": \"1.3\",\n   \"User-Agent\": \"Mozilla/5.0 (X11; Linux aarch64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.6778.139 Safari/537.36\",\n   \"V8-Version\": \"13.1.201.16\",\n   \"WebKit-Version\": \"537.36 (@c35bbcbd7c2775a12a3f320e05ac0022939b1a8a)\",\n   \"webSocketDebuggerUrl\": \"ws://127.0.0.1:9222/devtools/browser/43e14f5a-6877-4e2f-846e-ab5801f1b6fc\"\n}\n```\n\n## Args\n\n1. The first arg is the chrome application location example linux `'/opt/google/chrome/chrome'`.\n2. The second arg is the chrome address `127.0.0.1`.\n3. The third arg you can pass in `init` to auto start chrome on `9222`.\n\nExample to start chrome (all params are optional):\n\n```sh\nheadless_browser '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome' 127.0.0.1 init\n# Chrome PID: 87659\n# Chrome server at localhost:6000\n# DevTools listening on ws://127.0.0.1:9222/devtools/browser/c789f9e0-7f65-495d-baee-243eb454ea15\n```\n\n### Docker\n\nYou can build this image using the following:\n\n1. Dockerfile (Alpine)\n1. Dockerfile.playwright (Playwright Custom Chrome)\n1. Dockerfile.headless_shell (Manual)\n1. Dockerfile.brave\n1. Dockerfile.headless_shell_playwright (Playwright Headless Shell)\n1. Dockerfile.xvfb (Virtual Display)\n1. Dockerfile.lightpanda\n\nYou need to set the env variable passed in as an arg `HOSTNAME_OVERRIDE` to override the docker container and set it to `host.docker.internal`.\n\n#### Manual (WIP)\n\nUse the following to build with docker.\nRun the command `./build.sh` to build chrome on the machine with docker.\nThe build scripts are originally from [docker-headless-shell](https://github.com/chromedp/docker-headless-shell).\n\n### ENV Variables\n\n```sh\n# the chrome path on the OS ex: CHROME_PATH=./chrome-headless-shell/mac_arm-132.0.6834.159/chrome-headless-shell-mac-arm64/chrome-headless-shell\nCHROME_PATH=\n# the remote address of the chrome intance\nREMOTE_ADDRESS=\n# use brave browser as default. Set the value to true.\nBRAVE_ENABLED=\n```\n\n## Library\n\nYou can use the [lib](https://docs.rs/headless_browser_lib/latest/headless_browser_lib/) with `cargo add headless_browser_lib` control the startup and shutdown manually. Below is an example of using the [spider_chrome](https://github.com/spider-rs/headless-browser) project to run CDP commands concurrently fast.\n\n```rust\nfutures-util = \"0.3\"\ntokio = { version = \"1\", features = [\"full\"] }\nspider_chrome = \"2\"\nheadless_browser_lib = \"0.1\"\n```\n\n```rust\n/// spider_chrome is mapped to chromiumoxide since it was forked and kept the API the same.\nuse chromiumoxide::{browser::Browser, error::CdpError};\nuse futures_util::stream::StreamExt;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    tokio::spawn(headless_browser_lib::run_main()); // spawn main server, proxy, and headless.\n    tokio::time::sleep(std::time::Duration::from_millis(200)).await; // give a slight delay for now until we use a oneshot.\n    let start = std::time::Instant::now();\n\n    let (browser, mut handler) =\n         // port 6000 - main server entry for routing correctly across networks.\n        Browser::connect_with_config(\"http://127.0.0.1:6000/json/version\", Default::default())\n            .await?;\n\n    let handle = tokio::task::spawn(async move {\n        while let Some(h) = handler.next().await {\n            if h.is_err() {\n                break;\n            }\n        }\n    });\n\n    let navigate_page = async |u: \u0026str| -\u003e Result\u003cString, CdpError\u003e {\n        let page = browser.new_page(u).await?;\n        let html = page.wait_for_navigation().await?.content().await?;\n        Ok::\u003cString, CdpError\u003e(html)\n    };\n\n    let (spider_html, example_html) = tokio::join!(\n        navigate_page(\"https://spider.cloud\"),\n        navigate_page(\"https://example.com\")\n    );\n\n    browser.close().await?;\n    let _ = handle.await;\n\n    let spider_html = spider_html?;\n    let example_html = example_html?;\n\n    let elasped = start.elapsed();\n\n    browser.close().await?;\n    let _ = handle.await;\n\n    println!(\"===\\n{}\\n{}\\n===\", \"spider.cloud\", spider_html);\n    println!(\"===\\n{}\\n{}\\n===\", \"example.com\", example_html);\n    println!(\"Time took: {:?}\", elasped);\n\n    Ok(())\n}\n```\n\n## Testing and Benchmarks\n\nView the [benches](./benches/README.md) to see the performance between browsers for headless.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspider-rs%2Fheadless-browser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspider-rs%2Fheadless-browser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspider-rs%2Fheadless-browser/lists"}