{"id":16842332,"url":"https://github.com/dgtlmoon/sockpuppetbrowser","last_synced_at":"2025-03-17T04:33:47.456Z","repository":{"id":221145756,"uuid":"753571760","full_name":"dgtlmoon/sockpuppetbrowser","owner":"dgtlmoon","description":"A scalable server for providing Chrome interfaces where needed","archived":false,"fork":false,"pushed_at":"2025-02-28T13:36:25.000Z","size":416,"stargazers_count":50,"open_issues_count":11,"forks_count":10,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-16T08:23:37.902Z","etag":null,"topics":["chrome","chrome-cdp","nodejs","puppeteer","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dgtlmoon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-06T11:43:27.000Z","updated_at":"2025-03-13T19:11:55.000Z","dependencies_parsed_at":"2024-02-29T15:27:29.257Z","dependency_job_id":"53e4fb3a-8692-436e-82d9-b9e0d2e1aec9","html_url":"https://github.com/dgtlmoon/sockpuppetbrowser","commit_stats":null,"previous_names":["dgtlmoon/sockpuppetbrowser"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlmoon%2Fsockpuppetbrowser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlmoon%2Fsockpuppetbrowser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlmoon%2Fsockpuppetbrowser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgtlmoon%2Fsockp
uppetbrowser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dgtlmoon","download_url":"https://codeload.github.com/dgtlmoon/sockpuppetbrowser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243975271,"owners_count":20377549,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chrome","chrome-cdp","nodejs","puppeteer","python3"],"created_at":"2024-10-13T12:45:26.920Z","updated_at":"2025-03-17T04:33:47.184Z","avatar_url":"https://github.com/dgtlmoon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Sock Puppet(eer) Browser](docs/sock-puppet-header.png?raw=true \"Sock Puppet(eer) Browser Logo Image\")\n# Sock Puppet(eer) Browser.\n\n## What is this?\n\nThis is a high-performance proxy for Chrome so that you can drive many simultaneous Chrome browsers easily and efficiently.\n\nWhen you connect on `ws://127.0.0.1:3000` as your \"CDP Chrome Browser URL\" URL it will always spin up a new fresh Chrome instance.\n\n\nThis project is the OpenSource'ed browser back-end for the amazing [opensource web page change detection](https://changedetection.io/) project.\n\nWhen a request for a new Chrome CDP starts, this software will launch an individual isolated-ish Chrome process\nfor just that request (This is a Chrome CDP \"Proxy\")\n\nIt is based on the excellent https://github.com/Zenika/alpine-chrome, and we add our own wrapper to launch\nindividual chrome instances on demand.\n\nWhen ever something requiring puppeteer connects via `ws://..` it will spin 
up a new Chrome browser\ninstance and connect you through (proxy you through) to that Chrome's DevTools connection.\n\nIt also handles throttling, scaling, and accepting extra Chrome settings on the connection query.\n\nUnder the hood, it is a simple Python websockets wrapper using a [puppeteer](https://pptr.dev/) image, so \nthat we can be sure that all the basic configuration required for Chrome to work will function well.\n\n## Why do I need this?\n\nThis provides a Chrome interface to applications that need it, for example when \nusing Playwright: Playwright launches a `node` instance and starts issuing `CDP` (Chrome DevTools Protocol)\ncommands, so it needs a Chrome endpoint to drive. This project provides one.\n\n(Playwright provides a high-level command set that talks to `node`; `node` then issues the low-level CDP\ncommands that drive Chrome directly.)\n\nIt is also more efficient to not need that extra `node` process like with some other systems \n(you would end up with two node processes).\n\n`playwright -\u003e node -\u003e [sockpuppetserver] -\u003e CDP protocol to do the browser business`\n\nBecause this method is always built on top of the latest puppeteer release, it's a lot more secure and reliable\nthan relying on projects to individually update their Chrome browsers and configurations.\n\nYou can skip the whole `python` -\u003e `node` mess by using https://github.com/pyppeteer/pyppeteer and talking to this \ncontainer directly.\n\n\n## How to run\n\n```bash\nwget https://raw.githubusercontent.com/jfrazelle/dotfiles/master/etc/docker/seccomp/chrome.json\ndocker run --rm --security-opt seccomp=$(pwd)/chrome.json -p 127.0.0.1:3000:3000 dgtlmoon/sockpuppetbrowser\n```\n\nThe `seccomp` security setting is _highly_ recommended, see https://github.com/Zenika/alpine-chrome?tab=readme-ov-file#-the-best-with-seccomp\n\n### Statistics\n\nAccess `http://127.0.0.1:8080/stats` (or whichever hostname you bind to). Use `--sport` to specify a port other than `8080`.\n\n```\n{\n  
\"active_connections\": 158,\n  \"connection_count_total\": 8383,\n  \"mem_use_percent\": 46.9,\n  \"special_counter_len\": 0\n}\n```\n\nYou can also add this to your fetch and access `'special_counter_len'` at the `/stats` URL, this is good for adding at the end of your scripts so you know the actual script ran all steps.\n\n```\n        try:\n            await self.page._client.send(\"SOCKPUPPET.specialcounter\")\n        except:\n            pass\n\n```\n\n### Debug CDP session logs\n\nSometimes you need to examine the low-level Chrome CDP protocol interaction, enable `ALLOW_CDP_LOG=yes` environment \nvariable and add `\u0026log-cdp=/path/somefile.txt` to the connection URL.\n\nThen the log will contain the CDP session, for example:\n\n```\n1712224824.5491815 - Attempting connection to ws://localhost:56745/devtools/browser/899f78ce-e7c8-4ad1-b8c9-a7aa449a93ef\n1712224824.5528538 - Connected to ws://localhost:56745/devtools/browser/899f78ce-e7c8-4ad1-b8c9-a7aa449a93ef\n1712224824.5529754 - Puppeteer -\u003e Chrome: {\"method\": \"Target.getBrowserContexts\", \"params\": {}, \"id\": 1}\n1712224824.553542 - Chrome -\u003e Puppeteer: {\"id\":1,\"result\":{\"browserContextIds\":[]}}\n...\n```\n\n### Tuning\n\nSome tips on high-concurrency scraping and tuning where you have a lot of chrome browsers running simultaneously\n\n- Understand different Chrome command line options https://github.com/GoogleChrome/chrome-launcher/blob/main/docs/chrome-flags-for-tools.md and specify them on the connection URL\n- Set your `inotify` values higher https://stackoverflow.com/questions/32281277/too-many-open-files-failed-to-initialize-inotify-the-user-limit-on-the-total\n- Don't burn out your disk!! Mount the path for `--user-data-dir` as a RAM Disk/tmpfs disk ! 
This will also help to speed up Chrome\n\nOn an `Intel(R) Xeon(R) E-2288G CPU @ 3.70GHz` (16 core), it will sustain 150 concurrent browser sessions with a load average of about 65-70 (roughly 3-4 browsers per CPU core).\n\nMost of the CPU load seems to occur when starting a browser; in the future, one browser could perhaps process multiple requests.\n\n### Docker healthcheck\n\nAdd this to your `docker-compose.yml`. It checks that port 3000 answers and that the `/stats` endpoint on port 8080 responds.\n\n```\n    healthcheck:\n      test: \"python3 /usr/src/app/docker-health-check.py --host http://localhost\"\n      interval: 30s\n      timeout: 5s\n      retries: 3\n      start_period: 10s\n```\n\nTo inspect more detailed Docker information about the container's health:\n```\ndocker inspect --format='{{json .State.Health}}' browser-sockpuppetbrowser-1\n```\n\n### Future ideas\n\n- Some super cool \"on the wire\" hacks to add custom functionality to CDP, such as issuing a single command to download a file (PDF) to a location, see https://github.com/dgtlmoon/changedetection.io/issues/2019\n\n\nHave fun!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgtlmoon%2Fsockpuppetbrowser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdgtlmoon%2Fsockpuppetbrowser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgtlmoon%2Fsockpuppetbrowser/lists"}