{"id":23752463,"url":"https://github.com/livelace/webchela","last_synced_at":"2025-08-14T18:41:59.644Z","repository":{"id":47954660,"uuid":"307841711","full_name":"livelace/webchela","owner":"livelace","description":"webchela (\"web\" + \"chela\") is a daemon for interacting with web pages through automated browsers (Chrome or Firefox).","archived":false,"fork":false,"pushed_at":"2024-07-01T21:01:07.000Z","size":7859,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-31T17:51:49.379Z","etag":null,"topics":["chrome","daemon","firefox","grpc","selenium","tigervnc","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/livelace.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-27T22:19:24.000Z","updated_at":"2024-07-01T21:01:11.000Z","dependencies_parsed_at":"2024-06-25T01:32:58.494Z","dependency_job_id":"af991a30-8e32-4e98-9c29-bc13dfd0bf69","html_url":"https://github.com/livelace/webchela","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fwebchela","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fwebchela/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fwebchela/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fwebchela/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/livelace","download_url":"https://codeload.github.com/livelace/webchela/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239921844,"owners_count":19718844,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chrome","daemon","firefox","grpc","selenium","tigervnc","web-scraping"],"created_at":"2024-12-31T17:52:14.550Z","updated_at":"2025-02-20T22:17:25.217Z","avatar_url":"https://github.com/livelace.png","language":"Python","readme":"# webchela\n\n\n***webchela*** (\"web\" + \"chela\") is a daemon for interacting with web pages through automated browsers (Chrome or Firefox). \n\n### Main goal:\n\nProvide a plugin endpoint for another tool: [gosquito](https://github.com/livelace/gosquito). \n\n### Features:\n\n* Accepts tasks from clients over a single [gRPC](https://grpc.io/) connection (control/data links). \n* Combines tasks into batches and controls how many browser instances/tabs run in parallel.\n* Splits fetched data into sized chunks (to avoid transport limits).\n* Fully controlled from the client side (browser type, arguments, extensions, etc.). 
\n* Exposes server load to clients (a client may skip a busy server and switch to an idle one).\n* Can utilize a GPU with the help of [VirtualGL](https://www.virtualgl.org/)/[TurboVNC](https://www.turbovnc.org/) for graphics offloading.\n* Works in fully graphical mode (not the native [headless mode](https://developer.chrome.com/docs/chromium/new-headless)) and \nexposes 590x/VNC ports per browser instance for visual inspection.\n* Resizes the browser window dynamically. \n* Takes screenshots of full pages and/or specific page elements. \n* Passes cookies to pages as JSON.\n\n\n### Quick start:\n\n**Start daemon (software rendering)**\n```shell script\nuser@localhost / $ docker run -p 50051:50051 -ti --rm -v /dev/shm:/dev/shm ghcr.io/livelace/webchela:v1.8.2\n2024-06-17 21:17:19,438 INFO supervisord started with pid 1\n2024-06-17 21:17:20,440 INFO spawned: 'xorg' with pid 2\n2024-06-17 21:17:20,442 INFO spawned: 'webchela' with pid 3\n2024-06-17 21:17:20,470 WARN exited: xorg (exit status 1; not expected)\n2024-06-17 21:17:20 webchela.config INFO Config sample was written successfully: /home/user/.webchela.toml\n2024-06-17 21:17:20 webchela.server INFO webchela v1.8.2\n\n```\n\n**Start daemon (GPU rendering)**\n```shell script\nuser@localhost / $ docker run --privileged -p 50051:50051 -ti --rm -v /dev/shm:/dev/shm ghcr.io/livelace/webchela:v1.8.2\n2024-06-17 21:19:55,179 INFO supervisord started with pid 1\n2024-06-17 21:19:56,180 INFO spawned: 'xorg' with pid 7\n2024-06-17 21:19:56,182 INFO spawned: 'webchela' with pid 8\n2024-06-17 21:19:56 webchela.server INFO webchela v1.8.2\n2024-06-17 21:19:57,782 INFO success: xorg entered RUNNING state, process has stayed up for \u003e than 1 seconds (startsecs)\n2024-06-17 21:19:57,782 INFO success: webchela entered RUNNING state, process has stayed up for \u003e than 1 seconds (startsecs)\n```\n\n**Use gosquito [configuration example](https://github.com/livelace/gosquito/blob/master/docs/plugins/process/webchela.md)**\n\n### Configuration 
example:\n\n```toml\n\n[default]\n\n#browser_type               = \"chrome\"\n#browser_extension          = []                                    # crx files included in the webchela package\n\n#browser_type               = \"firefox\"\n#browser_extension          = []                                    # xpi files included in the webchela package\n\n#browser_geometry           = \"1920x1080\"\n#browser_geometry           = \"dynamic\"                             # window will be resized to page content\n#browser_instance           = 1                                     # number of instances launched in parallel\n#browser_instance_tab       = 10\n\n#browser_proxy              = \"http://1.2.3.4:3128\"\n#browser_proxy              = \"socks5://user:pass@1.2.3.4:1080\"\n\n#browser_retry_codes        = []\n#browser_retry_codes_tries  = 1\n\n#chrome_driver_path         = \"/usr/bin/chromedriver\"\n#chrome_extensions_dir      = \"\u003cINSTALL_PATH\u003e/extensions/chrome\"\n#chrome_path                = \"/usr/bin/google-chrome-stable\"\n#chrome_profile             = \"\"                                    # only one browser instance at a time if set\n#chrome_profiles_dir        = \"/tmp/webchela/chrome\"\n\n#firefox_driver_path        = \"/usr/local/bin/geckodriver\"\n#firefox_extensions_dir     = \"\u003cINSTALL_PATH\u003e/extensions/firefox\"\n#firefox_path               = \"/usr/bin/firefox\"\n#firefox_profile            = \"\"                                    # only one browser instance at a time if set\n#firefox_profiles_dir       = \"/tmp/webchela/firefox\"\n\n#chunk_size                 = \"3M\"\n#cpu_load                   = 30                                    # browser is a heavy thing, be careful with limits\n#keep_temp                  = false\n#log_level                  = \"DEBUG\"\n#mem_free                   = \"1G\"                                  # browser is a heavy thing, be careful with limits\n#page_size                  = 
\"10M\"\n#page_timeout               = 60\n#screenshot_timeout         = 30\n#script_timeout             = 30\n#task_timeout               = 600\n\n[server]\n\n#listen                     = \"0.0.0.0:50051\"\n#workers                    = 10                                    # set a lower value if you are experiencing issues\n\n```\n\n### Screenshot example:\n\n![main](assets/worldclock.png)\n\n### Known issues:\n\n**Making full page screenshots in Chrome may fail with the error: \n\"Message: disconnected: Unable to receive message from renderer\"**\n\nChromedriver has a hardcoded 10-second timeout for taking page screenshots, \nwhich may not be enough for very large screenshots. Use Firefox or wait for the [fix](https://bugs.chromium.org/p/chromedriver/issues/detail?id=3916\u0026q=screenshot%20timeout\u0026can=2).","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivelace%2Fwebchela","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flivelace%2Fwebchela","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivelace%2Fwebchela/lists"}