Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/livelace/webchela
webchela ("web" + "chela") is a daemon for interacting with web pages through automated browsers (Chrome or Firefox).
chrome daemon firefox grpc selenium tigervnc web-scraping
- Host: GitHub
- URL: https://github.com/livelace/webchela
- Owner: livelace
- License: mit
- Created: 2020-10-27T22:19:24.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2024-06-25T00:41:50.000Z (6 months ago)
- Last Synced: 2024-06-25T01:33:08.452Z (6 months ago)
- Topics: chrome, daemon, firefox, grpc, selenium, tigervnc, web-scraping
- Language: Python
- Homepage:
- Size: 7.58 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# webchela
***webchela*** ("web" + "chela") is a daemon for interacting with web pages through automated browsers (Chrome or Firefox).
### Main goal:
Provide a plugin endpoint for another tool, [gosquito](https://github.com/livelace/gosquito).
### Features:
* Accepts tasks from clients over a single [gRPC](https://grpc.io/) connection (control/data links).
* Combines tasks into batches and controls how many browser instances/tabs run in parallel.
* Splits fetched data into sized chunks to avoid transport limits.
* Is fully controlled from the client side (browser type/arguments/extensions etc.).
* Exposes server load to clients (a client may skip a busy server and switch to an idle one).
* Can utilize a GPU with the help of [VirtualGL](https://www.virtualgl.org/)/[TurboVNC](https://www.turbovnc.org/) for graphics offloading.
* Works in fully graphical mode (not native [headless mode](https://developer.chrome.com/docs/chromium/new-headless)),
exposes 590x/VNC ports per browser instance for visual inspection.
* Resizes the browser window dynamically.
* Takes full-page screenshots and/or screenshots of specific page elements.
* Passes cookies to pages as JSON.

### Quick start:
**Start daemon (software rendering)**
```shell script
user@localhost / $ docker run -p 50051:50051 -ti --rm -v /dev/shm:/dev/shm ghcr.io/livelace/webchela:v1.8.2
2024-06-17 21:17:19,438 INFO supervisord started with pid 1
2024-06-17 21:17:20,440 INFO spawned: 'xorg' with pid 2
2024-06-17 21:17:20,442 INFO spawned: 'webchela' with pid 3
2024-06-17 21:17:20,470 WARN exited: xorg (exit status 1; not expected)
2024-06-17 21:17:20 webchela.config INFO Config sample was written successfully: /home/user/.webchela.toml
2024-06-17 21:17:20 webchela.server INFO webchela v1.8.2
```
**Start daemon (GPU rendering)**
```shell script
user@localhost / $ docker run --privileged -p 50051:50051 -ti --rm -v /dev/shm:/dev/shm ghcr.io/livelace/webchela:v1.8.2
2024-06-17 21:19:55,179 INFO supervisord started with pid 1
2024-06-17 21:19:56,180 INFO spawned: 'xorg' with pid 7
2024-06-17 21:19:56,182 INFO spawned: 'webchela' with pid 8
2024-06-17 21:19:56 webchela.server INFO webchela v1.8.2
2024-06-17 21:19:57,782 INFO success: xorg entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-06-17 21:19:57,782 INFO success: webchela entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
```

**Use gosquito [configuration example](https://github.com/livelace/gosquito/blob/master/docs/plugins/process/webchela.md)**
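Before pointing a client at the daemon, it can help to confirm that the published port accepts TCP connections. The sketch below is not part of webchela; `port_is_open` is a hypothetical helper that uses only the Python standard library, and it assumes the daemon was started with `-p 50051:50051` as in the commands above. It checks reachability only and does not speak gRPC.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 50051 is the port published by the docker commands above.
    print(port_is_open("127.0.0.1", 50051))
```

A `False` result usually means the container is not running or the port was not published.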
### Configuration example:
```toml
[default]
#browser_type = "chrome"
#browser_extension = [] # crx files included into webchela package

#browser_type = "firefox"
#browser_extension = [] # xpi files included into webchela package

#browser_geometry = "1920x1080"
#browser_geometry = "dynamic" # window will be resized to page content
#browser_instance = 1 # number of instances launched in parallel
#browser_instance_tab = 10

#browser_proxy = "http://1.2.3.4:3128"
#browser_proxy = "socks5://user:[email protected]:1080"

#browser_retry_codes = []
#browser_retry_codes_tries = 1

#chrome_driver_path = "/usr/bin/chromedriver"
#chrome_extensions_dir = "/extensions/chrome"
#chrome_path = "/usr/bin/google-chrome-stable"
#chrome_profile = "" # only one browser instance at a time if set
#chrome_profiles_dir = "/tmp/webchela/chrome"

#firefox_driver_path = "/usr/local/bin/geckodriver"
#firefox_extensions_dir = "/extensions/firefox"
#firefox_path = "/usr/bin/firefox"
#firefox_profile = "" # only one browser instance at a time if set
#firefox_profiles_dir = "/tmp/webchela/firefox"

#chunk_size = "3M"
#cpu_load = 30 # browser is a heavy thing, be careful with limits
#keep_temp = false
#log_level = "DEBUG"
#mem_free = "1G" # browser is a heavy thing, be careful with limits
#page_size = "10M"
#page_timeout = 60
#screenshot_timeout = 30
#script_timeout = 30
#task_timeout = 600

[server]
#listen = "0.0.0.0:50051"
#workers = 10 # set a lower value if you are experiencing issues
```
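To illustrate how size settings such as `chunk_size = "3M"` and `page_size = "10M"` relate to splitting fetched data, here is a minimal sketch. It is not webchela's implementation: `parse_size` and `split_into_chunks` are hypothetical helpers, and treating the `K`/`M`/`G` suffixes as binary multiples (1M = 1024 * 1024 bytes) is an assumption.

```python
import re

# Assumption: suffixes are binary multiples (K = 1024 bytes, and so on).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value: str) -> int:
    """Convert a size string such as "3M" into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"unsupported size string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    # A payload exactly at the sample page_size limit.
    page = b"x" * parse_size("10M")
    chunks = split_into_chunks(page, parse_size("3M"))
    print(len(chunks), [len(c) for c in chunks])
```

With the sample values, a 10M page splits into four chunks: three of 3M and a final one of 1M. A smaller `chunk_size` produces more messages, each further below the transport's message size limit.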
### Screenshot example:
![main](assets/worldclock.png)
### Known issues:
**Taking full-page screenshots in Chrome may fail with the error:
"Message: disconnected: Unable to receive message from renderer"**

Chromedriver has a hardcoded 10-second timeout for taking page screenshots,
which may not be enough for very large screenshots. Use Firefox or wait for a [fix](https://bugs.chromium.org/p/chromedriver/issues/detail?id=3916&q=screenshot%20timeout&can=2).