https://github.com/MontFerret/worker
Containerized Ferret worker
- Host: GitHub
- URL: https://github.com/MontFerret/worker
- Owner: MontFerret
- License: apache-2.0
- Created: 2020-05-08T12:22:00.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-29T01:59:10.000Z (over 1 year ago)
- Last Synced: 2024-08-01T13:29:05.603Z (5 months ago)
- Topics: chrome, crawler, docker, dsl, ferret, go, hacktoberfest, hacktoberfest2020, scraping, scraping-websites, service, worker
- Language: Go
- Homepage:
- Size: 1.68 MB
- Stars: 15
- Watchers: 4
- Forks: 7
- Open Issues: 12
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Worker
**Worker** is a simple HTTP server that accepts FQL queries, executes them and returns their results.
The OpenAPI v2 schema is available [here](https://raw.githubusercontent.com/MontFerret/cli/master/reference/ferret-worker.yaml).

## Quick start

The Worker ships with a dedicated Docker image that contains headless Google Chrome, so you can run queries with the `cdp` driver out of the box:

Docker Hub:
```sh
docker run -d -p 8080:8080 montferret/worker
```
GitHub Container Registry:
```sh
docker run -d -p 8080:8080 ghcr.io/montferret/worker
```

Alternatively, if you want to use your own version of Chrome, you can run the Worker locally.
By installing the binary:
```shell
curl https://raw.githubusercontent.com/MontFerret/worker/master/install.sh | sh
worker
```

Or by building locally:
```sh
make
```

Then make a POST request:
![worker](https://raw.githubusercontent.com/MontFerret/worker/master/assets/postman.png)
## System Resource Requirements
- 2 CPUs
- 2 GB of RAM

## Usage
### Endpoints
#### POST /
Executes a given query. The payload must have the following shape:

```
Query {
text: String!
params: Map
}
```

#### GET /info

Returns information about the Worker, including details about Chrome, Ferret, and the Worker itself. The response has the following shape:

```
Info {
ip: String!
version: Version! {
worker: String!
chrome: ChromeVersion! {
browser: String!
protocol: String!
v8: String!
webkit: String!
}
ferret: String!
}
}
```

#### GET /health

Health check endpoint (e.g., for Kubernetes). Returns an empty 200 response.

### Run commands
```bash
-log-level="debug"
    log level
-port=8080
    port to listen
-body-limit=1000
    maximum size of request body in kb. 0 means no limit.
-request-limit=20
    amount of requests per second for each IP. 0 means no limit.
-request-limit-time-window=180
    amount of seconds for request rate limit time window.
-cache-size=100
    amount of cached queries. 0 means no caching.
-chrome-ip="127.0.0.1"
    Google Chrome remote IP address
-chrome-port=9222
    Google Chrome remote debugging port
-no-chrome=false
    disable Chrome driver
-version=false
    show version
-help=false
    show this list
```