{"id":20246363,"url":"https://github.com/montferret/worker","last_synced_at":"2026-05-24T01:00:58.723Z","repository":{"id":48404137,"uuid":"262316390","full_name":"MontFerret/worker","owner":"MontFerret","description":"Containerized Ferret worker","archived":false,"fork":false,"pushed_at":"2026-05-11T16:52:08.000Z","size":1821,"stargazers_count":15,"open_issues_count":14,"forks_count":7,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-05-11T18:33:08.010Z","etag":null,"topics":["chrome","crawler","docker","dsl","ferret","go","hacktoberfest","hacktoberfest2020","scraping","scraping-websites","service","worker"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MontFerret.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":[],"patreon":"ziflex","open_collective":"ferret"}},"created_at":"2020-05-08T12:22:00.000Z","updated_at":"2026-05-11T16:52:13.000Z","dependencies_parsed_at":"2026-05-24T01:00:38.080Z","dependency_job_id":null,"html_url":"https://github.com/MontFerret/worker","commit_stats":null,"previous_names":[],"tags_count":37,"template":false,"template_full_name":null,"purl":"pkg:github/MontFerret/worker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MontFerret%2Fworker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MontFerret%2Fworker/tags","releases_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/MontFerret%2Fworker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MontFerret%2Fworker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MontFerret","download_url":"https://codeload.github.com/MontFerret/worker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MontFerret%2Fworker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33417489,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T22:14:44.296Z","status":"ssl_error","status_checked_at":"2026-05-23T22:14:43.778Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chrome","crawler","docker","dsl","ferret","go","hacktoberfest","hacktoberfest2020","scraping","scraping-websites","service","worker"],"created_at":"2024-11-14T09:28:47.650Z","updated_at":"2026-05-24T01:00:58.711Z","avatar_url":"https://github.com/MontFerret.png","language":"Go","funding_links":["https://patreon.com/ziflex","https://opencollective.com/ferret"],"categories":[],"sub_categories":[],"readme":"# Worker\n\n\u003cp align=\"center\"\u003e\n\t\u003ca href=\"https://goreportcard.com/report/github.com/MontFerret/worker\"\u003e\n\t\t\u003cimg alt=\"Go Report Status\" 
src=\"https://goreportcard.com/badge/github.com/MontFerret/worker\"\u003e\n\t\u003c/a\u003e\n\u003c!-- \t\u003ca href=\"https://codecov.io/gh/MontFerret/worker\"\u003e\n\t\t\u003cimg alt=\"Code coverage\" src=\"https://codecov.io/gh/MontFerret/worker/branch/master/graph/badge.svg\" /\u003e\n\t\u003c/a\u003e --\u003e\n\t\u003ca href=\"https://discord.gg/kzet32U\"\u003e\n\t\t\u003cimg alt=\"Discord Chat\" src=\"https://img.shields.io/discord/501533080880676864.svg\"\u003e\n\t\u003c/a\u003e\n\t\u003ca href=\"https://github.com/MontFerret/worker/releases\"\u003e\n\t\t\u003cimg alt=\"Lab release\" src=\"https://img.shields.io/github/release/MontFerret/worker.svg\"\u003e\n\t\u003c/a\u003e\n\t\u003ca href=\"https://opensource.org/licenses/Apache-2.0\"\u003e\n\t\t\u003cimg alt=\"Apache-2.0 License\" src=\"http://img.shields.io/badge/license-Apache-brightgreen.svg\"\u003e\n\t\u003c/a\u003e\n\u003c/p\u003e\n\n**Worker** is a simple HTTP server that accepts [FQL (Ferret Query Language)](https://github.com/MontFerret/ferret) queries, executes them and returns their results.\n\n## What is Ferret?\n\n[Ferret](https://github.com/MontFerret/ferret) is a declarative web scraping query language that allows you to extract data from web pages using a SQL-like syntax. 
Worker provides a REST API to execute FQL queries remotely, making it easy to integrate web scraping capabilities into your applications.\n\n**Common use cases:**\n- Web scraping and data extraction from websites\n- Automated testing of web applications\n- Monitoring web pages for changes\n- Generating PDFs or screenshots from web pages\n- Collecting data for analytics and research\n\nThe OpenAPI v2 schema can be found [here](https://raw.githubusercontent.com/MontFerret/cli/master/reference/ferret-worker.yaml).\n\n## Quick start\n\n### Prerequisites\n\n- Docker (recommended) or Go 1.23+ for local installation\n- For local installation without Docker: Google Chrome or Chromium browser\n\n### Running with Docker\n\nThe Worker ships with a dedicated Docker image that contains headless Google Chrome, so you can run queries using the `cdp` driver out of the box:\n\n**DockerHub:**\n```sh\ndocker run -d -p 8080:8080 montferret/worker\n```\n\n**GitHub Container Registry:**\n```sh\ndocker run -d -p 8080:8080 ghcr.io/montferret/worker\n```\n\n### Local Installation\n\nAlternatively, if you want to use your own version of Chrome, you can run the Worker locally.\n\n**Install from script:**\n```shell\ncurl https://raw.githubusercontent.com/MontFerret/worker/master/install.sh | sh\nworker\n```\n\n**Build from source:**\n```sh\ngit clone https://github.com/MontFerret/worker.git\ncd worker\nmake\n```\n\n### Your First Query\n\nOnce the Worker is running, you can send FQL queries via POST requests to `http://localhost:8080/`:\n\n**Simple data extraction:**\n```bash\ncurl -X POST http://localhost:8080/ \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"LET doc = DOCUMENT(\\\"https://example.com\\\") RETURN doc.title\"\n  }'\n```\n\n**Web scraping with browser automation:**\n```bash\ncurl -X POST http://localhost:8080/ \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"LET page = DOCUMENT(\\\"https://github.com\\\", { driver: \\\"cdp\\\" }) 
WAIT_ELEMENT(page, \\\"h1\\\") RETURN INNER_TEXT(page, \\\"h1\\\")\"\n  }'\n```\n\n**Query with parameters:**\n```bash\ncurl -X POST http://localhost:8080/ \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"LET doc = DOCUMENT(@url) RETURN doc.title\",\n    \"params\": {\n      \"url\": \"https://example.com\"\n    }\n  }'\n```\n\n### Visual Example\n\n![worker](https://raw.githubusercontent.com/MontFerret/worker/master/assets/postman.png)\n\n## System Resource Requirements\n- 2 CPU cores\n- 2 GB of RAM\n\n## API Reference\n\n### Endpoints\n\n#### POST /\nExecutes a given FQL query. The payload must have the following shape:\n\n```json\n{\n  \"text\": \"LET doc = DOCUMENT('https://example.com') RETURN doc.title\",\n  \"params\": {\n    \"optional_param\": \"value\"\n  }\n}\n```\n\n**Request body:**\n- `text` (string, required): The FQL query to execute\n- `params` (object, optional): Parameters to pass to the query (accessible via `@param_name`)\n\n**Response:**\n```json\n{\n  \"data\": \"Example Domain\",\n  \"stats\": {\n    \"execution_time\": \"1.234s\"\n  }\n}\n```\n\n**Error response:**\n```json\n{\n  \"error\": \"run program: Found 2 errors\",\n  \"details\": \"TypeError: runtime error\\n --\u003e anonymous:1:13\\n1 | RETURN { a: @url, b: @limit }\\n  |             ^^^^ parameter is required\\n\"\n}\n```\n\n**Example with complex data extraction:**\n```bash\ncurl -X POST http://localhost:8080/ \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"LET page = DOCUMENT(@url, { driver: \\\"cdp\\\" }) LET links = ELEMENTS(page, \\\"a\\\") RETURN links[* LIMIT 5].href\",\n    \"params\": {\n      \"url\": \"https://news.ycombinator.com\"\n    }\n  }'\n```\n\n#### GET /info\nReturns worker information including Chrome, Ferret and worker versions:\n\n```json\n{\n  \"ip\": \"127.0.0.1\",\n  \"version\": {\n    \"worker\": \"1.18.0\",\n    \"chrome\": {\n      \"browser\": \"125.0.6422.141\",\n      \"protocol\": 
\"1.3\",\n      \"v8\": \"12.5.227.39\",\n      \"webkit\": \"537.36\"\n    },\n    \"ferret\": \"0.18.1\"\n  }\n}\n```\n\n#### GET /health\nHealth check endpoint that returns HTTP 200 when the service is healthy and all dependencies (like Chrome) are accessible. Returns HTTP 424 when dependencies are unavailable.\n\n**Healthy response:**\n```\nHTTP/1.1 200 OK\n```\n\n**Unhealthy response:**\n```\nHTTP/1.1 424 Failed Dependency\n```\n\n## Configuration\n\n### Command Line Options\n\n```bash\n  -log-level=\"debug\"\n    log level (trace, debug, info, warn, error, fatal, panic)\n  -port=8080\n    port to listen\n  -body-limit=1000\n    maximum size of request body in kb. 0 means no limit.\n  -fs-root=\"\"\n    file system root directory for FQL IO::FS functions. Defaults to the current working directory.\n  -request-limit=0\n    amount of requests per second for each IP. 0 means no limit.\n  -request-limit-time-window=180\n    amount of seconds for request rate limit time window.\n  -cache-size=100\n    amount of cached queries. 
0 means no caching.\n  -chrome-ip=\"127.0.0.1\"\n    Google Chrome remote IP address\n  -chrome-port=9222\n    Google Chrome remote debugging port\n  -no-chrome=false\n    disable Chrome driver\n  -version=false\n    show version\n  -help=false\n    show this list\n```\n\n### Configuration Examples\n\n**Production deployment with rate limiting:**\n```bash\nworker \\\n  -port=8080 \\\n  -log-level=info \\\n  -request-limit=10 \\\n  -request-limit-time-window=60 \\\n  -body-limit=2000 \\\n  -fs-root=/var/lib/ferret-worker \\\n  -cache-size=500\n```\n\n**Development with debugging:**\n```bash\nworker \\\n  -port=3000 \\\n  -log-level=debug \\\n  -cache-size=0\n```\n\n**Using external Chrome instance:**\n```bash\n# Start Chrome with remote debugging\ngoogle-chrome --headless --remote-debugging-port=9222 \u0026\n\n# Start worker pointing to external Chrome\nworker -chrome-ip=localhost -chrome-port=9222\n```\n\n**Without Chrome (HTTP driver only):**\n```bash\nworker -no-chrome=true\n```\n\n### Docker Configuration\n\n**Custom port and configuration:**\n```bash\ndocker run -d \\\n  -p 3000:3000 \\\n  -e PORT=3000 \\\n  montferret/worker \\\n  worker -port=3000 -log-level=info\n```\n\n**With volume for persistent cache:**\n```bash\ndocker run -d \\\n  -p 8080:8080 \\\n  -v /host/cache:/app/cache \\\n  montferret/worker\n```\n\n## Security Considerations\n\n⚠️ **Important for Production Deployments:**\n\n- **Rate Limiting**: Always enable rate limiting in production (`-request-limit`)\n- **Body Size Limits**: Set appropriate body size limits (`-body-limit`) to prevent abuse\n- **Network Security**: Worker should not be exposed directly to the internet without proper authentication\n- **Query Validation**: Consider implementing query validation/filtering for untrusted input\n- **Filesystem Access**: Worker enables FQL filesystem functions rooted at the current working directory by default. 
Set `-fs-root` to a dedicated directory in production.\n- **Resource Monitoring**: Monitor CPU and memory usage as complex queries can be resource-intensive\n- **Chrome Security**: The bundled Chrome runs in sandboxed mode, but avoid running as root in production\n\n**Recommended production configuration:**\n```bash\nworker \\\n  -port=8080 \\\n  -log-level=warn \\\n  -request-limit=5 \\\n  -request-limit-time-window=60 \\\n  -body-limit=1000 \\\n  -fs-root=/var/lib/ferret-worker \\\n  -cache-size=200\n```\n\n## Troubleshooting\n\n### Common Issues\n\n**Chrome connection failed:**\n```\nError: failed to connect to Chrome\n```\n- Ensure Chrome is running with `--remote-debugging-port=9222`\n- Check if Chrome is accessible at the configured IP/port\n- For Docker: make sure Chrome service is healthy\n\n**Query timeout:**\n```\nError: query execution timeout\n```\n- Complex pages may take longer to load\n- Consider adding explicit waits in your FQL query\n- Check network connectivity to target websites\n\n**Memory issues:**\n```\nError: out of memory\n```\n- Reduce cache size (`-cache-size`)\n- Limit concurrent requests (`-request-limit`)\n- Monitor Chrome memory usage\n\n**Permission denied:**\n```\nError: permission denied accessing Chrome\n```\n- Ensure proper user permissions for Chrome binary\n- In Docker, avoid running as root when possible\n\n### Debug Mode\n\nEnable debug logging to troubleshoot issues:\n```bash\nworker -log-level=debug\n```\n\n### Health Check\n\nMonitor worker health:\n```bash\ncurl http://localhost:8080/health\ncurl http://localhost:8080/info\n```\n\n## FQL Query Examples\n\n### Basic Web Scraping\n```javascript\n// Extract page title\nLET doc = DOCUMENT(\"https://example.com\")\nRETURN doc.title\n\n// Get all links\nLET doc = DOCUMENT(\"https://example.com\")\nLET links = ELEMENTS(doc, \"a\")\nRETURN links[*].href\n\n// Extract structured data\nLET doc = DOCUMENT(\"https://news.ycombinator.com\")\nLET stories = ELEMENTS(doc, \".titleline 
\u003e a\")\nRETURN stories[* LIMIT 10].{\n  title: INNER_TEXT(@),\n  url: @.href\n}\n```\n\n### Browser Automation with CDP\n```javascript\n// Navigate and interact with page\nLET page = DOCUMENT(\"https://github.com\", { driver: \"cdp\" })\nWAIT_ELEMENT(page, \"input[name='q']\")\nINPUT(page, \"input[name='q']\", \"ferret\")\nCLICK(page, \"button[type='submit']\")\nWAIT_ELEMENT(page, \".repo-list-item\")\nRETURN ELEMENTS(page, \".repo-list-item h3 a\")[*].{\n  name: INNER_TEXT(@),\n  url: @.href\n}\n\n// Render the page as a PDF\nLET page = DOCUMENT(\"https://example.com\", { driver: \"cdp\" })\nRETURN PDF(page)\n```\n\n### Using Parameters\n```javascript\n// Query with parameters (pass via \"params\" in POST body)\nLET page = DOCUMENT(@url, { driver: \"cdp\" })\nLET selector = @css_selector\nRETURN ELEMENTS(page, selector)[*].{\n  text: INNER_TEXT(@),\n  href: @.href\n}\n```\n\n## Development\n\n### Building from Source\n\n```bash\n# Clone repository\ngit clone https://github.com/MontFerret/worker.git\ncd worker\n\n# Install dependencies\nmake install\n\n# Build\nmake build\n\n# Run tests\nmake test\n\n# Start development server\nmake start\n```\n\n### Contributing\n\n1. Fork the repository\n2. Create a feature branch: `git checkout -b my-feature`\n3. Make your changes\n4. Run tests: `make test`\n5. Run linter: `make lint`\n6. Commit changes: `git commit -am 'Add some feature'`\n7. Push to the branch: `git push origin my-feature`\n8. 
Submit a pull request\n\n### Project Structure\n\n```\n├── cmd/                    # Command-line interface\n├── internal/               # Internal application code\n│   ├── controllers/        # HTTP request handlers\n│   ├── server/            # HTTP server configuration\n│   └── storage/           # Caching layer\n├── pkg/                   # Public packages\n│   ├── caching/           # Cache implementation\n│   └── worker/            # Core worker logic\n├── reference/             # OpenAPI specification\n└── assets/               # Documentation assets\n```\n\n## Links\n\n- [Ferret Query Language Documentation](https://github.com/MontFerret/ferret)\n- [OpenAPI Specification](https://raw.githubusercontent.com/MontFerret/cli/master/reference/ferret-worker.yaml)\n- [Docker Hub](https://hub.docker.com/r/montferret/worker)\n- [GitHub Container Registry](https://github.com/MontFerret/worker/pkgs/container/worker)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmontferret%2Fworker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmontferret%2Fworker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmontferret%2Fworker/lists"}