{"id":15151844,"url":"https://github.com/robertobochet/scraper-bot","last_synced_at":"2026-02-24T03:00:51.605Z","repository":{"id":45239108,"uuid":"431980706","full_name":"RobertoBochet/scraper-bot","owner":"RobertoBochet","description":"A customizable web scraper","archived":false,"fork":false,"pushed_at":"2025-09-04T15:42:09.000Z","size":364,"stargazers_count":4,"open_issues_count":6,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-29T13:27:10.948Z","etag":null,"topics":["apprise","automation","playwright","playwright-python","python","scraper","telegram","telegram-bot","web-scaper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RobertoBochet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"custom":["https://paypal.me/robertobochet"]}},"created_at":"2021-11-25T21:25:53.000Z","updated_at":"2025-07-07T08:30:41.000Z","dependencies_parsed_at":"2026-02-24T03:00:43.017Z","dependency_job_id":null,"html_url":"https://github.com/RobertoBochet/scraper-bot","commit_stats":{"total_commits":54,"total_committers":4,"mean_commits":13.5,"dds":"0.38888888888888884","last_synced_commit":"98533ad39af8950035e9330a162ab5079d80ece2"},"previous_names":["robertobochet/bot-scraper"],"tags_count":21,"template":false,"template_full_name":null,"purl":"pkg:github/RobertoBochet/scraper-bot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/RobertoBochet%2Fscraper-bot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobertoBochet%2Fscraper-bot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobertoBochet%2Fscraper-bot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobertoBochet%2Fscraper-bot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RobertoBochet","download_url":"https://codeload.github.com/RobertoBochet/scraper-bot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobertoBochet%2Fscraper-bot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29770195,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-24T01:40:24.820Z","status":"online","status_checked_at":"2026-02-24T02:00:07.497Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apprise","automation","playwright","playwright-python","python","scraper","telegram","telegram-bot","web-scaper"],"created_at":"2024-09-26T15:22:41.973Z","updated_at":"2026-02-24T03:00:51.572Z","avatar_url":"https://github.com/RobertoBochet.png","language":"Python","funding_links":["https://paypal.me/robertobochet"],"categories":[],"sub_categories":[],"readme":"# Scraper 
Bot\n\n[![GitHub](https://img.shields.io/github/license/RobertoBochet/scraper-bot?style=flat-square)](https://github.com/RobertoBochet/scraper-bot)\n[![GitHub Version](https://img.shields.io/github/v/tag/RobertoBochet/scraper-bot?label=version\u0026style=flat-square)](https://github.com/RobertoBochet/scraper-bot)\n[![PyPI - Version](https://img.shields.io/pypi/v/scraper-bot?style=flat-square)](https://pypi.org/project/scraper-bot/)\n[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/RobertoBochet/scraper-bot/test-code.yml?label=test%20code\u0026style=flat-square)](https://github.com/RobertoBochet/scraper-bot)\n[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/RobertoBochet/scraper-bot/release.yml?label=publish%20release\u0026style=flat-square)](https://github.com/RobertoBochet/scraper-bot/pkgs/container/scraper-bot)\n[![CodeFactor Grade](https://img.shields.io/codefactor/grade/github/RobertoBochet/scraper-bot?style=flat-square)](https://www.codefactor.io/repository/github/robertobochet/scraper-bot)\n\nThis bot periodically scrapes ads from commercial websites.\n\nWhen it finds a new ad, the bot sends it to you through [Apprise](https://github.com/caronc/apprise) channels.\n\n## Deploy\n\n### PyPI\n\nThe package is available on [PyPI](https://pypi.org/project/scraper-bot/)\n\n```shell\npip install scraper-bot\n```\nThe package relies heavily on the [`playwright`](https://playwright.dev/python/) package, so before you start using the bot you have to install a Playwright browser\n```shell\nplaywright install --with-deps firefox\n```\nYou can find further information in the [`playwright` documentation](https://playwright.dev/python/docs/browsers)\n_(n.b. 
the bot is not limited to Firefox)_\n\nThe `scraper-bot` package provides the following command to run the bot\n```shell\nscraper-bot\n```\n\n### Container\n\nThe CI builds a container image for each version and publishes it to the public [GitHub registry](https://ghcr.io/robertobochet/scraper-bot)\n```\nghcr.io/robertobochet/scraper-bot\n```\n\n#### docker compose\n\n1. [Create a Telegram bot](https://core.telegram.org/bots#3-how-do-i-create-a-bot) and retrieve its token\n2. Download `config.example.yaml` and rename it to `config.yaml`\n3. Edit the configuration following the [guidelines](#configuration)\n4. Download `docker-compose.yaml`\n5. Start the scraper with `docker-compose`\n    ```shell\n    docker-compose up\n    ```\n6. Wait for the bot to do its work!\n\n### Kubernetes (Helm chart)\n\nA [Helm chart](https://helm.sh/) is also available for deploying the **Scraper Bot**\n\nYou can find the source code in the repo [`scraper-bot-chart`](https://github.com/RobertoBochet/scraper-bot-chart)\n\nThe Helm chart package is available in the GitHub OCI registry\n```\noci://ghcr.io/robertobochet/scraper-bot-chart\n```\nYou can use it to deploy directly on your Kubernetes cluster\n1. Retrieve the default values file\n   ```shell\n   helm show values oci://ghcr.io/robertobochet/scraper-bot-chart \u003e values.yaml\n   ```\n2. Customize the `values.yaml`\n3. Install the scraper bot\n   ```shell\n   helm install scraper-bot oci://ghcr.io/robertobochet/scraper-bot-chart -f values.yaml\n   ```\n\n## Configuration\n\nBy default the bot looks for a configuration file in the following paths: `./config.y(a)ml` and `/etc/scaraper-bot/config.y(a)ml`. 
You can override this behavior by passing the `--config` argument, followed by the config file path, on the command line\n```shell\nscraper-bot --config /path/to/scraper-bot-config.yaml\n```\n\nThe configuration file has to satisfy the pydantic model, which you can find in `scraper_bot.settings`.\nFurthermore, you can get the config JSON schema from the command line with the `--config-schema` argument\n```shell\nscraper-bot --config-schema\n```\n\nYou can also find a configuration example in `config.example.yaml`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobertobochet%2Fscraper-bot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobertobochet%2Fscraper-bot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobertobochet%2Fscraper-bot/lists"}