{"id":19025265,"url":"https://github.com/evgeniradev/page_scraper","last_synced_at":"2026-06-18T01:31:02.189Z","repository":{"id":44525581,"uuid":"209391314","full_name":"evgeniradev/page_scraper","owner":"evgeniradev","description":"An Elixir-based page scraper app built on Phoenix. It detects changes on a given web page and logs them to a database.","archived":false,"fork":false,"pushed_at":"2023-01-05T22:46:29.000Z","size":286,"stargazers_count":2,"open_issues_count":9,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-21T19:14:16.554Z","etag":null,"topics":["elixir","phoenix","scraper"],"latest_commit_sha":null,"homepage":null,"language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evgeniradev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-18T19:47:33.000Z","updated_at":"2021-11-12T12:09:35.000Z","dependencies_parsed_at":"2023-02-04T21:45:29.866Z","dependency_job_id":null,"html_url":"https://github.com/evgeniradev/page_scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/evgeniradev/page_scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evgeniradev%2Fpage_scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evgeniradev%2Fpage_scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evgeniradev%2Fpage_scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evgeniradev%2Fpage_scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/evgeni
radev","download_url":"https://codeload.github.com/evgeniradev/page_scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evgeniradev%2Fpage_scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34472822,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elixir","phoenix","scraper"],"created_at":"2024-11-08T20:43:03.759Z","updated_at":"2026-06-18T01:31:02.169Z","avatar_url":"https://github.com/evgeniradev.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PageScraper\n\nAn Elixir-based page scraper app built on Phoenix.\nIt detects changes on a given web page and logs them to a database.\nIt uses Selenium and the Hound package to load pages in a Chrome session.\nCurrently, the app uses a single Chrome session, which slows polling when more than one page is polled at the same time.\n\n## Installation\n\nUse [Docker](https://docs.docker.com/) to run the app.\n\nRun the setup command below to build the containers, create a new database, and run the migrations. 
Note that this command drops any existing database.\n```\n$ ./setup.sh\n```\n\nStart the app in development mode:\n```\n$ ./start.sh\n```\n\nFinally, load [http://localhost](http://localhost) in your browser.\n\n## Running the tests\n\n```\n$ ./test.sh\n```\n\n## Details\nCreate a `.env` file in the app's root directory to set the options below.\n\nTo specify a time zone (default: Europe/London), add the following environment variable to the file:\n```\nTZ=your_time_zone\n```\n\nTo specify the limit of logged changes per page (default: 100), add the following environment variable to the file:\n```\nPAGE_CHANGES_LIMIT=100\n```\n\n## To-do list\n\n* Add a health check to the page_scraper_selenium_chrome Docker container\n* Display live workers' status using channels/WebSockets\n* Improve and finish the tests\n* Improve the frontend design\n* Implement a way to get the page status before pulling the page source\n* Implement pagination\n* Implement multiple Chrome sessions\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevgeniradev%2Fpage_scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevgeniradev%2Fpage_scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevgeniradev%2Fpage_scraper/lists"}