{"id":18842961,"url":"https://github.com/openwpm/openwpm-crawler","last_synced_at":"2025-04-14T07:32:06.004Z","repository":{"id":45265580,"uuid":"148212821","full_name":"openwpm/openwpm-crawler","owner":"openwpm","description":"A crawler that uses OpenWPM.","archived":false,"fork":false,"pushed_at":"2021-12-26T20:12:58.000Z","size":122,"stargazers_count":12,"open_issues_count":11,"forks_count":8,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-10T13:43:38.141Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openwpm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-10T20:09:30.000Z","updated_at":"2024-07-22T21:10:58.000Z","dependencies_parsed_at":"2022-09-10T02:02:15.030Z","dependency_job_id":null,"html_url":"https://github.com/openwpm/openwpm-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openwpm%2Fopenwpm-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openwpm%2Fopenwpm-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openwpm%2Fopenwpm-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openwpm%2Fopenwpm-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openwpm","download_url":"https://codeload.github.com/openwpm/openwpm-crawler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248839481,"owners_count":21169820,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T02:56:10.793Z","updated_at":"2025-04-14T07:32:05.945Z","avatar_url":"https://github.com/openwpm.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenWPM Crawler\n\nLaunch OpenWPM crawls using Kubernetes [Job](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/) workloads\nor stand up some docker-compose services to run the crawl in a distributed fashion.\n\nA Redis work queue is set up and loaded with the list of URLs to crawl.\n\nContainers running either locally\nor in the cloud execute the OpenWPM crawler.py script which will continuously fetch sites to run\nand exit once there are no additional sites in the queue.\n\n## Preparations\n\nTo install all the required tools (using conda)\n\n```bash\n./install.sh\nconda activate openwpm-crawler\n```\n\n## Run a crawl locally (using Kubernetes)\n\nSee [./deployment/local/README.md](./deployment/local/README.md).\n\n## Run a crawl in Google Cloud Platform\n\nSee [./deployment/gcp/README.md](./deployment/gcp/README.md).\n\n## Run a crawl locally (using docker-compose)\n\nSee [./deployment/local-compose/README.md](./deployment/local-compose/README.md).\nThis is the simplest option, requiring only docker-compose which is shipped with\nDocker on both Mac and Windows, however behaviour might slightly differ from\ncloud crawls.\n\n## Analyze crawl results\n\n```bash\njupyter notebook\n```\n\nAfter launching Jupyter, navigate to `analysis/Sample Analysis.ipynb` and choose `Kernel -\u003e Change Kernel -\u003e openwpm-crawler` in the menu.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenwpm%2Fopenwpm-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenwpm%2Fopenwpm-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenwpm%2Fopenwpm-crawler/lists"}