{"id":15945490,"url":"https://github.com/dansuh17/facecrawler","last_synced_at":"2026-02-16T10:41:06.322Z","repository":{"id":40987276,"uuid":"102696042","full_name":"dansuh17/facecrawler","owner":"dansuh17","description":"Distributed, continuous web image crawler.","archived":false,"fork":false,"pushed_at":"2022-12-07T23:44:14.000Z","size":183,"stargazers_count":1,"open_issues_count":9,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-10-06T20:49:01.499Z","etag":null,"topics":["image","selenium","webdriver"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dansuh17.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-09-07T05:42:12.000Z","updated_at":"2021-10-12T23:22:15.000Z","dependencies_parsed_at":"2022-09-23T22:12:00.200Z","dependency_job_id":null,"html_url":"https://github.com/dansuh17/facecrawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dansuh17/facecrawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dansuh17%2Ffacecrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dansuh17%2Ffacecrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dansuh17%2Ffacecrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dansuh17%2Ffacecrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dansuh17","download_url":"https://codeload.github.com/dansuh17/facecrawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dansuh17%2Ffacecrawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29506264,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-16T09:05:14.864Z","status":"ssl_error","status_checked_at":"2026-02-16T08:55:59.364Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image","selenium","webdriver"],"created_at":"2024-10-07T09:03:06.699Z","updated_at":"2026-02-16T10:41:06.295Z","avatar_url":"https://github.com/dansuh17.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Crawlfish\n\nDistributed, continuous image crawler.\n\n## Requirements\n\nFirefox web browser (with geckodriver)\n\n## Running\n\nCreating a Python virtual environment is recommended.\n\n`python3 -m venv ./venv`\n\nThen install the required packages.\n\n`pip3 install -r requirements.txt`\n\nFor monitoring to work, the monitor server must be running before any crawling node starts.\nStart the monitor server with this command:\n\n`python3 cherryServer.py`\n\nStart crawling with the following command:\n\n`python3 crawler.py --[option] [option_value]`\n\nAvailable options:\n- `--site [site]`: target site to crawl (instagram, facebook, etc.)\n- `--filter [filter_type]`: type of data filter used to screen the data (face)\n- `--nthread [number_of_threads]`: number of threads used to load the web driver and start crawling\n- `--logpath [folder_name]`: name of the folder in which to save the logs\n\nCrawling status can be monitored with the monitor reader:\n\n`python3 monitor_read.py`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdansuh17%2Ffacecrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdansuh17%2Ffacecrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdansuh17%2Ffacecrawler/lists"}