{"id":20582318,"url":"https://github.com/ghostwords/chameleon-crawler","last_synced_at":"2025-04-14T20:15:47.467Z","repository":{"id":24122889,"uuid":"27511390","full_name":"ghostwords/chameleon-crawler","owner":"ghostwords","description":"Browser automation for Chameleon.","archived":false,"fork":false,"pushed_at":"2016-09-27T20:34:15.000Z","size":82,"stargazers_count":19,"open_issues_count":8,"forks_count":7,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-14T20:15:41.718Z","etag":null,"topics":["chromedriver","selenium"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ghostwords.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-12-03T22:41:22.000Z","updated_at":"2024-02-07T18:19:51.000Z","dependencies_parsed_at":"2022-07-10T10:31:34.883Z","dependency_job_id":null,"html_url":"https://github.com/ghostwords/chameleon-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ghostwords%2Fchameleon-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ghostwords%2Fchameleon-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ghostwords%2Fchameleon-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ghostwords%2Fchameleon-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ghostwords","download_url":"https://codeload.github.com/ghostwords/chameleon-crawler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248952355,"owners_count":21188426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chromedriver","selenium"],"created_at":"2024-11-16T06:34:31.093Z","updated_at":"2025-04-14T20:15:47.449Z","avatar_url":"https://github.com/ghostwords.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Chameleon Crawler\n\nBrowser automation for [Chameleon](https://github.com/ghostwords/chameleon).\n\n\n## Setup\n\n- Install Chromium, chromedriver, python3 and xvfb. On Ubuntu:\n```\nsudo apt-get install chromium-browser chromium-chromedriver python3 xvfb\n```\n\n- Install the project's Python dependencies (documented in [requirements.txt](requirements.txt)). You might do this with `virtualenv` and `pip`, or maybe Docker. Note this is a Python 3 project.\n\n- Make sure `chromedriver` is in your $PATH. It's not on Ubuntu, so we have to fix that:\n```\nsudo ln -s /usr/lib/chromium-browser/chromedriver /usr/local/bin/chromedriver\n```\n\n- If using Ubuntu 14.04, [fix chromedriver's shared libraries error](http://stackoverflow.com/questions/25695299/chromedriver-on-ubuntu-14-04-error-while-loading-shared-libraries-libui-base):\n```\necho \"/usr/lib/chromium-browser/libs\" | sudo tee --append /etc/ld.so.conf.d/chrome_lib.conf \u003e/dev/null\nsudo ldconfig\n```\n\n- Finally, generate a Chameleon CRX package [by following development setup steps 1 and 4 in Chameleon's checkout](https://github.com/ghostwords/chameleon#development-setup).\n\n\n## Usage\n\nRun `./crawl.py /path/to/chameleon.crx` to perform a crawl, or `./crawl.py -h` to see the optional arguments:\n\n```\nusage: crawl.py [-h] [--headless | --no-headless] [-n {1,2,3,4,5,6,7,8}] [-q]\n                [-t SECONDS] [--urls URL_FILE_PATH]\n                CHAMELEON_CRX_FILE_PATH\n\npositional arguments:\n  CHAMELEON_CRX_FILE_PATH\n                        path to Chameleon CRX package\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --headless            use a virtual display (default)\n  --no-headless\n  -n {1,2,3,4,5,6,7,8}  how many browsers to use in parallel (default: 4)\n  -q, --quiet           turn off standard output\n  -t SECONDS, --timeout SECONDS\n                        how many seconds to wait for pages to finish loading\n                        before timing out (default: 20)\n  --urls URL_FILE_PATH  path to URL list file (default: urls.txt)\n```\n\nRun `./view.py` and visit the displayed URL to review crawl results.\n\n\n## Roadmap\n\n1. Crawl Alexa Global Top 1,000,000 Sites: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip\n2. Analyze results:\n\t- Discover fingerprinters\n\t- Confirm detection of known fingerprinters\n3. Tweak the heuristic to minimize false negatives/positives.\n4. Create minisite to chart (the growth of?) fingerprinting across the Web.\n\n\n## Code license\n\nMozilla Public License Version 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fghostwords%2Fchameleon-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fghostwords%2Fchameleon-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fghostwords%2Fchameleon-crawler/lists"}