{"id":19124076,"url":"https://github.com/efforg/badger-sett","last_synced_at":"2025-04-05T11:11:07.431Z","repository":{"id":37587544,"uuid":"129810896","full_name":"EFForg/badger-sett","owner":"EFForg","description":"Automated training for Privacy Badger. Badger Sett automates browsers to visit websites to produce fresh Privacy Badger tracker data.","archived":false,"fork":false,"pushed_at":"2024-10-29T11:28:43.000Z","size":192046,"stargazers_count":119,"open_issues_count":3,"forks_count":14,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-10-29T13:23:05.590Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.eff.org/badger-pretraining","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EFForg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-16T21:58:02.000Z","updated_at":"2024-10-29T11:28:47.000Z","dependencies_parsed_at":"2024-03-01T17:43:45.128Z","dependency_job_id":"d23d964e-6368-419a-a61e-502914bc46d4","html_url":"https://github.com/EFForg/badger-sett","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EFForg%2Fbadger-sett","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EFForg%2Fbadger-sett/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EFForg%2Fbadger-sett/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EFForg%2Fbadger-sett/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EFForg","download_url":"https://codeload.github.com/EFForg/badger-sett/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247325693,"owners_count":20920714,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T05:28:04.255Z","updated_at":"2025-04-05T11:11:07.400Z","avatar_url":"https://github.com/EFForg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Badger Sett\n\n\u003e A *sett* or set is a badger's den which usually consists of a network of tunnels\n  and numerous entrances. Setts incorporate larger chambers used for sleeping or\n  rearing young.\n\nThis script is designed to raise young [Privacy Badgers](https://github.com/EFForg/privacybadger) by teaching them\nabout the trackers on popular sites. Every day, [`crawler.py`](./crawler.py) visits thousands of the top sites from the [Tranco List](https://tranco-list.eu) with the latest version of Privacy Badger, and saves its findings in `results.json`.\n\nSee the following EFF.org blog post for more information: [Giving Privacy Badger a Jump Start](https://www.eff.org/deeplinks/2018/08/giving-privacy-badger-jump-start).\n\n\n## Development setup\n\n1. Install Python 3.8+\n\n2. 
Create and activate a Python virtual environment:\n\n    ```bash\n    python3 -m venv venv\n    source ./venv/bin/activate\n    pip install -U pip\n    ```\n\n    For more, read [this blog post](https://snarky.ca/a-quick-and-dirty-guide-on-how-to-install-packages-for-python/).\n\n3. Install Python dependencies with `pip install -r requirements.txt`\n\n4. Run static analysis with `prospector`\n\n5. Run unit tests with `pytest`\n\n6. Take a look at Badger Sett command-line flags with `./crawler.py --help`\n\n7. Git clone the [Privacy Badger repository](https://github.com/EFForg/privacybadger) somewhere\n\n8. Try running a tiny scan:\n\n    ```bash\n    ./crawler.py firefox 5 --no-xvfb --log-stdout --pb-dir /path/to/privacybadger\n    ```\n\n\n## Production setup with Docker\n\nDocker takes care of all dependencies, including setting up the latest browser version.\n\nHowever, Docker brings its own complexity. Problems from improper file ownership and permissions are a particular pain point.\n\n0. Prerequisites: have [Docker](https://docs.docker.com/get-docker/) installed.\n   Make sure your user is part of the `docker` group so that you can build and\n   run Docker images without `sudo`. You can add yourself to the group with:\n\n   ```\n   $ sudo usermod -aG docker $USER\n   ```\n\n1. Clone the repository\n\n   ```\n   $ git clone https://github.com/efforg/badger-sett\n   ```\n\n2. Run a scan\n\n   ```\n   $ BROWSER=firefox ./runscan.sh 500\n   ```\n\n   This will scan the top 500 sites on the Tranco list in Firefox\n   with the latest version of Privacy Badger's master branch.\n\n   To run the script with a different branch of Privacy Badger, set the `PB_BRANCH`\n   variable, e.g.:\n\n   ```\n   $ PB_BRANCH=my-feature-branch BROWSER=firefox ./runscan.sh 500\n   ```\n\n   You can also pass arguments to `crawler.py`, the Python script that does\n   the actual crawl. Any arguments passed to `runscan.sh` will be\n   forwarded to `crawler.py`. 
For example, to exclude all websites ending\n   with .gov and .mil from your website visit list:\n\n   ```\n   $ BROWSER=edge ./runscan.sh 500 --exclude .gov,.mil\n   ```\n\n3. Monitor the scan\n\n   To have the scan print verbose output about which sites it's visiting, use\n   the `--log-stdout` argument.\n\n   If you don't use that argument, all output will still be logged to\n   `docker-out/log.txt`, beginning after the script outputs \"Running scan in\n   Docker...\"\n\n### Automatic scanning\n\nTo set up the script to run periodically and automatically update the\nrepository with its results:\n\n1. Create a new SSH key with `ssh-keygen`. Give it a name unique to the\n   repository.\n\n   ```\n   $ ssh-keygen\n   Generating public/private rsa key pair.\n   Enter file in which to save the key (/home/USER/.ssh/id_rsa): /home/USER/.ssh/id_rsa_badger_sett\n   ```\n\n2. Add the new key as a deploy key with read/write access to the repo on GitHub.\n   https://developer.github.com/v3/guides/managing-deploy-keys/\n\n3. Add an SSH host alias for GitHub that uses the new key pair. Create or open\n   `~/.ssh/config` and add the following:\n\n   ```\n   Host github-badger-sett\n     HostName github.com\n     User git\n     IdentityFile /home/USER/.ssh/id_rsa_badger_sett\n   ```\n\n4. Configure git to connect to the remote over SSH. Edit `.git/config`:\n\n   ```\n   [remote \"origin\"]\n     url = ssh://git@github-badger-sett:/efforg/badger-sett\n   ```\n\n   This will have `git` connect to the remote using the new SSH key by default.\n\n5. Create a cron job to call `runscan.sh` once a day. Set the environment\n   variable `RUN_BY_CRON=1` to turn off TTY forwarding to `docker run` (which\n   would break the script in cron), and set `GIT_PUSH=1` to have the script\n   automatically commit and push `results.json` when the scan finishes. 
Here's an\n   example `crontab` entry:\n\n   ```\n   0 0 * * *  RUN_BY_CRON=1 GIT_PUSH=1 BROWSER=chrome /home/USER/badger-sett/runscan.sh 6000 --exclude=.mil,.mil.??,.gov,.gov.??,.edu,.edu.??\n   ```\n\n6. If everything has been set up correctly, the script should push a new version\n   of `results.json` after each scan.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fefforg%2Fbadger-sett","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fefforg%2Fbadger-sett","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fefforg%2Fbadger-sett/lists"}