# OpenWPM [![Documentation Status](https://readthedocs.org/projects/openwpm/badge/?version=latest)](https://openwpm.readthedocs.io/en/latest/?badge=latest) [![Build Status](https://github.com/openwpm/OpenWPM/workflows/Tests%20and%20linting/badge.svg?branch=master)](https://github.com/openwpm/OpenWPM/actions?query=branch%3Amaster) [![OpenWPM Matrix Channel](https://img.shields.io/matrix/OpenWPM:mozilla.org?label=Join%20us%20on%20matrix&server_fqdn=mozilla.modular.im)](https://matrix.to/#/#OpenWPM:mozilla.org?via=mozilla.org) <!-- omit in toc -->

OpenWPM is a web privacy measurement framework which makes it easy to
collect data for privacy studies on a scale of thousands to millions
of websites.
OpenWPM is built on top of Firefox, with automation provided
by Selenium. It includes several hooks for data collection. Check out
the [instrumentation section](#instrumentation-and-configuration) below for more details.

## Table of Contents <!-- omit in toc -->

- [Installation](#installation)
  - [Pre-requisites](#pre-requisites)
  - [Install](#install)
  - [Mac OSX](#mac-osx)
- [Quick Start](#quick-start)
- [Troubleshooting](#troubleshooting)
- [Documentation](#documentation)
- [Advice for Measurement Researchers](#advice-for-measurement-researchers)
- [Developer instructions](#developer-instructions)
- [Instrumentation and Configuration](#instrumentation-and-configuration)
- [Storage](#storage)
  - [Local Storage](#local-storage)
  - [Remote storage](#remote-storage)
- [Docker Deployment for OpenWPM](#docker-deployment-for-openwpm)
  - [Building the Docker Container](#building-the-docker-container)
  - [Running Measurements from inside the Container](#running-measurements-from-inside-the-container)
  - [MacOS GUI applications in Docker](#macos-gui-applications-in-docker)
- [Citation](#citation)
- [License](#license)

## Installation

OpenWPM is tested on Ubuntu 18.04 via GitHub Actions and is commonly used via the Docker container
that this repo builds, which is also based on Ubuntu.
Although we don't officially support
other platforms, conda is a cross-platform utility and the install script can be expected
to work on OSX and other Linux distributions.

OpenWPM does not support Windows: <https://github.com/openwpm/OpenWPM/issues/503>

### Pre-requisites

The main pre-requisite for OpenWPM is conda, a fast cross-platform package management tool.

Conda is open-source and can be installed from <https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html>.

### Install

An installation script, `install.sh`, is included to install the conda environment,
install unbranded Firefox, and build the instrumentation extension.

All installation is confined to your conda environment and should not affect your machine.
The installation script will, however, override any existing conda environment named `openwpm`.

To run the install script:

```bash
./install.sh
```

After running the install script, activate your conda environment by running:

```bash
conda activate openwpm
```

### Mac OSX

You may need to install `make` / `gcc` in order to build the extension.
The necessary packages are part of Xcode: `xcode-select --install`

We do not run CI tests for Mac, so new issues may arise. We welcome PRs to fix
these issues and add full CI testing for Mac.

Running Firefox with xvfb on OSX requires installing an X11 server; we suggest
[XQuartz](https://www.xquartz.org/). This setup is untested, and we welcome
feedback on whether it works.

## Quick Start

Once installed, it is very easy to run a quick test of OpenWPM. Check out
`demo.py` for an example.
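
After the demo finishes, a quick sanity check is to count the rows in each table of the SQLite database it writes. The sketch below is not part of OpenWPM: `summarize_crawl_db` is a hypothetical helper built only on Python's standard `sqlite3` module, and it assumes the demo stores its structured data at `datadir/crawl-data.sqlite`, which is where recent versions of `demo.py` point the `SQLiteStorageProvider`. Check the path in your copy of the script before relying on it.

```python
import sqlite3
from pathlib import Path


def summarize_crawl_db(db_path: Path) -> dict[str, int]:
    """Map each table in a SQLite database to its row count.

    Hypothetical helper for inspecting OpenWPM's SQLite output; table
    names are read from the database itself rather than assumed.
    """
    # sqlite3.connect() silently creates an empty database for a missing
    # path, so check first to catch a wrong location early.
    if not db_path.is_file():
        raise FileNotFoundError(f"no crawl database at {db_path}")

    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in tables:
            # Table names cannot be bound as SQL parameters, so quote them as
            # identifiers instead (doubling any embedded double quotes).
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

From the OpenWPM root, `summarize_crawl_db(Path("datadir/crawl-data.sqlite"))` returns a dictionary keyed by table name. Reading the table list from `sqlite_master` rather than hard-coding table names keeps the helper usable even if the schema differs between OpenWPM versions; an empty table for an instrument you enabled may be worth investigating.
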

`demo.py` uses the default settings specified in
`openwpm/config.py::ManagerParams` and
`openwpm/config.py::BrowserParams`, except where the script overrides them.

The demo script also shows how to use the
[Tranco](https://tranco-list.eu/) top sites list via the optional command-line
flag `demo.py --tranco`. Because this is a real top sites list, it includes
NSFW websites, some of them highly ranked.

More information on the instrumentation and configuration parameters is given
[below](#instrumentation-and-configuration).

The docs provide a more [in-depth tutorial](docs/Using_OpenWPM.md)
and a description of the available [methods of data collection](docs/Configuration.md#Instruments).

## Troubleshooting

1. `WebDriverException: Message: The browser appears to have exited before we could connect...`

    This error indicates that Firefox exited during startup (or was prevented from
    starting). There are many possible causes of this error:

    - If you are seeing this error for all browser spawn attempts, check that:
      - Both Selenium and Firefox are the appropriate versions. Run the following
        commands and check that the versions they output match the required versions in
        `install.sh` and `environment.yaml`. If not, re-run the install script.

        ```sh
        cd firefox-bin/
        ./firefox --version
        ```

        and

        ```sh
        conda list selenium
        ```

      - If you are running in a headless environment (e.g. a remote server), ensure
        that all browsers have the `headless` browser parameter set to `True` before
        launching.
    - If you are seeing this error randomly during crawls, it can be caused by
      an overtaxed system, either memory or CPU usage. Try lowering the number of
      concurrent browsers.

2. In older versions of Firefox (pre-74) the setting to enable extensions was called
    `extensions.legacy.enabled`.
    If you need to work with an earlier Firefox, replace the setting name
    `extensions.experiments.enabled` with `extensions.legacy.enabled` in
    `openwpm/deploy_browsers/configure_firefox.py`.

3. Make sure your conda environment is activated (`conda activate openwpm`). You can see
    your environments, and which one is active, by running `conda env list`; the active
    environment is marked with a `*`.

4. `make` / `gcc` may need to be installed in order to build the web extension.
    On Ubuntu, this is achieved with `apt-get install make`. On OSX the necessary
    packages are part of Xcode: `xcode-select --install`.

5. On a very sparse operating system, additional dependencies may need to be
    installed. See the [Dockerfile](Dockerfile) for more inspiration, or open
    an issue if you are still having problems.

6. If you see errors related to incompatible or non-existent Python packages,
    try re-running the script with the environment variable
    `PYTHONNOUSERSITE` set, e.g. `PYTHONNOUSERSITE=True python demo.py`.
    If that fixes your issues, you are experiencing
    [issue 689](https://github.com/openwpm/OpenWPM/issues/689), which can be
    fixed by clearing your
    Python [user site packages directory](https://www.python.org/dev/peps/pep-0370/),
    by prepending `PYTHONNOUSERSITE=True` to a specific command, or by setting
    the environment variable for the session (e.g., `export PYTHONNOUSERSITE=True`
    in bash). Please also add a comment to that issue to let us know you ran
    into this problem.

## Documentation

Further information is available at [OpenWPM's Documentation Page](https://openwpm.readthedocs.io).

## Advice for Measurement Researchers

OpenWPM is [often used](https://openwpm.readthedocs.io/Papers.html) for web
measurement research.
We recommend the following for researchers using the tool:

**Use a versioned [release](https://github.com/openwpm/OpenWPM/releases).** We
aim to follow Firefox's release cadence, which is roughly once every four
weeks. If we happen to fall behind on checking in new releases, please file an
issue. Versions more than a few months out of date will use unsupported
versions of Firefox, which are likely to have known security
vulnerabilities. Versions earlier than v0.10.0 are from a previous architecture
and should not be used.

**Include the OpenWPM version number in your publication.** As of v0.10.0,
OpenWPM pins all Python, npm, and system dependencies. Including this
information alongside your work will allow other researchers to contextualize
the results, and can be helpful if future versions of OpenWPM have
instrumentation bugs that impact results.

## Developer instructions

If you want to contribute to OpenWPM, have a look at our [CONTRIBUTING.md](./CONTRIBUTING.md).

## Instrumentation and Configuration

OpenWPM provides a breadth of configuration options, which are documented
in [Configuration.md](docs/Configuration.md).
More detail on the output is available [below](#storage).

## Storage

OpenWPM distinguishes between two types of data, structured and unstructured.
Structured data is all data captured by the instrumentation or emitted by the platform.
Generally speaking, all data you download is unstructured data.

For each of the data classes we offer a variety of storage providers, and you are encouraged
to implement your own should the provided backends not be enough for you.

We have an outstanding issue to enable saving content generated by commands, such as
screenshots and page dumps, to unstructured storage (see [#232](https://github.com/openwpm/OpenWPM/issues/232)).
For now, this content is saved to `manager_params.data_directory`.

### Local Storage

For storing structured data locally we offer two StorageProviders:

- The SQLiteStorageProvider, which writes all data into a SQLite database
  - This is the recommended approach for getting started, as the data is easily explorable
- The LocalArrowProvider, which stores the data in Parquet files
  - This method integrates well with NumPy/Pandas
  - It might be harder to process ad hoc

For storing unstructured data locally we also offer two solutions:

- The LevelDBProvider, which stores all data in a LevelDB
  - This is the recommended approach
- The LocalGzipProvider, which gzips and stores the files individually on disk
  - Please note that many file systems handle thousands of files in one folder poorly
  - Use with care or for single site visits

### Remote storage

When running in the cloud, saving records to local disk is not practical,
so we offer remote StorageProviders for S3 (see [#823](https://github.com/openwpm/OpenWPM/issues/823)) and GCP.
Currently, all remote StorageProviders write to the respective object storage service (S3/GCS).
The structured providers use the Parquet format.

**NOTE:** The Parquet and SQL schemas should be kept in sync, except for
output-specific columns (e.g., `instance_id` in the Parquet output). You can compare
the two schemas by running
`diff -y openwpm/DataAggregator/schema.sql openwpm/DataAggregator/parquet_schema.py`.

## Docker Deployment for OpenWPM

OpenWPM can be run in a Docker container. This is similar to running OpenWPM in
a virtual machine, only with less overhead.

### Building the Docker Container

**Step 1:** install Docker on your system. Most Linux distributions have Docker
in their repositories. It can also be installed from
[docker.com](https://www.docker.com/).
For Ubuntu you can use:
`sudo apt-get install docker.io`

You can test the installation with: `sudo docker run hello-world`

_Note:_ to run Docker without root privileges, add your user to the
`docker` group (`sudo usermod -a -G docker $USER`). You will have to
log out and back in for the change to take effect, and possibly also restart the
Docker service.

**Step 2:** to build the image, run the following command from a terminal
within the root OpenWPM directory:

```bash
docker build -f Dockerfile -t openwpm .
```

After a few minutes, the image is ready to use.

### Running Measurements from inside the Container

You can run the demo measurement from inside the container, as follows:

First, give the container permission to access your local
X server by running: `xhost +local:docker`

Then you can run the demo script using:

```bash
mkdir -p docker-volume && docker run -v $PWD/docker-volume:/opt/OpenWPM/datadir \
    -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --shm-size=2g \
    -it --init openwpm
```

**Note:** the `--shm-size=2g` parameter is required, as it increases the
amount of shared memory available to Firefox. Without this parameter you can
expect Firefox to crash on 20-30% of sites.

This command uses _bind-mounts_ to share scripts and output between the
container and host, as explained below (note that the paths in the command assume
it's being run from the root OpenWPM directory):

- `run` starts the `openwpm` container and executes the
    `python /opt/OpenWPM/demo.py` command.

- `-v` binds a directory on the host (`$PWD/docker-volume`) to a
    directory in the container (`/opt/OpenWPM/datadir`). Binding allows the script's
    output to be saved on the host (`./docker-volume`), and also allows
    you to pass inputs to the Docker container (if necessary).
    We first create
    the `docker-volume` directory (if it doesn't exist), as Docker would
    otherwise create it with root permissions.

- The `-it` options run the container interactively with a terminal attached (use
    `-d` for detached mode).

- The demo script runs instances of Firefox that are not headless. As such,
    this command requires a connection to the host display server. If you are
    running headless crawls, you can remove the following options:
    `-e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix`.

Alternatively, it is possible to run jobs as the user _openwpm_ in the container,
but this might cause problems with non-headless browsers. It is therefore
only recommended for headless crawls.

### MacOS GUI applications in Docker

**Requirements**: Install XQuartz by following [these instructions](https://stackoverflow.com/a/47309184).

Given properly installed prerequisites (including a reboot), the helper script
`run-on-osx-via-docker.sh` in the project root folder can be used to facilitate
working with Docker on Mac OSX.

To open a bash session within the environment:

```bash
./run-on-osx-via-docker.sh /bin/bash
```

Or, run commands directly:

```bash
./run-on-osx-via-docker.sh python demo.py
./run-on-osx-via-docker.sh python -m test.manual_test
./run-on-osx-via-docker.sh python -m pytest
./run-on-osx-via-docker.sh python -m pytest -vv -s
```

## Citation

If you use OpenWPM in your research, please cite our CCS 2016 [publication](http://randomwalker.info/publications/OpenWPM_1_million_site_tracking_measurement.pdf)
on the infrastructure.
You can use the following BibTeX:

```bibtex
@inproceedings{englehardt2016census,
    author    = "Steven Englehardt and Arvind Narayanan",
    title     = "{Online tracking: A 1-million-site measurement and analysis}",
    booktitle = {Proceedings of ACM CCS 2016},
    year      = "2016",
}
```

OpenWPM has been used in over [75 studies](https://github.com/openwpm/studies/blob/main/studies.md).

## License

OpenWPM is licensed under GNU GPLv3. Additional code has been included from
[FourthParty](https://github.com/fourthparty/fourthparty) and
[Privacy Badger](https://github.com/EFForg/privacybadgerfirefox), both of which
are licensed GPLv3+.