{"id":28458781,"url":"https://github.com/openmined/pyvertical","last_synced_at":"2025-07-02T09:31:23.068Z","repository":{"id":39600696,"uuid":"269736833","full_name":"OpenMined/PyVertical","owner":"OpenMined","description":"Privacy Preserving Vertical Federated Learning","archived":false,"fork":false,"pushed_at":"2023-06-01T17:06:57.000Z","size":25022,"stargazers_count":218,"open_issues_count":20,"forks_count":52,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-06-07T00:40:04.959Z","etag":null,"topics":["federated-learning","partitioned-data","private-set-intersection","psi","split-neural-network","splitnn","vertical-federated-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenMined.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":"openmined","open_collective":"openmined"}},"created_at":"2020-06-05T18:28:01.000Z","updated_at":"2025-05-31T03:29:12.000Z","dependencies_parsed_at":"2022-09-15T21:42:02.750Z","dependency_job_id":null,"html_url":"https://github.com/OpenMined/PyVertical","commit_stats":{"total_commits":226,"total_committers":6,"mean_commits":"37.666666666666664","dds":0.584070796460177,"last_synced_commit":"90bc44bf5d7d285ceb727bf0c071a71220cf00c0"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/OpenMined/PyVertical","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPyVertical","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPyVertical/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPyVertical/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPyVertical/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenMined","download_url":"https://codeload.github.com/OpenMined/PyVertical/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPyVertical/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263111383,"owners_count":23415442,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["federated-learning","partitioned-data","private-set-intersection","psi","split-neural-network","splitnn","vertical-federated-learning"],"created_at":"2025-06-07T00:39:47.198Z","updated_at":"2025-07-02T09:31:23.053Z","avatar_url":"https://github.com/OpenMined.png","language":"Python","readme":"![om-logo](https://github.com/OpenMined/design-assets/blob/master/logos/OM/horizontal-primary-trans.png)\n\n![Tests](https://github.com/OpenMined/PyVertical/workflows/Tests/badge.svg?branch=master)\n![License](https://img.shields.io/github/license/OpenMined/PyVertical)\n![OpenCollective](https://img.shields.io/opencollective/all/openmined)\n\n# PyVertical\n\nA project developing privacy-preserving,\nvertical federated learning,\nusing [`syft`](syft).\n\n- :link: Private entity resolution\n         using Private Set Intersection (PSI)\n- :lock: Trains a model on vertically partitioned data\n        using SplitNNs,\n        so only data holders can access 
data\n\nVertically-partitioned data is data\nin which\nfields relating to a single record\nare distributed across multiple datasets.\nFor example,\nmultiple hospitals may have admissions data on the same patients,\nor retailers have transaction data on the same shoppers.\nVertically-partitioned data could be applied to solve vital problems,\nbut data holders can't combine their datasets\nby simply comparing notes with other data holders\nunless they want to break user privacy.\n`PyVertical` uses [PSI]\nto link datasets in a privacy-preserving way.\nWe train SplitNNs on the partitioned data\nto ensure the data remains separate throughout the entire process.\n\nSee the [changelog](./CHANGELOG.md)\nfor information\non the current status of `PyVertical`.\n\n**NOTE: PyVertical does not currently work with `syft 0.3.0`**\n\n## The Process\n\n![PyVertical diagram](./images/diagram_white_background.png)\n\nPyVertical process:\n1. Create partitioned dataset\n    - Simulate real-world partitioned dataset by splitting MNIST into a dataset of images and a dataset of labels\n    - Give each data point (image + label) a unique ID\n    - Randomly shuffle each dataset\n    - Randomly remove some elements from each dataset\n1. Link datasets using PSI\n    - Use **PSI** to link indices in each dataset using unique IDs\n    - Reorder datasets using linked indices\n1. 
Train a split neural network\n    - Hold both datasets in a dataloader\n    - Send images to first part of split network\n    - Send labels to second part of split network\n    - Train the network\n\n## Requirements\n\n### OS\n\n| Windows | Linux | MacOS |\n|:--:|:--:|:--:|\n| :x: | :heavy_check_mark: | :heavy_check_mark: |\n\n`PyTorch version 1.4.0` is experiencing issues on Windows.\nIt cannot be updated to a working version until `Syft` is updated too.\n\n### Python\n\n| `3.6` | `3.7` | `3.8` | `3.9` |\n| ------|-------|-------|-------|\n| :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |\n\nThe upstream dependencies [syft] and [PSI]\ndo not have `Python 3.9`\npackages.\n\n### PyTorch Environment\n\nTo install the dependencies,\nwe recommend using [Conda]:\n1. Clone this repository\n1. In the command line, navigate to your local copy of the repository\n1. Run `conda env create -f environment.yml`\n    - This creates an environment `pyvertical-dev`\n    - Comes with most dependencies you will need\n1. Activate the environment with `conda activate pyvertical-dev`\n1. Run `conda install notebook`\n\nN.b. Installing the dependencies takes several steps to circumvent a versioning incompatibility between\n`syft` and `jupyter`.\nIn the future,\nall packages will be moved into `environment.yml`.\n\n### TensorFlow Environment\n\nTo install the dependencies,\nwe recommend using [Conda]:\n1. Clone this repository\n1. In the command line, navigate to your local copy of the repository\n1. Run `conda env create -f tf_environment.yml`\n    - This creates an environment `pyvertical-dev-tf`\n    - Comes with most dependencies you will need\n1. Activate the environment with `conda activate pyvertical-dev-tf`\n1. Run `conda install notebook`\n\n### Docker\n\nYou can instead opt to use Docker.\n\nTo run:\n\n1. Build the image with `docker build -t pyvertical:latest .`\n1. 
Launch a container with `docker run -it -p 8888:8888 pyvertical:latest`\n    - Defaults to launching Jupyter Lab\n\n### Synthea\n\n`PyVertical` uses synthetic medical data\ngenerated by [synthea]\nto demonstrate multi-party,\nvertical federated learning.\nRead the [synthea] docs\nfor the requirements to generate the data.\nWith those prerequisites installed,\nrun the `scripts/download_synthea.sh`\nbash script\nfrom the root directory\nof this project,\nwhich generates a deterministic dataset\nand stores it in `data/synthea`.\n\n## Usage\n\nCheck out\n[`examples/PyVertical Example.ipynb`](examples/PyVertical%20Example.ipynb)\nto see `PyVertical` in action.\n\n## Goals\n\n- [X] MVP\n    - Simple example on MNIST dataset\n    - One data holder has images, the other has labels\n- [ ] Extension demonstration\n    - Apply process to electronic health records (EHR) dataset\n    - Dual-headed SplitNN: input data is split amongst several data holders\n- [ ] Integrate with [`syft`](https://www.github.com/OpenMined/PySyft)\n\n## Contributing\nPull requests are welcome.\nFor major changes,\nplease open an issue first to discuss what you would like to change.\n\nRead the OpenMined\n[contributing guidelines][contrib]\nand [styleguide](https://github.com/OpenMined/.github/blob/master/STYLEGUIDE.md)\nfor more information.\n\n## Contributors\n|  [![TTitcombe](https://github.com/TTitcombe.png?size=150)][ttitcombe] | [![Pavlos-P](https://github.com/pavlos-p.png?size=150)][pavlos-p]  | [![H4ll](https://github.com/h4ll.png?size=150)][h4ll] | [![rsandmann](https://github.com/rsandmann.png?size=150)][rsandmann] | [![daler3](https://github.com/daler3.png?size=150)][daler3]\n| :--:|:--: |:--:|:--:|:--:|\n|  [TTitcombe] | [Pavlos-p]  | [H4LL] | [rsandmann] | [daler3] \n\n## Testing\nWe use [`pytest`][pytest] to test the source code.\nTo run the tests manually:\n1. In the command line, navigate to the root of this repository\n1. 
Run `python -m pytest`\n\nCI also checks the code is formatted according to [contributing guidelines][contrib].\n\n## Publications\nRomanini, D., Hall, A. J., Papadopoulos, P., Titcombe, T., Ismail, A., Cebere, T., Sandmann, R., Roehm, R. \u0026 Hoeh, M. A. (2021). PyVertical: A Vertical Federated Learning Framework for Multi-headed SplitNN. arXiv preprint arXiv:2104.00489. ([link](https://arxiv.org/abs/2104.00489))\n\nAngelou, N., Benaissa, A., Cebere, B., Clark, W., Hall, A. J., Hoeh, M. A., Liu, D., Papadopoulos, P., Roehm, R., Sandmann, R., Schoppmann, P. \u0026 Titcombe, T. (2020). Asymmetric Private Set Intersection with Applications to Contact Tracing and Private Vertical Federated Machine Learning. arXiv preprint arXiv:2011.09350. ([link](https://arxiv.org/abs/2011.09350))\n\nYou can cite this work using:\n\n    @article{romanini2021pyvertical,\n        title={PyVertical: A Vertical Federated Learning Framework for Multi-headed SplitNN},\n        author={Romanini, Daniele and Hall, Adam James and Papadopoulos, Pavlos and Titcombe, Tom and Ismail, Abbas and Cebere, Tudor and Sandmann, Robert and Roehm, Robin and Hoeh, Michael A},\n        journal={arXiv preprint arXiv:2104.00489},\n        year={2021}\n    }\n\n    @article{angelou2020asymmetric,\n        title={Asymmetric Private Set Intersection with Applications to Contact Tracing and Private Vertical Federated Machine Learning},\n        author={Angelou, Nick and Benaissa, Ayoub and Cebere, Bogdan and Clark, William and Hall, Adam James and Hoeh, Michael A and Liu, Daniel and Papadopoulos, Pavlos and Roehm, Robin and Sandmann, Robert and others},\n        journal={arXiv preprint arXiv:2011.09350},\n        year={2020}\n    }\n\n\n## License\n[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)\n\n[black]: https://black.readthedocs.io/en/stable/\n[conda]: https://docs.conda.io/en/latest/\n[contrib]: https://github.com/OpenMined/.github/blob/master/CONTRIBUTING.md\n[flake8]: 
https://flake8.pycqa.org/en/latest/index.html#quickstart\n[psi]: https://www.github.com/OpenMined/PSI\n[pytest]: https://docs.pytest.org/en/latest/contents.html\n[syft]: https://github.com/OpenMined/PySyft\n[synthea]: https://github.com/synthetichealth/synthea\n\n[ttitcombe]: https://github.com/ttitcombe\n[pavlos-p]: https://github.com/pavlos-p\n[h4ll]: https://github.com/h4ll\n[rsandmann]: https://github.com/rsandmann\n[daler3]: https://github.com/daler3\n","funding_links":["https://github.com/sponsors/openmined","https://opencollective.com/openmined"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmined%2Fpyvertical","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenmined%2Fpyvertical","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmined%2Fpyvertical/lists"}