{"id":28458803,"url":"https://github.com/openmined/pipelinedp","last_synced_at":"2025-07-02T09:31:50.346Z","repository":{"id":37400258,"uuid":"337809546","full_name":"OpenMined/PipelineDP","owner":"OpenMined","description":"PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.","archived":false,"fork":false,"pushed_at":"2025-06-02T19:06:40.000Z","size":2888,"stargazers_count":278,"open_issues_count":35,"forks_count":82,"subscribers_count":19,"default_branch":"main","last_synced_at":"2025-06-07T00:40:28.884Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://pipelinedp.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenMined.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"contributing/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"openmined","open_collective":"openmined"}},"created_at":"2021-02-10T18:04:22.000Z","updated_at":"2025-06-02T19:16:42.000Z","dependencies_parsed_at":"2024-01-15T17:40:22.878Z","dependency_job_id":"897f4258-51b7-4fa6-b183-323d03e07a63","html_url":"https://github.com/OpenMined/PipelineDP","commit_stats":{"total_commits":394,"total_committers":36,"mean_commits":"10.944444444444445","dds":0.5,"last_synced_commit":"52e210bbce0504d062ae3cd64e4145a1237a309b"},"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"purl":"pkg:github/OpenMined/PipelineDP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Open
Mined%2FPipelineDP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPipelineDP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPipelineDP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPipelineDP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenMined","download_url":"https://codeload.github.com/OpenMined/PipelineDP/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2FPipelineDP/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263111474,"owners_count":23415464,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-07T00:39:50.174Z","updated_at":"2025-07-02T09:31:50.332Z","avatar_url":"https://github.com/OpenMined.png","language":"Python","funding_links":["https://github.com/sponsors/openmined","https://opencollective.com/openmined"],"categories":[],"sub_categories":[],"readme":"# PipelineDP\n\nPipelineDP is a framework for applying differentially private aggregations to large\ndatasets using batch processing systems such as Apache Spark, Apache Beam,\nand more.\n\nTo make differential privacy accessible to non-experts, PipelineDP:\n\n* Provides a convenient API familiar to Spark or Beam developers.\n* Encapsulates the complexities of differential privacy, such as:\n  * protecting outliers and rare categories,\n  * generating safe noise,\n  * privacy budget accounting.\n* Supports many standard 
computations, such as count, sum, and average.\n\nAdditional information can be found at [pipelinedp.io](https://pipelinedp.io).\n\n*Note* that this project is still experimental and is subject to change.\nAt the moment we don't recommend using it in production systems because it has not\nbeen thoroughly tested yet. You can learn more in the\n[Roadmap section](https://pipelinedp.io/overview/#roadmap).\n\nThe project is a collaboration between OpenMined and Google in an effort\nto bring differential privacy to production.\n\n## Getting started\n\nHere are some examples of how to use PipelineDP:\n\n* [Apache Spark example](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_on_spark.py)\n* [Apache Beam example](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_on_beam.py)\n* [Framework-free example](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_without_frameworks.py)\n* [Example with all frameworks](https://github.com/OpenMined/PipelineDP/blob/main/examples/movie_view_ratings/run_all_frameworks.py)\n\nPlease check out the [codelab](https://github.com/OpenMined/PipelineDP/blob/main/examples/restaurant_visits.ipynb) for a more detailed demonstration of the API and its usage.\n\nCode sample showing private processing on Spark:\n```python\nimport pipeline_dp\nfrom pipeline_dp.aggregate_params import SumParams\nfrom pipeline_dp.private_spark import make_private\n\n# `movie_views` is assumed to be a Spark RDD of movie-view records with\n# `user_id`, `movie_id` and `rating` fields.\n\n# Define the privacy budget available for our computation.\nbudget_accountant = pipeline_dp.NaiveBudgetAccountant(total_epsilon=1,\n                                                      total_delta=1e-6)\n\n# Wrap Spark's RDD into its private version. You will use this private wrapper\n# for all further processing instead of Spark's RDD. 
Using the wrapper ensures\n# that only private statistics can be released.\nprivate_movie_views = \\\n    make_private(movie_views, budget_accountant, lambda mv: mv.user_id)\n\n# Calculate the private sum of ratings per movie.\ndp_result = private_movie_views.sum(\n    SumParams(\n        # The aggregation key: we're grouping data by movies.\n        partition_extractor=lambda mv: mv.movie_id,\n        # The value we're aggregating: we're summing up ratings.\n        value_extractor=lambda mv: mv.rating,\n\n        # Limits on how much one user can contribute:\n        # .. at most two movies rated per user\n        #    (if there are more, randomly choose two),\n        max_partitions_contributed=2,\n        # .. at most one rating for each movie,\n        max_contributions_per_partition=1,\n        # .. with a minimum rating of \"1\"\n        #    (lower values are automatically clipped to \"1\"),\n        min_value=1,\n        # .. and a maximum rating of \"5\"\n        #    (higher values are automatically clipped to \"5\").\n        max_value=5))\n\n# Distribute the total privacy budget among the aggregations defined above.\nbudget_accountant.compute_budgets()\n\n# Save the results (`FLAGS` holds command-line flags, as in the linked Spark example).\ndp_result.saveAsTextFile(FLAGS.output_file)\n```\n\n## Installation\n\nPipelineDP without any frameworks:\n\n`pip install pipeline-dp`\n\nIf you'd like to run PipelineDP on Apache Spark:\n\n`pip install pipeline-dp pyspark`\n\nOn Apache Beam:\n\n`pip install pipeline-dp apache-beam`\n\nPipelineDP supports Python 3.8 and later.\n\n**Note for Apple Silicon users:** the PipelineDP pip package is currently available only\nfor the x86 architecture, because [PyDP](https://github.com/OpenMined/PyDP) does not\nhave a pip package for Apple Silicon. 
It might be possible to compile it from source on Apple Silicon.\n\n## Attack Model\n\nPipelineDP has the same [attack model](https://github.com/google/differential-privacy/blob/main/common_docs/attack_model.md)\nas the Google Differential Privacy Library.\n\n## Development\n\nTo set up a local environment and contribute to the development of PipelineDP, please see our guidelines in [CONTRIBUTING](https://github.com/OpenMined/PipelineDP/blob/main/contributing/CONTRIBUTING.md).\n\n## Support and Community on Slack\n\nIf you have questions about PipelineDP, join\n[OpenMined's Slack](https://slack.openmined.org) and check the\n**#differential-privacy** channel.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmined%2Fpipelinedp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenmined%2Fpipelinedp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmined%2Fpipelinedp/lists"}