{"id":13569341,"url":"https://github.com/uber-archive/sql-differential-privacy","last_synced_at":"2025-04-04T05:32:03.963Z","repository":{"id":57729266,"uuid":"94485471","full_name":"uber-archive/sql-differential-privacy","owner":"uber-archive","description":"Dataflow analysis \u0026 differential privacy for SQL queries. This project is deprecated and not maintained.","archived":true,"fork":false,"pushed_at":"2019-12-03T07:09:43.000Z","size":228,"stargazers_count":392,"open_issues_count":5,"forks_count":73,"subscribers_count":31,"default_branch":"master","last_synced_at":"2024-05-22T07:49:07.447Z","etag":null,"topics":["sql"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uber-archive.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-15T23:06:40.000Z","updated_at":"2024-04-15T14:17:10.000Z","dependencies_parsed_at":"2022-09-10T22:02:03.026Z","dependency_job_id":null,"html_url":"https://github.com/uber-archive/sql-differential-privacy","commit_stats":null,"previous_names":["uber/sql-differential-privacy"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber-archive%2Fsql-differential-privacy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber-archive%2Fsql-differential-privacy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber-archive%2Fsql-differential-privacy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber-archive%2Fsql-differential-privacy/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/uber-archive","download_url":"https://codeload.github.com/uber-archive/sql-differential-privacy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247128702,"owners_count":20888232,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["sql"],"created_at":"2024-08-01T14:00:38.775Z","updated_at":"2025-04-04T05:32:03.513Z","avatar_url":"https://github.com/uber-archive.png","language":"Scala","funding_links":[],"categories":["Awesome Privacy Engineering [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)"],"sub_categories":["Differential Privacy and Federated Learning"],"readme":"# Overview\n\n(This project is deprecated and not maintained.)\n\nThis repository contains a query analysis and rewriting framework to enforce differential privacy for general-purpose\nSQL queries. The rewriting engine can automatically transform an input query into an *intrinsically private query* which\nembeds a differential privacy mechanism in the query directly; the transformed query enforces differential privacy on\nits results and can be executed on any standard SQL database. 
This approach supports many state-of-the-art\ndifferential privacy mechanisms; the code currently includes rewriters based on [Elastic Sensitivity](https://arxiv.org/abs/1706.09479) and\n[Sample and Aggregate](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.2379\u0026rep=rep1\u0026type=pdf), and more will be added soon.\n\nThe rewriting framework is built on a robust dataflow analysis engine for SQL queries. This framework\nprovides an abstract representation of queries, plus several kinds of built-in dataflow analyses tailored to this\nrepresentation. This framework can be used to implement other types of dataflow analyses, as described below.\n\n## Building \u0026 Running\n\nThis framework is written in Scala and built using Maven. The code has been tested on Mac OS X and Linux. To build the code:\n\n```\n$ mvn package\n```\n\n## Example: Query Rewriting\n\nThe file `examples/QueryRewritingExample.scala` contains sample code for query rewriting and demonstrates the supported\nmechanisms using a few simple queries. To run this example:\n```\nmvn exec:java -Dexec.mainClass=\"examples.QueryRewritingExample\"\n```\n\nThis example code can be easily modified, e.g., to test different queries or change parameter values.\n\n## Background: Elastic Sensitivity\n\nElastic sensitivity is an approach for efficiently approximating the local sensitivity of a query, which can be used to\nenforce differential privacy for the query. The approach requires only a static analysis of the query and therefore\nimposes minimal performance overhead. Importantly, it does not require any changes to the database.\nDetails of the approach are available in [this paper](https://arxiv.org/abs/1706.09479).\n\nElastic sensitivity can be used to determine the scale of random noise necessary to make the results of a query\ndifferentially private. 
For a given output column of a query with elastic sensitivity *s*, it suffices to *smooth* *s* to obtain *S*, then add\nrandom noise drawn from the Laplace distribution, scaled to *(S/epsilon)* and centered at 0, to the true result of the\nquery; this achieves differential privacy for that column. The smoothing follows the smooth sensitivity approach introduced by [Nissim et al.](http://www.cse.psu.edu/~ads22/pubs/NRS07/NRS07-full-draft-v1.pdf).\n\nThe class `examples.ElasticSensitivityExample` contains code demonstrating this approach directly (i.e., applying noise manually rather than generating an intrinsically private query).\n\nTo run this example:\n```\nmvn exec:java -Dexec.mainClass=\"examples.ElasticSensitivityExample\"\n```\n\n## Analysis Framework\n\nThis framework can perform additional analyses on SQL queries, and can be extended with new analyses.\nEach analysis in this framework extends the base class `com.uber.engsec.dp.sql.AbstractAnalysis`.\n\nTo run an analysis on a query, call the method `com.uber.engsec.dp.sql.AbstractAnalysis.analyzeQuery`.\nThe parameter of this method is a string containing a SQL query, and its return value is an abstract domain representing\nthe results of the analysis.\n\nThe source code includes several example analyses to demonstrate features of the framework. The simplest example is `com.uber.engsec.dp.analysis.taint.TaintAnalysis`, which returns an abstract domain containing information about which output columns of the query might contain data flowing from \"tainted\" columns in the database. The database schema determines which columns are tainted. 
You can invoke this analysis as follows:\n\n```scala\nscala\u003e (new com.uber.engsec.dp.analysis.taint.TaintAnalysis).analyzeQuery(\"SELECT my_col1 FROM my_table\")\nBooleanDomain = my_col1 -\u003e False\n```\n\nThe code ships with several built-in analyses, including:\n\n  - The elastic sensitivity analysis, available in `com.uber.engsec.dp.analysis.differential_privacy.ElasticSensitivityAnalysis`, returns an abstract domain (`com.uber.engsec.dp.analysis.differential_privacy.SensitivityDomain`) that maps each output column of the query to its elastic sensitivity.\n  - `com.uber.engsec.dp.analysis.columns_used.ColumnsUsedAnalysis` lists the original database columns\n  from which the results of each output column are computed.\n  - `com.uber.engsec.dp.analysis.histogram.HistogramAnalysis` reports, for each output column of the query,\n  whether the output is an aggregation and, if so, which type.\n  - `com.uber.engsec.dp.analysis.join.JoinKeysUsed` lists the original database columns used as equijoin\n  keys for each output column of the query.\n\n## Writing New Analyses\n\nNew analyses can be implemented by extending one of the abstract analysis classes and implementing *transfer functions*,\nwhich describe how to update the analysis state for relevant query constructs. Analyses are written to update a\nspecific type of *abstract domain*, which represents the current state of the analysis. Each abstract domain type\nimplements the trait `com.uber.engsec.dp.dataflow.AbstractDomain`.\n\nThe simplest way to implement a new analysis is to use `com.uber.engsec.dp.dataflow.dp.column.AbstractColumnAnalysis`,\nwhich automatically tracks analysis state for each column of the query independently. 
Most of the example analyses are\nof this type.\n\nNew analyses can be invoked in the same way as the built-in example analyses.\n\n## Reporting Security Bugs\n\nPlease report security bugs through [HackerOne](https://hackerone.com/uber).\n\n## License\n\nThis project is released under the MIT License.\n\n## Contact Information\n\nThis project is developed and maintained by [Noah Johnson](mailto:noahj@berkeley.edu) and [Joe Near](mailto:jnear@berkeley.edu).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuber-archive%2Fsql-differential-privacy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuber-archive%2Fsql-differential-privacy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuber-archive%2Fsql-differential-privacy/lists"}