{"id":21685866,"url":"https://github.com/sodascience/metasyn-disclosure-control","last_synced_at":"2025-10-03T19:42:22.521Z","repository":{"id":161079296,"uuid":"519144296","full_name":"sodascience/metasyn-disclosure-control","owner":"sodascience","description":"Plugin for metasyn that prevents data from leaking.","archived":false,"fork":false,"pushed_at":"2025-07-17T12:42:39.000Z","size":4406,"stargazers_count":2,"open_issues_count":4,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-08-31T20:41:20.894Z","etag":null,"topics":["disclosure-control","metasyn","plugin","privacy-protection","synthetic-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sodascience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-07-29T08:45:50.000Z","updated_at":"2025-07-17T12:21:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"d981d0af-f947-4b76-9ffe-1da324156abf","html_url":"https://github.com/sodascience/metasyn-disclosure-control","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/sodascience/metasyn-disclosure-control","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fmetasyn-disclosure-control","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fmetasyn-disclosure-control/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fmetasyn-disclosure-c
ontrol/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fmetasyn-disclosure-control/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sodascience","download_url":"https://codeload.github.com/sodascience/metasyn-disclosure-control/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fmetasyn-disclosure-control/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273983693,"owners_count":25202205,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-06T02:00:13.247Z","response_time":2576,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["disclosure-control","metasyn","plugin","privacy-protection","synthetic-data"],"created_at":"2024-11-25T16:23:26.882Z","updated_at":"2025-10-03T19:42:17.466Z","avatar_url":"https://github.com/sodascience.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Metasyn disclosure control\n[![](https://img.shields.io/badge/metasyn-plugin-blue?logo=python\u0026logoColor=white)](https://github.com/sodascience/metasyn)\n[![Python package](https://github.com/sodascience/metasyn-disclosure-control/actions/workflows/python-package.yml/badge.svg)](https://github.com/sodascience/metasyn-disclosure-control/actions/workflows/python-package.yml)\n[![Project Status: 
WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)\n\nA privacy plugin for [metasyn](https://github.com/sodascience/metasyn), based on statistical disclosure control (SDC) rules of thumb as found in the following documents:\n\n- The [SDC handbook](https://securedatagroup.org/guides-and-resources/sdc-handbook/) of the Secure Data group in the UK\n- The Data Without Boundaries document [Guidelines for output checking](https://wayback.archive-it.org/12090/*/https:/cros-legacy.ec.europa.eu/system/files/dwb_standalone-document_output-checking-guidelines.pdf) (pdf)\n- Statistics Netherlands' output guidelines\n\nProducing synthetic data with [metasyn](https://github.com/sodascience/metasyn) is already a great first step towards protecting privacy, but it doesn't adhere to official standards. For example, fitting a uniform distribution will disclose the lowest and highest values in the dataset, which may be a privacy issue in particularly sensitive data. This plugin addresses these kinds of problems.\n\n\u003e [!WARNING]\n\u003e Currently, the disclosure control plugin is a work in progress. Especially in light of this, we disclaim\nany responsibility for the consequences of using this plugin. 
\n\n## Installing the plugin\n\nTo install the package with pip, run the following:\n```sh\npip install metasyn-disclosure\n```\n\nTo install the development version directly from git, run the following command:\n\n ```sh\n pip install git+https://github.com/sodascience/metasyn-disclosure-control.git\n ```\n\n## Usage\n\nBasic usage for our built-in titanic dataset is as follows:\n\n```py\nfrom metasyncontrib.disclosure import DisclosurePrivacy\nfrom metasyncontrib.disclosure.string import DisclosureFaker\n\nfrom metasyn import MetaFrame, VarSpec, demo_dataframe\n\ndf = demo_dataframe(\"titanic\")\n\nspec = [\n    VarSpec(name=\"PassengerId\", unique=True),\n    VarSpec(name=\"Name\", distribution=DisclosureFaker(\"name\")),\n]\n\nmf = MetaFrame.fit_dataframe(\n    df=df,\n    var_specs=spec,\n    privacy=DisclosurePrivacy(),\n)\n\nmf.synthesize(5)\n```\n\n```\nshape: (5, 13)\n┌─────────────┬────────────────────┬────────┬──────┬───┬────────────┬────────────┬─────────────────────┬────────┐\n│ PassengerId ┆ Name               ┆ Sex    ┆ Age  ┆ … ┆ Birthday   ┆ Board time ┆ Married since       ┆ all_NA │\n│ ---         ┆ ---                ┆ ---    ┆ ---  ┆   ┆ ---        ┆ ---        ┆ ---                 ┆ ---    │\n│ i64         ┆ str                ┆ cat    ┆ i64  ┆   ┆ date       ┆ time       ┆ datetime[μs]        ┆ f32    │\n╞═════════════╪════════════════════╪════════╪══════╪═══╪════════════╪════════════╪═════════════════════╪════════╡\n│ 0           ┆ Benjamin Cox       ┆ female ┆ 27   ┆ … ┆ 1931-12-01 ┆ 14:33:06   ┆ 2022-07-30 02:16:37 ┆ null   │\n│ 1           ┆ Mr. 
David Robinson ┆ female ┆ null ┆ … ┆ 1906-02-18 ┆ null       ┆ 2022-08-03 13:09:19 ┆ null   │\n│ 2           ┆ Randy Mosley       ┆ male   ┆ 24   ┆ … ┆ 1933-01-06 ┆ 15:52:54   ┆ 2022-07-18 18:52:05 ┆ null   │\n│ 3           ┆ Vincent Maddox     ┆ female ┆ 24   ┆ … ┆ 1937-02-10 ┆ 16:58:30   ┆ 2022-07-23 20:29:49 ┆ null   │\n│ 4           ┆ Kristin Holland    ┆ male   ┆ 17   ┆ … ┆ 1939-12-09 ┆ 18:07:45   ┆ 2022-08-05 02:41:51 ┆ null   │\n└─────────────┴────────────────────┴────────┴──────┴───┴────────────┴────────────┴─────────────────────┴────────┘\n```\n\n\n## Implementation details\nThe rules of thumb, roughly, are: \n\n- at least 10 units\n- at least 10 degrees of freedom\n- no group disclosure\n- no dominance\n\nFor most distributions, we implemented micro-aggregation. This technique pre-averages a sorted version of the data, which is then supplied to the original fitting mechanism. The idea is that the pre-averaging step ensures that the rules of thumb are followed, so that the fitting method doesn't need to do anything in particular. From a statistical point of view, we lose more information than is probably necessary, but this should ensure the safety of the data. \n\n\n\n\u003c!-- CONTRIBUTING --\u003e\n## Contributing\nYou can contribute to this metasyn plugin by giving feedback in the \"Issues\" tab, or by creating a pull request.\n\nTo create a pull request:\n1. Fork the Project\n2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the Branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n\n\u003c!-- CONTACT --\u003e\n## Contact\nThis is a project by the [ODISSEI Social Data Science (SoDa)](https://odissei-data.nl/nl/soda/) team. Do you have questions, suggestions, or remarks on the technical implementation? 
File an issue in the issue tracker or feel free to contact [Raoul Schram](https://github.com/qubixes) or [Erik-Jan van Kesteren](https://github.com/vankesteren).\n\n\u003cimg src=\"soda.png\" alt=\"SoDa logo\" width=\"250px\"/\u003e \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsodascience%2Fmetasyn-disclosure-control","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsodascience%2Fmetasyn-disclosure-control","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsodascience%2Fmetasyn-disclosure-control/lists"}