{"id":21685874,"url":"https://github.com/sodascience/workshop_syntheticdata_osf2022","last_synced_at":"2026-01-04T05:42:16.088Z","repository":{"id":114448486,"uuid":"555248145","full_name":"sodascience/workshop_syntheticdata_osf2022","owner":"sodascience","description":"Files for the synthetic data presentation at the Open Science Festival 2022","archived":false,"fork":false,"pushed_at":"2024-02-05T11:08:24.000Z","size":2389,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-25T12:09:36.482Z","etag":null,"topics":["open-science","privacy-protection","synthetic-data-generation"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sodascience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-10-21T08:01:49.000Z","updated_at":"2024-04-12T12:51:13.000Z","dependencies_parsed_at":"2023-06-08T04:15:56.648Z","dependency_job_id":null,"html_url":"https://github.com/sodascience/workshop_syntheticdata_osf2022","commit_stats":null,"previous_names":["sodascience/workshop_syntheticdata_osf2022"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fworkshop_syntheticdata_osf2022","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fworkshop_syntheticdata_osf2022/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fworkshop_syntheticdata_osf2022/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fworkshop_syntheticdata_osf2022/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sodascience","download_url":"https://codeload.github.com/sodascience/workshop_syntheticdata_osf2022/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244609359,"owners_count":20480780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["open-science","privacy-protection","synthetic-data-generation"],"created_at":"2024-11-25T16:23:30.213Z","updated_at":"2026-01-04T05:42:16.059Z","avatar_url":"https://github.com/sodascience.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# How to create synthetic data: a tool for open science\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7234121.svg)](https://doi.org/10.5281/zenodo.7234121)\n\nArchive for the synthetic data pre-conference workshop at the [Open Science Festival](https://opensciencefestival.nl) on September 1, 2022.\n\n\u003cdiv\u003e\n\u003cimg src=\"img/osf2022_workshop.jpg\" alt=\"Photo of the workshop\" width=\"400px\"/\u003e \n\n_photo by Pim Rusch (c)_\n\u003c/div\u003e\n\n# Contents of this archive\nThe original workshop proposal submitted to the conference can be found in the [`00_proposal`](./00_proposal/) folder.\n\nThe workshop was held in four parts:\n\n- [`01_introduction`](./01_introduction/) - An introduction to synthetic data \u0026 the privacy-utility tradeoff. 
\n- [`02_metasynth`](./02_metasynth/) - Hands-on high-privacy synthetic data generation using the [metasynth](https://github.com/sodascience/metasynth) package.\n- [`03_synthpop`](./03_synthpop/) - Hands-on high-utility synthetic data generation using the [synthpop](https://synthpop.org.uk) package.\n- [`04_closing`](./04_closing/) - A short closing / conclusion.\n\n# Abstract\nOpen data is one of the pillars of open science. However, there are often barriers to making research data openly available, relating to consent, privacy, or organisational boundaries. In such cases, synthetic data is an excellent solution: the real data is kept secret, but a \"fake\" version of the data is made available. The promise of the synthetic dataset is that others can then investigate the data structure, rerun scripts, use the data in educational materials, or even run a completely different analysis on their own.\n\nBut how do you generate synthetic data? In this session, we will introduce the field of synthetic data generation and apply several tools to generate synthetic versions of datasets with various levels of utility and privacy. We will be paying extra attention to practical issues such as missing values, data types, and disclosure control. Participants can either use a provided example dataset or bring their own data!\n\n# Contact\n\n\u003cimg src=\"img/soda.png\" alt=\"SoDa logo\" width=\"250px\"/\u003e \n\nThis workshop was a project by the [ODISSEI Social Data Science (SoDa) team](https://odissei-soda.nl).\n\nDo you have questions, suggestions, or remarks on the technical implementation? 
File an issue in the issue tracker or feel free to contact [Erik-Jan van Kesteren](https://github.com/vankesteren), [Raoul Schram](https://github.com/qubixes), or [Thom Volker](https://github.com/thomvolker).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsodascience%2Fworkshop_syntheticdata_osf2022","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsodascience%2Fworkshop_syntheticdata_osf2022","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsodascience%2Fworkshop_syntheticdata_osf2022/lists"}