https://github.com/sodascience/generative_metadata_format
Generative metadata format for generating synthetic tabular datasets
standard statistical-metadata synthetic-data
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/sodascience/generative_metadata_format
- Owner: sodascience
- License: mit
- Created: 2022-08-05T13:01:33.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-12-19T10:08:48.000Z (over 1 year ago)
- Last Synced: 2025-01-25T12:09:36.649Z (over 1 year ago)
- Topics: standard, statistical-metadata, synthetic-data
- Homepage:
- Size: 90.8 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Generative Metadata Format
This repository contains the JSON schema for the Generative Metadata Format.
The GMF standard stores statistical metadata about tabular datasets, which can then be used
to generate synthetic versions of the original dataset. Because the metadata is stored in
this intermediate format, it can be inspected before release, so privacy can be protected through careful manual disclosure control.
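
To make the idea concrete, here is a minimal sketch of the general principle: per-column statistical metadata is enough to draw a synthetic table. The field names (`n_rows`, `columns`, `distribution`, `parameters`) and the two distribution labels are hypothetical and chosen for illustration only. They are not taken from the GMF JSON schema, and this is not how the reference implementation works internally.

```python
import random

# Hypothetical, simplified metadata. These field names are illustrative
# and are NOT the GMF schema.
metadata = {
    "n_rows": 5,
    "columns": [
        {"name": "age", "distribution": "uniform_int",
         "parameters": {"low": 18, "high": 90}},
        {"name": "income", "distribution": "normal",
         "parameters": {"mean": 30000.0, "sd": 8000.0}},
    ],
}


def draw(column, rng):
    """Draw one synthetic value for a column from its stored distribution."""
    params = column["parameters"]
    if column["distribution"] == "uniform_int":
        return rng.randint(params["low"], params["high"])
    if column["distribution"] == "normal":
        return rng.gauss(params["mean"], params["sd"])
    raise ValueError(f"unknown distribution: {column['distribution']}")


def synthesize(metadata, seed=None):
    """Generate rows using only the metadata, never the original data."""
    rng = random.Random(seed)
    return [
        {col["name"]: draw(col, rng) for col in metadata["columns"]}
        for _ in range(metadata["n_rows"])
    ]


rows = synthesize(metadata, seed=0)
for row in rows:
    print(row)
```

In this sketch each column is sampled independently, so only the per-column summaries stored in the metadata can influence the synthetic rows. That small, explicit set of numbers is what a reviewer would check during manual disclosure control.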
A Python reference implementation, metasyn, for generating GMF files and creating synthetic datasets is available on [GitHub](https://github.com/sodascience/metasynth).
The GMF standard is designed to be modular and extensible with more distributions and privacy-enhancing mechanisms. Additional distributions might become available in this repository at a later date, outside of the core directory.
## GMF versions and metasyn compatibility
| GMF version | metasyn version compatibility |
| ----------- | ----------------------------- |
| 0.1 | metasyn < 0.4.0 |
| 0.2 | metasyn == 0.4.0 |
| 0.3 | metasyn == 0.5.0, 0.6.0 |
| 1.0 | metasyn >= 0.7.0, <= 1.0.3 |
| 1.1 | metasyn >= 1.1.0 |
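
The table above can be expressed as a small lookup. This is only a sketch: the function names `parse_version` and `gmf_version_for` are hypothetical and not part of metasyn, and the parser assumes plain numeric versions such as `1.0.3` (no pre-release suffixes). Versions the table does not list, such as 0.4.1, return `None` rather than a guess.

```python
def parse_version(text):
    """Turn '1.0.3' into (1, 0, 3), padding short versions like '0.4' with zeros."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def gmf_version_for(metasyn_version):
    """Return the GMF version listed for a metasyn version, or None if unlisted."""
    v = parse_version(metasyn_version)
    if v < (0, 4, 0):
        return "0.1"
    if v == (0, 4, 0):
        return "0.2"
    if v in ((0, 5, 0), (0, 6, 0)):
        return "0.3"
    if (0, 7, 0) <= v <= (1, 0, 3):
        return "1.0"
    if v >= (1, 1, 0):
        return "1.1"
    return None  # not covered by the compatibility table


print(gmf_version_for("0.6.0"))  # 0.3
print(gmf_version_for("1.0.3"))  # 1.0
print(gmf_version_for("0.4.1"))  # None
```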
## Contact
**Metasyn** (and the GMF schema attached to it) is a project by the [ODISSEI Social Data Science (SoDa)](https://odissei-data.nl/nl/soda/) team.
Do you have questions, suggestions, or remarks on the technical implementation? File an issue in the issue tracker or feel free to contact [Erik-Jan van Kesteren](https://github.com/vankesteren) or [Raoul Schram](https://github.com/qubixes).