{"id":13693097,"url":"https://github.com/dmey/synthia","last_synced_at":"2026-02-01T00:13:05.262Z","repository":{"id":44932655,"uuid":"244198482","full_name":"dmey/synthia","owner":"dmey","description":"📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python","archived":false,"fork":false,"pushed_at":"2023-09-28T23:55:55.000Z","size":20688,"stargazers_count":64,"open_issues_count":2,"forks_count":10,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-01-11T16:00:35.448Z","etag":null,"topics":["augmentation","climate","copula","data-augmentation","data-generation","data-generator","data-modelling","data-science","dependency-analysis","dependency-modeling","finance","fpca","functional-data","machine-learning","oversampling","principal-component-analysis","statistics","synthetic-data","weather","xarray"],"latest_commit_sha":null,"homepage":"https://dmey.github.io/synthia","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmey.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.txt","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json"}},"created_at":"2020-03-01T18:04:55.000Z","updated_at":"2025-05-21T23:44:43.000Z","dependencies_parsed_at":"2024-01-06T13:09:33.922Z","dependency_job_id":"0a378628-df42-45f1-ac5c-04bf6959ea50","html_url":"https://github.com/dmey/synthia","commit_stats":{"total_commits":59,"total_committers":4,"mean_commits":14.75,"dds":"0.11864406779661019","last_synced_commit":"84382d28fabc3f818c496624b2a39515e31cc530"},"previous_names":[],"tags_count":6,"template":false,"template_
full_name":null,"purl":"pkg:github/dmey/synthia","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmey%2Fsynthia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmey%2Fsynthia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmey%2Fsynthia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmey%2Fsynthia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dmey","download_url":"https://codeload.github.com/dmey/synthia/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmey%2Fsynthia/sbom","scorecard":{"id":347765,"data":{"date":"2025-08-11","repo":{"name":"github.com/dmey/synthia","commit":"84382d28fabc3f818c496624b2a39515e31cc530"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.9,"checks":[{"name":"Code-Review","score":1,"reason":"Found 3/30 approved changesets -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous 
patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/ci.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/dmey/synthia/ci.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:17: update your workflow using https://app.stepsecurity.io/secureworkflow/dmey/synthia/ci.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:44: update your workflow using 
https://app.stepsecurity.io/secureworkflow/dmey/synthia/ci.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:46: update your workflow using https://app.stepsecurity.io/secureworkflow/dmey/synthia/ci.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:68: update your workflow using https://app.stepsecurity.io/secureworkflow/dmey/synthia/ci.yml/master?enable=pin","Info:   0 out of   2 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   3 third-party GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: MIT License: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 9 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T07:32:51.886Z","repository_id":44932655,"created_at":"2025-08-18T07:32:51.886Z","updated_at":"2025-08-18T07:32:51.886Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28480513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["augmentation","climate","copula","data-augmentation","data-generation","data-generator","data-modelling","data-science","dependency-analysis","dependency-modeling","finance","fpca","functional-data","machine-learning","oversampling","principal-component-analysis","statistics","synthetic-data","weather","xarray"],"created_at":"2024-08-02T17:01:05.519Z","updated_at":"2026-02-01T00:13:05.232Z","avatar_url":"https://github.com/dmey.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/img/logo.png\" alt=\"synthia\" height=\"120\"\u003e\u003cbr\u003e\u003cbr\u003e\n\n  [![PyPI](https://img.shields.io/pypi/v/synthia)](https://pypi.org/project/synthia) [![CI](https://github.com/dmey/synthia/workflows/CI/badge.svg)](https://github.com/dmey/synthia/actions) 
[![DOI](https://joss.theoj.org/papers/10.21105/joss.02863/status.svg)](https://doi.org/10.21105/joss.02863)\n\n  [Overview](#overview) | [Documentation](#documentation) | [How to cite](#how-to-cite) | [Contributing](#contributing) | [Development notes](#development-notes) | [Copyright and license](#copyright-and-license) | [Acknowledgements](#acknowledgements)\n\u003c/div\u003e\n\n## Overview\n\nSynthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences. [Copula](https://dmey.github.io/synthia/copula.html) and [functional Principal Component Analysis (fPCA)](https://dmey.github.io/synthia/fpca.html) are statistical models that allow these properties to be simulated ([Joe 2014](https://doi.org/10.1201/b17116)). Copula-generated data, for example, have shown potential to improve the generalization of machine learning (ML) emulators ([Meyer et al. 2021](https://doi.org/10.5194/gmd-14-5205-2021)) and to anonymize real datasets ([Patki et al. 2016](https://doi.org/10.1109/DSAA.2016.49)).\n\nSynthia is an open-source Python package to model univariate and multivariate data, parameterize data using empirical and parametric methods, and manipulate marginal distributions. It is designed to enable scientists and practitioners to handle the labelled multivariate data typical of the computational sciences. For example, given some vertical profiles of atmospheric temperature, we can use Synthia to generate new but statistically similar profiles in just three lines of code (Table 1).\n\nSynthia supports three methods of multivariate data generation: (i) fPCA, (ii) parametric (Gaussian) copulas, and (iii) vine copulas. All three support continuous variables; vine copula models additionally support discrete and categorical variables. It has a simple and succinct API to natively handle [xarray](https://xarray.pydata.org)'s labelled arrays and datasets. 
It uses a pure Python implementation for fPCA and Gaussian copulas, and relies on the well-tested C++ library [vinecopulib](https://github.com/vinecopulib/vinecopulib), through [pyvinecopulib](https://github.com/vinecopulib/pyvinecopulib)'s bindings, for fast and efficient computation of vines. For more information, please see the website at https://dmey.github.io/synthia.\n\n\n**Table 1**. *Example application of Gaussian and fPCA classes in Synthia. These are used to generate random profiles of atmospheric temperature similar to those included in the source data. The xarray dataset structure is maintained and returned by Synthia.*\n\n| Source                                       | Synthetic with Gaussian Copula                           | Synthetic with fPCA                              |\n| -------------------------------------------- | -------------------------------------------------------- | ------------------------------------------------ |\n| `ds = syn.util.load_dataset()`               | `g = syn.CopulaDataGenerator()`                          | `g = syn.fPCADataGenerator()`                    |\n|                                              | `g.fit(ds, syn.GaussianCopula())`                        | `g.fit(ds)`                                      |\n|                                              | `g.generate(n_samples=500)`                              | `g.generate(n_samples=500)`                      |\n|                                              |                                                          |                                                  |\n| ![Source](./assets/img/temperature_true.png) | ![Gaussian](./assets/img/temperature_synth_gaussian.png) | ![fPCA](./assets/img/temperature_synth_fPCA.png) |\n\n\n## Documentation\n\nFor installation instructions, getting started guides and tutorials, background information, and API reference summaries, please see the [website](https://dmey.github.io/synthia).\n\n\n## How to cite\n\nIf 
you are using Synthia, please cite the following two papers using their respective Digital Object Identifiers (DOIs). Citations may be generated automatically using Crosscite's [DOI Citation Formatter](https://citation.crosscite.org/) or from the BibTeX entries below.\n\n| Synthia Software                                                | Software Application                                                      |\n| --------------------------------------------------------------- | ------------------------------------------------------------------------- |\n| DOI: [10.21105/joss.02863](https://doi.org/10.21105/joss.02863) | DOI: [10.5194/gmd-14-5205-2021](https://doi.org/10.5194/gmd-14-5205-2021) |\n\n```bibtex\n@article{Meyer_and_Nagler_2021,\n  doi = {10.21105/joss.02863},\n  url = {https://doi.org/10.21105/joss.02863},\n  year = {2021},\n  publisher = {The Open Journal},\n  volume = {6},\n  number = {65},\n  pages = {2863},\n  author = {David Meyer and Thomas Nagler},\n  title = {Synthia: multidimensional synthetic data generation in Python},\n  journal = {Journal of Open Source Software}\n}\n\n@article{Meyer_and_Nagler_and_Hogan_2021,\n  doi = {10.5194/gmd-14-5205-2021},\n  url = {https://doi.org/10.5194/gmd-14-5205-2021},\n  year = {2021},\n  publisher = {Copernicus {GmbH}},\n  volume = {14},\n  number = {8},\n  pages = {5205--5215},\n  author = {David Meyer and Thomas Nagler and Robin J. Hogan},\n  title = {Copula-based synthetic data augmentation for machine-learning emulators},\n  journal = {Geoscientific Model Development}\n}\n```\n\nIf needed, you may also cite a specific software version using [its corresponding Zenodo DOI](https://doi.org/10.5281/zenodo.4701278). 
\n\n## Contributing\n\nIf you are looking to contribute, please read our [Contributors' guide](CONTRIBUTING.md) for details.\n\n\n## Development notes\n\nIf you would like to know more about specific development guidelines, testing and deployment, please refer to our [development notes](DEVELOP.md).\n\n\n## Copyright and license\n\nCopyright 2020 D. Meyer and T. Nagler. Licensed under [MIT](LICENSE.txt).\n\n\n## Acknowledgements\n\nSpecial thanks to [@letmaik](https://github.com/letmaik) for his suggestions and contributions to the project.\n","funding_links":[],"categories":["Data-driven methods","Python"],"sub_categories":["Tabular","General-Purpose Machine Learning"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmey%2Fsynthia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmey%2Fsynthia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmey%2Fsynthia/lists"}