{"id":17256089,"url":"https://github.com/clementchadebec/pyraug","last_synced_at":"2025-08-22T16:09:16.516Z","repository":{"id":62583037,"uuid":"374434209","full_name":"clementchadebec/pyraug","owner":"clementchadebec","description":"Data Augmentation with Variational Autoencoders (TPAMI)","archived":false,"fork":false,"pushed_at":"2022-09-17T10:22:56.000Z","size":10849,"stargazers_count":140,"open_issues_count":2,"forks_count":14,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-07-03T00:51:21.899Z","etag":null,"topics":["data-augmentation","python","variational-autoencoder"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clementchadebec.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-06-06T18:24:26.000Z","updated_at":"2025-04-02T06:45:25.000Z","dependencies_parsed_at":"2022-11-03T21:34:33.069Z","dependency_job_id":null,"html_url":"https://github.com/clementchadebec/pyraug","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/clementchadebec/pyraug","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clementchadebec%2Fpyraug","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clementchadebec%2Fpyraug/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clementchadebec%2Fpyraug/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clementchadebec%2Fpyra
ug/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clementchadebec","download_url":"https://codeload.github.com/clementchadebec/pyraug/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clementchadebec%2Fpyraug/sbom","scorecard":{"id":291261,"data":{"date":"2025-08-11","repo":{"name":"github.com/clementchadebec/pyraug","commit":"575f12454e271b248af042c8304fc1a4e14c79e2"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous 
patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses 
fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"20 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-3f63-hfp8-52jq","Warn: Project is vulnerable to: GHSA-44wm-f244-xhp3","Warn: Project is vulnerable to: GHSA-4fx9-vc88-q2xc","Warn: Project is vulnerable to: PYSEC-2023-227 / 
GHSA-8ghj-p4vj-mr35","Warn: Project is vulnerable to: PYSEC-2022-10 / GHSA-8vj2-vxx3-667w","Warn: Project is vulnerable to: PYSEC-2022-168 / GHSA-9j59-75qj-795w","Warn: Project is vulnerable to: GHSA-j7hp-h8jx-5ppr","Warn: Project is vulnerable to: PYSEC-2022-42979 / GHSA-m2vv-5vj5-2hm7","Warn: Project is vulnerable to: PYSEC-2022-8 / GHSA-pw3c-h7wp-cvhx","Warn: Project is vulnerable to: PYSEC-2022-9 / GHSA-xrcv-f9gm-v42c","Warn: Project is vulnerable to: PYSEC-2023-175","Warn: Project is vulnerable to: GHSA-mr82-8j83-vxmv","Warn: Project is vulnerable to: GHSA-3749-ghw9-m3mg","Warn: Project is vulnerable to: PYSEC-2022-43015 / GHSA-47fc-vmwq-366v","Warn: Project is vulnerable to: PYSEC-2025-41 / GHSA-53q9-r3pm-6pq6","Warn: Project is vulnerable to: PYSEC-2024-252 / GHSA-5pcm-hx3q-hm94","Warn: Project is vulnerable to: GHSA-887c-mr87-cxwp","Warn: Project is vulnerable to: PYSEC-2024-251 / GHSA-pg7h-5qx3-wjr3","Warn: Project is vulnerable to: PYSEC-2024-250","Warn: Project is vulnerable to: PYSEC-2024-259"],"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-17T18:16:00.496Z","repository_id":62583037,"created_at":"2025-08-17T18:16:00.496Z","updated_at":"2025-08-17T18:16:00.496Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271665163,"owners_count":24799302,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-22T02:00:08.480Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-augmentation","python","variational-autoencoder"],"created_at":"2024-10-15T07:13:33.878Z","updated_at":"2025-08-22T16:09:16.487Z","avatar_url":"https://github.com/clementchadebec.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/clementchadebec/pyraug/main/docs/source/imgs/logo_pyraug_2.jpeg\" width=\"400\"/\u003e\n    \u003cbr\u003e\n\u003cp\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href='https://pypi.org/project/pyraug/'\u003e\n\t    \u003cimg src='https://badge.fury.io/py/pyraug.svg' /\u003e\n\t\u003c/a\u003e\n\t\u003ca href='https://opensource.org/licenses/Apache-2.0'\u003e\n\t    \u003cimg src='https://img.shields.io/github/license/clementchadebec/pyraug?color=blue' 
/\u003e\n\t\u003c/a\u003e\n\t\u003ca href='https://pyraug.readthedocs.io/en/latest/?badge=latest'\u003e\n\t    \u003cimg src='https://readthedocs.org/projects/pyraug/badge/?version=latest' alt='Documentation \tStatus' /\u003e\n\t\u003c/a\u003e\n\t\u003ca href='https://pepy.tech/project/pyraug'\u003e\n\t    \u003cimg src='https://static.pepy.tech/personalized-badge/pyraug?period=total\u0026units=international_system\u0026left_color=grey\u0026right_color=orange\u0026left_text=downloads' alt='Downloads \tStatus' /\u003e\n\t\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pyraug.readthedocs.io/en/latest/\"\u003eDocumentation\u003c/a\u003e\n\u003c/p\u003e\n\t\n\n\n# Pyraug \n\nThis library provides a reliable way to perform Data Augmentation using Variational Autoencoders, \neven in challenging contexts such as high dimensional and low sample size \ndata.\n\n\n# Installation\n\nTo install the library from [pypi.org](https://pypi.org/), run the following using ``pip``:\n\n```bash\n$ pip install pyraug\n``` \n\n\nAlternatively, you can clone the GitHub repo to access the tests, tutorials and scripts:\n```bash\n$ git clone https://github.com/clementchadebec/pyraug.git\n```\nand install the library:\n```bash\n$ cd pyraug\n$ pip install .\n``` \n\n# Augmenting your Data\n\n\nIn Pyraug, a typical augmentation process is divided into two distinct parts:\n\n1. Train a model using Pyraug's ```TrainingPipeline``` or using the provided ``scripts/training.py`` script\n2. Generate new data from a trained model using Pyraug's ```GenerationPipeline``` or using the provided ``scripts/generation.py`` script\n\nThere are two straightforward ways to augment your data using Pyraug's built-in functions: the pipelines or the provided scripts. 
\n\n\n## Using Pyraug's Pipelines\n\nPyraug provides two pipelines that may be used to either train a model on your own data or generate new data with a pretrained model.\n\n\n**note**: These pipelines are independent of the choice of the model and sampler. Hence, they can be used even if you want to access more advanced features such as defining your own autoencoding architecture. \n\n### Launching a model training\n\n\nTo launch a model training, you only need to call a `TrainingPipeline` instance. \nIn its most basic version, the `TrainingPipeline` can be built without any arguments.\nBy default, this will train a `RHVAE` model with the default autoencoding architecture and parameters.\n\n```python\n\u003e\u003e\u003e from pyraug.pipelines import TrainingPipeline\n\u003e\u003e\u003e pipeline = TrainingPipeline()\n\u003e\u003e\u003e pipeline(train_data=dataset_to_augment)\n```\n\nwhere ``dataset_to_augment`` is either a `numpy.ndarray`, a `torch.Tensor` or a path to a folder where each file is a data sample (handled data formats are ``.pt``, ``.nii``, ``.nii.gz``, ``.bmp``, ``.jpg``, ``.jpeg``, ``.png``). \n\nMore generally, you can instantiate your own model and train it with the `TrainingPipeline`. For instance, to instantiate a basic `RHVAE`, run:\n\n\n```python\n\u003e\u003e\u003e from pyraug.models import RHVAE\n\u003e\u003e\u003e from pyraug.models.rhvae import RHVAEConfig\n\u003e\u003e\u003e model_config = RHVAEConfig(\n...    input_dim=int(input_dim)\n... ) # input_dim is the size of a flattened input sample\n...   # needed if you did not provide your own architectures\n\u003e\u003e\u003e model = RHVAE(model_config)\n```\n\n\nIf you instantiate a model yourself as shown above without providing all the network architectures (encoder, decoder \u0026 metric if applicable), the `ModelConfig` instance will expect you to provide the input dimension of your data, which equals ``n_channels x height x width x ...``. 
By default, the networks of Pyraug's VAE models are Multi Layer Perceptrons, which automatically adapt to the input data shape. \n\n**note**: If your data samples have different sizes, Pyraug will reshape them to the minimum size ``min_n_channels x min_height x min_width x ...``\n\n\n\nThen the `TrainingPipeline` can be launched by running:\n\n```python\n\u003e\u003e\u003e from pyraug.pipelines import TrainingPipeline\n\u003e\u003e\u003e pipe = TrainingPipeline(model=model)\n\u003e\u003e\u003e pipe(train_data=dataset_to_augment)\n```\n\nAt the end of training, the model weights ``models.pt`` and the model config file ``model_config.json`` \nwill be saved in the folder ``outputs/my_model/training_YYYY-MM-DD_hh-mm-ss/final_model``. \n\n**Important**: For high dimensional data, we advise you to provide your own network architectures and potentially adapt the training and model parameters; see the [documentation](https://pyraug.readthedocs.io/en/latest/advanced_use.html) for more details.\n\n\n### Launching data generation\n\n\nTo launch the data generation process from a trained model, run the following:\n\n```python\n\u003e\u003e\u003e from pyraug.pipelines import GenerationPipeline\n\u003e\u003e\u003e from pyraug.models import RHVAE\n\u003e\u003e\u003e model = RHVAE.load_from_folder('path/to/your/trained/model') # reload the model\n\u003e\u003e\u003e pipe = GenerationPipeline(model=model) # define pipeline\n\u003e\u003e\u003e pipe(samples_number=10) # This will generate 10 data points\n```\n\nThe generated data is saved in ``.pt`` files in ``dummy_output_dir/generation_YYYY-MM-DD_hh-mm-ss``. 
By default, each file stores a batch of at most 500 samples.\n\n\n\n### Retrieve generated data\n\nGenerated data can then be loaded by running:\n\n```python\n\u003e\u003e\u003e import torch\n\u003e\u003e\u003e data = torch.load('path/to/generated_data.pt')\n```\n\n## Using the provided scripts\n\n\nPyraug provides two scripts allowing you to augment your data directly from the command line.\n\n\n**note**: To access the predefined scripts, you should first clone Pyraug's repository.\nThe following scripts are located in the [scripts folder](https://github.com/clementchadebec/pyraug/tree/main/scripts). For the time being, only `RHVAE` model training and generation are handled by the provided scripts. Models will be added as they are implemented in [pyraug.models](https://github.com/clementchadebec/pyraug/tree/main/src/pyraug/models). \n\n\n### Launching a model training\n\nTo launch a model training, run: \n\n```\n$ python scripts/training.py --path_to_train_data \"path/to/your/data/folder\" \n```\n\n\nThe data must be located in ``path/to/your/data/folder``, where each input sample is a file. Handled data formats are ``.pt``, ``.nii``, ``.nii.gz``, ``.bmp``, ``.jpg``, ``.jpeg``, ``.png``. Other formats will be added progressively, depending on usage.\n\n\nAt the end of training, the model weights ``models.pt`` and the model config file ``model_config.json`` \nwill be saved in the folder ``outputs/my_model_from_script/training_YYYY-MM-DD_hh-mm-ss/final_model``. \n\n\n### Launching data generation\n\n\nThen, to launch the data generation process from a trained model, you only need to run: \n\n```\n$ python scripts/generation.py --num_samples 10 --path_to_model_folder 'path/to/your/trained/model/folder' \n```\n\n\nThe generated data is stored in several ``.pt`` files in ``outputs/my_generated_data_from_script/generation_YYYY-MM-DD_hh_mm_ss``. 
By default, each file stores a batch of up to 500 samples.\n\n\n\n**Important**:  By default, the scripts use the default configurations. You can easily override them as explained in the [documentation](https://pyraug.readthedocs.io/en/latest/advanced/setting_configs.html). See the tutorials for a more in-depth example.\n\n\n\n### Retrieve generated data\n\nGenerated data can then be loaded by running:\n\n```python\n\u003e\u003e\u003e import torch\n\u003e\u003e\u003e data = torch.load('path/to/generated_data.pt')\n```\n\n\n\n## Getting your hands on the code\n\nTo help you understand how Pyraug works and how you can augment your data with this library, we also\nprovide tutorials that can be found in the [examples folder](https://github.com/clementchadebec/pyraug/tree/main/examples):\n\n- [getting_started.ipynb](https://github.com/clementchadebec/pyraug/tree/main/examples) explains how to train a model and generate new data using Pyraug's Pipelines [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/clementchadebec/pyraug/blob/main/examples/getting_started.ipynb)\n- [playing_with_configs.ipynb](https://github.com/clementchadebec/pyraug/tree/main/examples) shows you how to amend the predefined configurations to adapt them to your data [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/clementchadebec/pyraug/blob/main/examples/playing_with_configs.ipynb)\n- [making_your_own_autoencoder.ipynb](https://github.com/clementchadebec/pyraug/tree/main/examples) shows you how to pass your own networks to the models implemented in Pyraug [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/clementchadebec/pyraug/blob/main/examples/making_your_own_autoencoder.ipynb)\n\n## Dealing with issues\n\nIf you experience any issues while running the code or would like to request new 
features, please [open an issue on GitHub](https://github.com/clementchadebec/pyraug/issues).\n\n\n## Citing\n\nIf you use this library, please consider citing us:\n\n```bibtex\n@article{chadebec2022data,\n  title={Data augmentation in high dimensional low sample size setting using a geometry-based variational autoencoder},\n  author={Chadebec, Cl{\\'e}ment and Thibeau-Sutre, Elina and Burgos, Ninon and Allassonni{\\`e}re, St{\\'e}phanie},\n  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n  year={2022},\n  publisher={IEEE}\n}\n```\n\n### Credits\nLogo: [SaulLu](https://github.com/saullu)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclementchadebec%2Fpyraug","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclementchadebec%2Fpyraug","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclementchadebec%2Fpyraug/lists"}