{"id":18249252,"url":"https://github.com/gretelai/trainer","last_synced_at":"2025-04-04T15:32:48.159Z","repository":{"id":37797425,"uuid":"454970329","full_name":"gretelai/trainer","owner":"gretelai","description":"Simple interface to synthesize complex and highly dimensional datasets using Gretel APIs.","archived":false,"fork":false,"pushed_at":"2025-03-05T05:49:46.000Z","size":1886,"stargazers_count":29,"open_issues_count":0,"forks_count":7,"subscribers_count":20,"default_branch":"main","last_synced_at":"2025-03-20T15:02:45.483Z","etag":null,"topics":["data-generation","deep-learning","gan","gans","language-model","machine-learning","synthetic-data"],"latest_commit_sha":null,"homepage":"https://gretel.ai/synthetics","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gretelai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-02T23:33:04.000Z","updated_at":"2025-03-05T05:49:50.000Z","dependencies_parsed_at":"2024-01-12T00:51:59.378Z","dependency_job_id":"e0962e4b-09db-4575-8680-9c09a1fde647","html_url":"https://github.com/gretelai/trainer","commit_stats":null,"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gretelai%2Ftrainer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gretelai%2Ftrainer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gretelai%2Ftrainer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gret
elai%2Ftrainer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gretelai","download_url":"https://codeload.github.com/gretelai/trainer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247203159,"owners_count":20900927,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-generation","deep-learning","gan","gans","language-model","machine-learning","synthetic-data"],"created_at":"2024-11-05T09:39:34.414Z","updated_at":"2025-04-04T15:32:43.152Z","avatar_url":"https://github.com/gretelai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gretel Trainer\n\nThis module is designed to provide a simple interface to help users successfully train synthetic models on complex datasets with high row and column counts, and offers features such as Cloud SaaS based training and multi-GPU based parallelization. 
Get started for free with an API key from [Gretel.ai](https://console.gretel.cloud).\n\n## Current functionality and features:\n\n* Synthetic data generators for text, tabular, and time-series data with the following\n  features:\n    * Balance datasets or boost a minority class using Conditional Data Generation.\n    * Automated data validation.\n    * Synthetic data quality reports.\n    * Privacy filters and optional differential privacy support.\n* Multiple [model types supported](https://docs.gretel.ai/synthetics/models):\n    * `Gretel-LSTM` model type supports text, tabular, time-series, and conditional data generation.\n    * `Gretel-ACTGAN` model type supports tabular and conditional data generation.\n    * `Gretel-GPT` natural language synthesis based on an open-source implementation of GPT-3 (coming soon).\n    * `Gretel-DGAN` multi-variate time series based on DoppelGANger (coming soon).\n\n## Try it out now!\n\nIf you want to quickly get started synthesizing data with **Gretel.ai**, simply click the button below and follow the examples. See additional Python3 and Jupyter Notebook examples in the `./notebooks` folder.\n\n[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gretelai/trainer/blob/main/notebooks/trainer-examples.ipynb)\n\n## Join the Synthetic Data Community Discord\n\nIf you want to be part of the Synthetic Data Community to receive announcements of the latest releases,\nask questions, suggest new features or participate in the development meetings, please join\nthe Synthetic Data Community Server!\n\n[![Discord](https://img.shields.io/discord/1007817822614847500?label=Discord\u0026logo=Discord)](https://gretel.ai/discord)\n\n# Install\n\n**Using `pip`:**\n\n```bash\npip install -U gretel-trainer\n```\n\n# Quickstart\n\n## 1. Add your [Gretel API](https://console.gretel.cloud) key via the Gretel CLI.\nUse the Gretel client to store your API key to disk. 
This step is optional: the trainer will prompt for an API key in the next step.\n```bash\ngretel configure\n```\n\n## 2. Train or fine-tune a model using the Gretel API\n\n```python3\nfrom gretel_trainer import trainer\n\ndataset = \"https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/USAdultIncome5k.csv\"\n\nmodel = trainer.Trainer()\nmodel.train(dataset)\n```\n\n## 3. Generate synthetic data!\n```python3\ndf = model.generate()\n```\n\n## Development\n\nSet up the environment and install dependencies.\n\n```sh\npython -m venv venv\nsource venv/bin/activate\npip install -r requirements-dev.txt\npip install -e .\n```\n\n- Run tests via `make test`\n- Run type-checking (limited coverage) via `make type`\n\n## TODOs / Roadmap\n\n- [ ] Enable conditional generation via SDK interface (supported in Notebooks currently).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgretelai%2Ftrainer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgretelai%2Ftrainer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgretelai%2Ftrainer/lists"}