{"id":19015228,"url":"https://github.com/sdv-dev/rdt","last_synced_at":"2026-04-10T18:12:34.429Z","repository":{"id":38187800,"uuid":"119423706","full_name":"sdv-dev/RDT","owner":"sdv-dev","description":"A library of Reversible Data Transforms","archived":false,"fork":false,"pushed_at":"2025-05-11T15:27:18.000Z","size":2344,"stargazers_count":124,"open_issues_count":41,"forks_count":27,"subscribers_count":14,"default_branch":"main","last_synced_at":"2025-05-11T16:33:52.696Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sdv-dev.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-01-29T18:33:37.000Z","updated_at":"2025-05-08T16:12:56.000Z","dependencies_parsed_at":"2024-05-13T14:29:59.553Z","dependency_job_id":"77bbd101-97fb-413c-b13f-48ab7143ae5e","html_url":"https://github.com/sdv-dev/RDT","commit_stats":{"total_commits":663,"total_committers":46,"mean_commits":14.41304347826087,"dds":0.8009049773755657,"last_synced_commit":"545be6f17224ce9273e07aad4cef81144ae8042f"},"previous_names":[],"tags_count":60,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FRDT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FRDT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FRDT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/sdv-dev%2FRDT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sdv-dev","download_url":"https://codeload.github.com/sdv-dev/RDT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254337612,"owners_count":22054253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T19:36:16.878Z","updated_at":"2026-04-10T18:12:34.423Z","avatar_url":"https://github.com/sdv-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n    \u003ci\u003eThis repository is part of \u003ca href=\"https://sdv.dev\"\u003eThe Synthetic Data Vault Project\u003c/a\u003e, a project from \u003ca href=\"https://datacebo.com\"\u003eDataCebo\u003c/a\u003e.\u003c/i\u003e\n\u003c/p\u003e\n\n[![Development Status](https://img.shields.io/badge/Development%20Status-5%20--%20Production/Stable-green)](https://pypi.org/search/?q=\u0026o=\u0026c=Development+Status+%3A%3A+5+-+Production%2FStable)\n[![PyPi Shield](https://img.shields.io/pypi/v/RDT.svg)](https://pypi.python.org/pypi/RDT)\n[![Unit Tests](https://github.com/sdv-dev/RDT/actions/workflows/unit.yml/badge.svg)](https://github.com/sdv-dev/RDT/actions/workflows/unit.yml)\n[![Downloads](https://pepy.tech/badge/rdt)](https://pepy.tech/project/rdt)\n[![Coverage 
Status](https://codecov.io/gh/sdv-dev/RDT/branch/main/graph/badge.svg)](https://codecov.io/gh/sdv-dev/RDT)\n[![Forum](https://img.shields.io/badge/Forum-Join%20now!-36C5F0)](https://forum.datacebo.com)\n\n\u003cdiv align=\"left\"\u003e\n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/sdv-dev/RDT\"\u003e\n\u003cimg align=\"center\" width=40% src=\"https://github.com/sdv-dev/SDV/blob/stable/docs/images/RDT-DataCebo.png\"\u003e\u003c/img\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n# Overview\n\nRDT (Reversible Data Transforms) is a Python library that transforms raw data into fully numerical\ndata, ready for data science. The transforms are reversible, allowing you to convert from numerical\ndata back into your original format.\n\n\u003cimg align=\"center\" src=\"https://github.com/sdv-dev/SDV/blob/stable/docs/images/rdt_main_tranformation.png\"\u003e\u003c/img\u003e\n\n\n# Install\n\nInstall **RDT** using ``pip``  or ``conda``. 
We recommend using a virtual environment to avoid\nconflicts with other software on your device.\n\n```bash\npip install rdt\n```\n\n```bash\nconda install -c conda-forge rdt\n```\n\nFor more information about using reversible data transformations, visit the [RDT Documentation](https://docs.sdv.dev/rdt).\n\n\n# Quickstart\n\nIn this short tutorial, we will guide you through a series of steps that will\nhelp you get started using **RDT** to transform columns, tables and datasets.\n\n## Load the demo data\n\nAfter you have installed RDT, you can get started using the demo dataset.\n\n```python3\nfrom rdt import get_demo\n\ncustomers = get_demo()\n```\n\nThis dataset contains some randomly generated values that describe the customers of an online\nmarketplace.\n\n```\n  last_login email_optin credit_card  age  dollars_spent\n0 2021-06-26       False        VISA   29          99.99\n1 2021-02-10       False        VISA   18            NaN\n2        NaT       False        AMEX   21           2.50\n3 2020-09-26        True         NaN   45          25.00\n4 2020-12-22         NaN    DISCOVER   32          19.99\n```\n\nLet's transform this data so that each column is converted to fully numerical data ready for data\nscience.\n\n## Creating the HyperTransformer \u0026 config\n\nThe ``HyperTransformer`` is capable of transforming multi-column datasets.\n\n```python3\nfrom rdt import HyperTransformer\n\nht = HyperTransformer()\n```\n\nThe `HyperTransformer` needs to know about the columns in your dataset and which transformers to\napply to each. These are described by a config. 
We can ask the `HyperTransformer` to automatically\ndetect it based on the data we plan to use.\n\n```python3\nht.detect_initial_config(data=customers)\n```\n\nThis will create and set the config.\n\n```\nConfig:\n{\n    \"sdtypes\": {\n        \"last_login\": \"datetime\",\n        \"email_optin\": \"boolean\",\n        \"credit_card\": \"categorical\",\n        \"age\": \"numerical\",\n        \"dollars_spent\": \"numerical\"\n    },\n    \"transformers\": {\n        \"last_login\": \"UnixTimestampEncoder()\",\n        \"email_optin\": \"BinaryEncoder()\",\n        \"credit_card\": \"FrequencyEncoder()\",\n        \"age\": \"FloatFormatter()\",\n        \"dollars_spent\": \"FloatFormatter()\"\n    }\n}\n```\n\nThe `sdtypes` dictionary describes the semantic data types of each of your columns and the\n`transformers` dictionary describes which transformer to use for each column. You can customize the\ntransformers and their settings. (See the [Transformers Glossary](https://docs.sdv.dev/rdt/transformers-glossary/browse-transformers) for more information.)\n\n## Fitting \u0026 using the HyperTransformer\n\nThe `HyperTransformer` references the config while learning the data during the `fit` stage.\n\n```python3\nht.fit(customers)\n```\n\nOnce the transformer is fit, it's ready to use. 
Use the `transform` method to transform all columns\nof your dataset at once.\n\n```python3\ntransformed_data = ht.transform(customers)\n```\n\n```\n   last_login.value  email_optin.value  credit_card.value  age.value  dollars_spent.value\n0      1.624666e+18                0.0                0.2         29                99.99\n1      1.612915e+18                0.0                0.2         18                36.87\n2      1.611814e+18                0.0                0.5         21                 2.50\n3      1.601078e+18                1.0                0.7         45                25.00\n4      1.608595e+18                0.0                0.9         32                19.99\n```\n\nThe ``HyperTransformer`` applied the assigned transformer to each individual column. Each column\nnow contains fully numerical data that you can use for your project!\n\nWhen you're done with your project, you can also transform the data back to the original format\nusing the `reverse_transform` method.\n\n```python3\noriginal_format_data = ht.reverse_transform(transformed_data)\n```\n\n```\n  last_login email_optin credit_card  age  dollars_spent\n0        NaT       False        VISA   29          99.99\n1 2021-02-10       False        VISA   18            NaN\n2        NaT       False        AMEX   21            NaN\n3 2020-09-26        True         NaN   45          25.00\n4 2020-12-22       False    DISCOVER   32          19.99\n```\n\nNote that missing values are regenerated at random, in roughly the same proportion as in the original\ndata, so they may not appear in the same rows as before.\n\n# What's Next?\n\nTo learn more about reversible data transformations, visit the [RDT Documentation](https://docs.sdv.dev/rdt).\n\n\n---\n\n\n\u003cdiv align=\"center\"\u003e\n\u003ca href=\"https://datacebo.com\"\u003e\u003cimg align=\"center\" width=40% src=\"https://github.com/sdv-dev/SDV/blob/stable/docs/images/DataCebo.png\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\u003cbr/\u003e\n\n[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's [Data to AI Lab](\nhttps://dai.lids.mit.edu/) 
in 2016. After 4 years of research and traction with enterprise, we\ncreated [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project.\nToday, DataCebo is the proud developer of SDV, the largest ecosystem for\nsynthetic data generation \u0026 evaluation. It is home to multiple libraries that support synthetic\ndata, including:\n\n* 🔄 Data discovery \u0026 transformation. Reverse the transforms to reproduce realistic data.\n* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,\n  multi-table and time series data.\n* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data\n  generation models.\n\n[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully\nintegrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries\nfor specific needs.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdv-dev%2Frdt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsdv-dev%2Frdt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdv-dev%2Frdt/lists"}