{"id":22506590,"url":"https://github.com/a-poor/datatransform.jl","last_synced_at":"2026-05-05T05:39:15.345Z","repository":{"id":104407367,"uuid":"364983120","full_name":"a-poor/DataTransform.jl","owner":"a-poor","description":"A package for defining (and performing) tabular-data transformations with JSON.","archived":false,"fork":false,"pushed_at":"2021-05-07T00:15:11.000Z","size":9,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-26T06:35:57.648Z","etag":null,"topics":["data","data-science","data-transformation","etl","feature-engineering","json","julia","julia-package","tabular-data"],"latest_commit_sha":null,"homepage":"","language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/a-poor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-06T17:10:11.000Z","updated_at":"2021-05-07T00:15:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"6e73b4d4-df9d-42df-acc9-45f3814988d9","html_url":"https://github.com/a-poor/DataTransform.jl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a-poor%2FDataTransform.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a-poor%2FDataTransform.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a-poor%2FDataTransform.jl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a-poor%2FDataTra
nsform.jl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/a-poor","download_url":"https://codeload.github.com/a-poor/DataTransform.jl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944062,"owners_count":20697948,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-science","data-transformation","etl","feature-engineering","json","julia","julia-package","tabular-data"],"created_at":"2024-12-07T00:44:33.551Z","updated_at":"2026-05-05T05:39:15.316Z","avatar_url":"https://github.com/a-poor.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataTransform.jl\r\n\r\n_created by Austin Poor_\r\n\r\n__Under Construction__\r\n\r\nThis library is intended to help develop a standardized data-transformation schema for tabular data.\r\n\r\nData transformation files are defined in JSON so they are human-readable, trackable in VC, and so they can be passed between languages. 
`DataTransform`'s goal is to promote reproducibility and clean transformations (rather than hack-y fixes, whenever possible).\r\n\r\nYou pass in a rectangular dataset and define a stack of transformations to apply -- `DataTransform.jl` does the rest!\r\n\r\n## Quick Start\r\n\r\n...\r\n\r\n\r\n## Installation\r\n\r\n...\r\n\r\n\r\n## DataTransform Schema Reference\r\n\r\nData transform files are defined in JSON.\r\n\r\n* Select Column\r\n  * by Name\r\n    * Exact Match\r\n    * Starts With\r\n    * Ends With\r\n    * Contains\r\n    * RegEx\r\n  * by Index\r\n  * by Data Type\r\n* Filter Rows\r\n  * by Value\r\n    * Exact\r\n    * In List\r\n    * Range\r\n    * RegEx (for Text)\r\n    * Apply Function to Column\r\n* Arrange Columns\r\n  * Sort Names\r\n  * In Specified Order\r\n* Arrange Rows\r\n  * Sort by Column (or Column Combinations)\r\n* Apply Function by Column (or Row?)\r\n  * Mapping\r\n  * Convert Type\r\n  * Case Function (`if/else-if/else` or `case-when`)\r\n  * Predefined function from `FeatureEng`\r\n* Duplicate Column\r\n  * (Using Column Select Operations)\r\n* Duplicate Rows\r\n  * (Using Row Select Operations)\r\n* Rename Column\r\n  * Mapping\r\n  * RegEx\r\n* ~~Join with Other Dataset?~~\r\n\r\nThe basic schema structure is:\r\n\r\n```json\r\n{\r\n  \"version\": \"0.1\", // Schema Version Number for Compatibility\r\n  \"transformations\": [...] // List of Transformation Objects \r\n}\r\n```\r\n\r\nDeciding how to handle errors. Set error handling globally.\r\n\r\nExample file: [sample-transform.json](./sample-transform.json).\r\n\r\n\r\n## API Reference\r\n\r\n...\r\n\r\n\r\n## Contributing\r\n\r\nAny contribution would be greatly appreciated! You can help by suggesting additions to the package or changes you think would help. 
In addition, please feel free to submit an issue or PR!\r\n\r\nThanks!\r\nAustin\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa-poor%2Fdatatransform.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fa-poor%2Fdatatransform.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa-poor%2Fdatatransform.jl/lists"}