{"id":24917976,"url":"https://github.com/azukds/tubular","last_synced_at":"2025-04-13T08:34:03.974Z","repository":{"id":39678947,"uuid":"360885026","full_name":"azukds/tubular","owner":"azukds","description":"Python package implementing transformers for pre processing steps for machine learning. ","archived":false,"fork":false,"pushed_at":"2025-04-08T14:26:02.000Z","size":2467,"stargazers_count":56,"open_issues_count":59,"forks_count":18,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-08T15:28:14.178Z","etag":null,"topics":["feature-engineering","pre-processing","transformers"],"latest_commit_sha":null,"homepage":"https://tubular.readthedocs.io/en/latest/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/azukds.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.rst","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-23T12:58:45.000Z","updated_at":"2025-04-08T10:03:26.000Z","dependencies_parsed_at":"2023-11-07T16:44:09.182Z","dependency_job_id":"c0e36b57-4e7a-4c85-be91-2357d33450b2","html_url":"https://github.com/azukds/tubular","commit_stats":{"total_commits":676,"total_committers":23,"mean_commits":"29.391304347826086","dds":0.5325443786982249,"last_synced_commit":"18ca370dcf4873acce8699d323a5b0ba986b155f"},"previous_names":["azukds/tubular","lvgig/tubular"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/azukds%2Ftubular","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/azukds%2Ftubular/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/azukds%2Ftubular/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/azukds%2Ftubular/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/azukds","download_url":"https://codeload.github.com/azukds/tubular/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248684564,"owners_count":21145103,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["feature-engineering","pre-processing","transformers"],"created_at":"2025-02-02T09:08:21.286Z","updated_at":"2025-04-13T08:34:03.952Z","avatar_url":"https://github.com/azukds.png","language":"Python","funding_links":[],"categories":["Feature Engineering","Libraries/Packages/Scripts"],"sub_categories":["General","Polars plugins"],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/lvgig/tubular/raw/main/logo.png\"\u003e\n\u003c/p\u003e\n\nTubular pre-processing for machine learning!\n\n----\n\n![PyPI](https://img.shields.io/pypi/v/tubular?color=success\u0026style=flat)\n![Read the Docs](https://img.shields.io/readthedocs/tubular)\n![GitHub](https://img.shields.io/github/license/lvgig/tubular)\n![GitHub last commit](https://img.shields.io/github/last-commit/lvgig/tubular)\n![GitHub issues](https://img.shields.io/github/issues/lvgig/tubular)\n![Build](https://github.com/lvgig/tubular/actions/workflows/python-package.yml/badge.svg?branch=main)\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/lvgig/tubular/HEAD?labpath=examples)\n\n`tubular` implements pre-processing steps for tabular data commonly used in machine learning pipelines.\n\nThe transformers are compatible with [scikit-learn](https://scikit-learn.org/) [Pipelines](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html). Each has a `transform` method to apply the pre-processing step to data and a `fit` method to learn the relevant information from the data, if applicable.\n\nThe transformers in `tubular` work with data in [pandas](https://pandas.pydata.org/) [DataFrames](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).\n\nThere are a variety of transformers to assist with;\n\n- capping\n- dates\n- imputation\n- mapping\n- categorical encoding\n- numeric operations\n\nHere is a simple example of applying capping to two columns;\n\n```python\nfrom tubular.capping import CappingTransformer\nimport pandas as pd\nfrom sklearn.datasets import fetch_california_housing\n\n# load the california housing dataset\ncali = fetch_california_housing()\nX = pd.DataFrame(cali['data'], columns=cali['feature_names'])\n\n# initialise a capping transformer for 2 columns\ncapper = CappingTransformer(capping_values = {'AveOccup': [0, 10], 'HouseAge': [0, 50]})\n\n# transform the data\nX_capped = capper.transform(X)\n```\n\n## Installation\n\nThe easiest way to get `tubular` is directly from [pypi](https://pypi.org/project/tubular/) with;\n\n `pip install tubular`\n\n## Documentation\n\nThe documentation for `tubular` can be found on [readthedocs](https://tubular.readthedocs.io/en/latest/).\n\nInstructions for building the docs locally can be found in [docs/README](https://github.com/lvgig/tubular/blob/main/docs/README.md).\n\n## Examples\n\nTo help get started there are example notebooks in the [examples](https://github.com/lvgig/tubular/tree/main/examples) folder in the repo that show how to use each transformer.\n\nTo open the example notebooks in [binder](https://mybinder.org/) click [here](https://mybinder.org/v2/gh/lvgig/tubular/HEAD?labpath=examples) or click on the `launch binder` shield above and then click on the directory button in the side bar to the left to navigate to the specific notebook.\n\n## Issues\n\nFor bugs and feature requests please open an [issue](https://github.com/lvgig/tubular/issues).\n\n## Build and test\n\nThe test framework we are using for this project is [pytest](https://docs.pytest.org/en/stable/). To build the package locally and run the tests follow the steps below.\n\nFirst clone the repo and move to the root directory;\n\n```shell\ngit clone https://github.com/lvgig/tubular.git\ncd tubular\n```\n\nNext install `tubular` and development dependencies;\n\n```shell\npip install . -r requirements-dev.txt\n```\n\nFinally run the test suite with `pytest`;\n\n```shell\npytest\n```\n\n## Contribute\n\n`tubular` is under active development, we're super excited if you're interested in contributing! \n\nSee the [CONTRIBUTING](https://github.com/lvgig/tubular/blob/main/CONTRIBUTING.rst) file for the full details of our working practices.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazukds%2Ftubular","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fazukds%2Ftubular","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazukds%2Ftubular/lists"}