{"id":14970698,"url":"https://github.com/alexandregazagnes/scikit-transformers","last_synced_at":"2025-10-26T13:31:12.146Z","repository":{"id":219243965,"uuid":"748552140","full_name":"AlexandreGazagnes/scikit-transformers","owner":"AlexandreGazagnes","description":"Very usefull package to enable and provide custom transformers such as LogColumnTransformer, BoolColumnTransformers and others fancy transformers.","archived":false,"fork":false,"pushed_at":"2024-07-19T14:03:29.000Z","size":10559,"stargazers_count":4,"open_issues_count":12,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-10-30T04:54:29.635Z","etag":null,"topics":["data","data-science","log","python","scikit-learn","transformer"],"latest_commit_sha":null,"homepage":"https://alexandregazagnes.github.io/scikit-transformers","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlexandreGazagnes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-26T08:29:56.000Z","updated_at":"2024-02-12T22:30:29.000Z","dependencies_parsed_at":"2024-11-15T10:41:54.520Z","dependency_job_id":"dc99dc6f-d857-419a-bb0d-d1da92912098","html_url":"https://github.com/AlexandreGazagnes/scikit-transformers","commit_stats":{"total_commits":142,"total_committers":1,"mean_commits":142.0,"dds":0.0,"last_synced_commit":"59677a96b737891bb3231da6fe79682e60b2e0db"},"previous_names":["alexandregazagnes/scikit-transformer","alexandregazagnes/scikit-transformers"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/AlexandreGazagnes%2Fscikit-transformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexandreGazagnes%2Fscikit-transformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexandreGazagnes%2Fscikit-transformers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexandreGazagnes%2Fscikit-transformers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlexandreGazagnes","download_url":"https://codeload.github.com/AlexandreGazagnes/scikit-transformers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238337297,"owners_count":19455285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-science","log","python","scikit-learn","transformer"],"created_at":"2024-09-24T13:44:00.403Z","updated_at":"2025-10-26T13:31:11.696Z","avatar_url":"https://github.com/AlexandreGazagnes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![image](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/assets/img/img.png?raw=true)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n![Python](https://img.shields.io/badge/python-3.10.x-green.svg)\n![Repo 
Size](https://img.shields.io/github/repo-size/AlexandreGazagnes/scikit-transformers)\n[![PEP8](https://img.shields.io/badge/code%20style-pep8-orange.svg)](https://www.python.org/dev/peps/pep-0008/)\n[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)\n![Coverage](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/assets/img/cov.svg?raw=true)\n![Tests](https://github.com/AlexandreGazagnes/scikit-transformers/actions/workflows/tests.yaml/badge.svg)\n![Statics](https://github.com/AlexandreGazagnes/scikit-transformers/actions/workflows/statics.yaml/badge.svg)\n![Doc](https://github.com/AlexandreGazagnes/scikit-transformers/actions/workflows/docs.yaml/badge.svg)\n![Pypi](https://github.com/AlexandreGazagnes/scikit-transformers/actions/workflows/publish.yaml/badge.svg)\n![GitHub commit activity](https://img.shields.io/github/commit-activity/m/AlexandreGazagnes/scikit-transformers)\n\n# Scikit-transformers : Scikit-learn + Custom transformers\n\n\n## About\n\n**scikit-transformers** is a useful package that provides custom transformers such as ```LogColumnTransformer```, ```BoolColumnTransformer``` and other transformers.\n\nIt was created to provide a simple way to use custom transformers in ```scikit-learn``` pipelines, and to allow them to be used with a ```scikit-learn``` model, using ```GridSearchCV``` for testing and tuning hyperparameters.\n\nThe starting point was to provide a simple ```LogColumnTransformer```, which is a simple wrapper around the numpy log function, making it possible to apply the log transformation only to columns whose skew exceeds a given threshold.\n\nWith ```scikit-transformers```, it is now possible to use this ```LogColumnTransformer``` in a ```GridSearchCV```, with the skew threshold as a hyperparameter, to find which columns are worth logging.\n\n```LogColumnTransformer``` is one of the many 
transformers implemented in ```scikit-transformers```.\n\n\n\n## Installation\n\nUsing regular pip and venv tools :\n\n```bash\npython3 -m venv .venv\nsource .venv/bin/activate\npip install scikit-transformers\n```\n\n\n## Usage\n\nFor very basic usage :\n```python\nimport pandas as pd\n\nfrom sktransf.transformer import LogColumnTransformer\n\ndf = pd.DataFrame(\n    { \"a\": range(10),\n      \"b\": range(10)\n    }\n)\n\nlogger = LogColumnTransformer()\nlogger.fit_transform(df)\ndf_transf = logger.transform(df)\n```\n\nUsing common transformers : \n\n```python\nimport pandas as pd\n\nfrom sktransf.transformer import LogColumnTransformer, BoolColumnTransformer\nfrom sktransf.selector import DropUniqueColumnSelector\n\ndf = pd.DataFrame(\n    { \"a\": range(10),\n      \"b\": range(10)\n    }\n)\n\ndf_bool = BoolColumnTransformer().fit_transform(df)\ndf_unique = DropUniqueColumnSelector().fit_transform(df)\ndf_logged = LogColumnTransformer().fit_transform(df)\n```\n\nUsing a pipeline with a scikit-learn model : \n\n```python\nimport pandas as pd\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.linear_model import LinearRegression\n\nfrom sktransf.transformer import LogColumnTransformer, BoolColumnTransformer\nfrom sktransf.selector import DropUniqueColumnSelector\n\npipe = Pipeline([\n    ('bool', BoolColumnTransformer()),\n    ('unique', DropUniqueColumnSelector()),\n    ('log', LogColumnTransformer()),\n    ('model', LinearRegression())\n])\n\nX = pd.DataFrame(\n    { \"a\": range(10),\n      \"b\": range(10)\n    }\n)\n\ny = range(10)\n\npipe.fit(X, y)\n\ny_pred = pipe.predict(X)\n```\n\n\n## Documentation\n\nFor more specific information, please refer to the notebooks: \n\n* Transformers : \n  * [LogColumnTransformer notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/transformer/LogColumnTransformer.ipynb)\n  * [BoolColumnTransformer 
notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/transformer/BoolColumnTransformer.ipynb)\n* Selectors : \n  * [DropUniqueColumnSelector notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/selector/DropUniqueColumnSelector.ipynb)\n  * [DropSkuColumnSelector notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/selector/DropSkuColumnSelector.ipynb)\n* Pipelines :\n  * [Pipelines notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/Pipelines.ipynb)\n\n\nComplete documentation is available on the [GitHub page](https://alexandregazagnes.github.io/scikit-transformers/).\n\n\n## Changelog, Releases and Roadmap\n\nPlease refer to the [changelog](https://alexandregazagnes.github.io/scikit-transformers/CHANGELOG/) page for more information.\n\n\n## Contributing\n\nPull requests are welcome.\n\nFor major changes, please open an issue first to discuss what you would like to change.\n\nFor more information, please refer to the [contributing](https://alexandregazagnes.github.io/scikit-transformers/CONTRIBUTING/) page.\n\n\n## License\n\n[GPLv3](https://raw.githubusercontent.com/AlexandreGazagnes/scikit-transformers/main/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexandregazagnes%2Fscikit-transformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexandregazagnes%2Fscikit-transformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexandregazagnes%2Fscikit-transformers/lists"}