{"id":15288215,"url":"https://github.com/34j/sklearn-utilities","last_synced_at":"2025-04-13T07:35:13.767Z","repository":{"id":200256539,"uuid":"702351075","full_name":"34j/sklearn-utilities","owner":"34j","description":"Utilities for scikit-learn. Append prediction to x, append prediction to x single, append x prediction to x, compose var estimator, data frame wrapper, drop by noise prediction, drop missing rows y, dummy regressor var, estimator wrapper base, excluded column transformer pandas, feature union pandas, id transformer, included column transformer pand","archived":false,"fork":false,"pushed_at":"2024-11-18T17:08:37.000Z","size":461,"stargazers_count":3,"open_issues_count":12,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-24T16:48:18.065Z","etag":null,"topics":["catboost","feature-engine","feature-engineering","multioutput","pandas","pca","python","pytorch","regression","scikit-learn","sklearn","sklearn-compatible","skorch","torch","tqdm"],"latest_commit_sha":null,"homepage":"https://sklearn-utilities.readthedocs.io/en/latest/sklearn_utilities.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/34j.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":".github/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["34j"]}},"created_at":"2023-10-09T06:45:28.000Z","updated_at":"2024-11-15T00:26:34.000Z","dependencies_parsed_at":"2023-11-13T18:25:47.889Z","dependency_job_id":"26d810ed-c28f-4e63-8578-d05bc3ec5ef9","html_url":"https://github.com/34j
/sklearn-utilities","commit_stats":{"total_commits":82,"total_committers":4,"mean_commits":20.5,"dds":"0.35365853658536583","last_synced_commit":"2722b3229cc18c2af0e05809490a5b156665fc8c"},"previous_names":["34j/sklearn-utilities"],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/34j%2Fsklearn-utilities","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/34j%2Fsklearn-utilities/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/34j%2Fsklearn-utilities/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/34j%2Fsklearn-utilities/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/34j","download_url":"https://codeload.github.com/34j/sklearn-utilities/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242631140,"owners_count":20160830,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["catboost","feature-engine","feature-engineering","multioutput","pandas","pca","python","pytorch","regression","scikit-learn","sklearn","sklearn-compatible","skorch","torch","tqdm"],"created_at":"2024-09-30T15:44:46.268Z","updated_at":"2025-03-09T01:32:00.519Z","avatar_url":"https://github.com/34j.png","language":"Python","funding_links":["https://github.com/sponsors/34j"],"categories":[],"sub_categories":[],"readme":"# Sklearn Utilities\n\n\u003cp align=\"center\"\u003e\n  \u003ca 
href=\"https://github.com/34j/sklearn-utilities/actions/workflows/ci.yml?query=branch%3Amain\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/actions/workflow/status/34j/sklearn-utilities/ci.yml?branch=main\u0026label=CI\u0026logo=github\u0026style=flat-square\" alt=\"CI Status\" \u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://sklearn-utilities.readthedocs.io\"\u003e\n    \u003cimg src=\"https://img.shields.io/readthedocs/sklearn-utilities.svg?logo=read-the-docs\u0026logoColor=fff\u0026style=flat-square\" alt=\"Documentation Status\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://codecov.io/gh/34j/sklearn-utilities\"\u003e\n    \u003cimg src=\"https://img.shields.io/codecov/c/github/34j/sklearn-utilities.svg?logo=codecov\u0026logoColor=fff\u0026style=flat-square\" alt=\"Test coverage percentage\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://python-poetry.org/\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/packaging-poetry-299bd7?style=flat-square\u0026logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAASCAYAAABrXO8xAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAJJSURBVHgBfZLPa1NBEMe/s7tNXoxW1KJQKaUHkXhQvHgW6UHQQ09CBS/6V3hKc/AP8CqCrUcpmop3Cx48eDB4yEECjVQrlZb80CRN8t6OM/teagVxYZi38+Yz853dJbzoMV3MM8cJUcLMSUKIE8AzQ2PieZzFxEJOHMOgMQQ+dUgSAckNXhapU/NMhDSWLs1B24A8sO1xrN4NECkcAC9ASkiIJc6k5TRiUDPhnyMMdhKc+Zx19l6SgyeW76BEONY9exVQMzKExGKwwPsCzza7KGSSWRWEQhyEaDXp6ZHEr416ygbiKYOd7TEWvvcQIeusHYMJGhTwF9y7sGnSwaWyFAiyoxzqW0PM/RjghPxF2pWReAowTEXnDh0xgcLs8l2YQmOrj3N7ByiqEoH0cARs4u78WgAVkoEDIDoOi3AkcLOHU60RIg5wC4ZuTC7FaHKQm8Hq1fQuSOBvX/sodmNJSB5geaF5CPIkUeecdMxieoRO5jz9bheL6/tXjrwCyX/UYBUcjCaWHljx1xiX6z9xEjkYAzbGVnB8pvLmyXm9ep+W8CmsSHQQY77Zx1zboxAV0w7ybMhQmfqdmmw3nEp1I0Z+FGO6M8LZdoyZnuzzBdjISicKRnpxzI9fPb+0oYXsNdyi+d3h9bm9MWYHFtPeIZfLwzmFDKy1ai3p+PDls1Llz4yyFpferxjnyjJDSEy9CaCx5m2cJPerq6Xm34eTrZt3PqxYO1XOwDYZrFlH1fWnpU38Y9HRze3lj0vOujZcXKuuXm3jP+s3KbZVra7y2EAAAAAASUVORK5CYII=\" 
alt=\"Poetry\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/ambv/black\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square\" alt=\"black\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/pre-commit/pre-commit\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white\u0026style=flat-square\" alt=\"pre-commit\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/sklearn-utilities/\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/sklearn-utilities.svg?logo=python\u0026logoColor=fff\u0026style=flat-square\" alt=\"PyPI Version\"\u003e\n  \u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/pypi/pyversions/sklearn-utilities.svg?style=flat-square\u0026logo=python\u0026amp;logoColor=fff\" alt=\"Supported Python versions\"\u003e\n  \u003cimg src=\"https://img.shields.io/pypi/l/sklearn-utilities.svg?style=flat-square\" alt=\"License\"\u003e\n\u003c/p\u003e\n\nUtilities for scikit-learn.\n\n## Installation\n\nInstall this via pip (or your favourite package manager):\n\n```shell\npip install sklearn-utilities\n```\n\n## API\n\nSee [Docs](https://sklearn-utilities.readthedocs.io/en/latest/sklearn_utilities.html) for more information.\n\n- `EstimatorWrapperBase`: base class for wrappers. 
Redirects all attributes which are not in the wrapper to the wrapped estimator.\n- `DataFrameWrapper`: tries to convert every estimator output to a pandas DataFrame or Series.\n- `FeatureUnionPandas`: a `FeatureUnion` that works with pandas DataFrames.\n- `IncludedColumnTransformerPandas`, `ExcludedColumnTransformerPandas`: select columns by name.\n- `AppendPredictionToX`: appends the prediction of y to X.\n- `AppendXPredictionToX`: appends the prediction of X to X.\n- `DropByNoisePrediction`: drops columns that have high importance in predicting noise.\n- `DropMissingColumns`: drops columns with missing values above a threshold.\n- `DropMissingRowsY`: drops rows with missing values in y. Use `feature_engine.DropMissingData` for X.\n- `IntersectXY`: drops rows where the index of X and y do not intersect. Use with `feature_engine.DropMissingData`.\n- `ReindexMissingColumns`: reindexes columns of X in `transform()` to match the columns of X in `fit()`.\n- `ReportNonFinite`: reports non-finite values in X and/or y.\n- `IdTransformer`: a transformer that does nothing.\n- `RecursiveFitSubtractRegressor`: a regressor that recursively fits a regressor and subtracts the prediction from the target.\n- `SmartMultioutputEstimator`: a `MultiOutputEstimator` that supports a tuple of arrays in `predict()` and supports pandas `Series` and `DataFrame`.\n- `until_event()`, `since_event()`: calculate the time until or since events (`Series[bool]`).\n- `ComposeVarEstimator`: composes mean and std/var estimators.\n- `DummyRegressorVar`: `DummyRegressor` that returns 1.0 for std/var.\n- `TransformedTargetRegressorVar`: `TransformedTargetRegressor` with std/var support.\n- `StandardScalerVar`: `StandardScaler` with std/var support.\n- `EvalSetWrapper`, `CatBoostProgressBarWrapper`: wrappers that pass `eval_set` to `fit()` using `train_test_split()`, mainly for `CatBoost`. The latter shows a progress bar (using `tqdm`) as well. Useful for early stopping. 
For LightGBM, see [`lightgbm-callbacks`](https://github.com/34j/lightgbm-callbacks).\n\n### `sklearn_utilities.dataset`\n\n- `add_missing_values()`: adds missing values to a dataset.\n\n### `sklearn_utilities.torch`\n\n- `PCATorch`: faster PCA using PyTorch with GPU support.\n\n#### `sklearn_utilities.torch.skorch`\n\n- `SkorchReshaper`, `SkorchCNNReshaper`: reshape X and y for `nn.Linear` and `nn.Conv1d/2d`, respectively. (For `nn.Conv2d`, uses `np.lib.stride_tricks.sliding_window_view()`.)\n- `AllowNaN`: wraps a loss module and assigns 0 to y and y_hat for indices where y contains NaN in `forward()`.\n\n## See also\n\n- [ml-tooling/best-of-ml-python](https://github.com/ml-tooling/best-of-ml-python)\n\n## Contributors ✨\n\nThanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003c!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section --\u003e\n\u003c!-- markdownlint-disable --\u003e\n\u003c!-- markdownlint-enable --\u003e\n\u003c!-- ALL-CONTRIBUTORS-LIST:END --\u003e\n\u003c!-- prettier-ignore-end --\u003e\n\nThis project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F34j%2Fsklearn-utilities","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F34j%2Fsklearn-utilities","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F34j%2Fsklearn-utilities/lists"}