https://github.com/34j/sklearn-utilities
Utilities for scikit-learn. Append prediction to x, append prediction to x single, append x prediction to x, compose var estimator, data frame wrapper, drop by noise prediction, drop missing rows y, dummy regressor var, estimator wrapper base, excluded column transformer pandas, feature union pandas, id transformer, included column transformer pand
- Host: GitHub
- URL: https://github.com/34j/sklearn-utilities
- Owner: 34j
- License: mit
- Created: 2023-10-09T06:45:28.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-15T11:49:35.000Z (2 months ago)
- Last Synced: 2024-10-15T19:21:10.654Z (2 months ago)
- Topics: catboost, feature-engine, feature-engineering, multioutput, pandas, pca, python, pytorch, regression, scikit-learn, sklearn, sklearn-compatible, skorch, torch, tqdm
- Language: Python
- Homepage: https://sklearn-utilities.readthedocs.io/en/latest/sklearn_utilities.html
- Size: 255 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 12
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: .github/SECURITY.md
# Sklearn Utilities
Utilities for scikit-learn.
## Installation
Install this via pip (or your favourite package manager):
```shell
pip install sklearn-utilities
```
## API
See [Docs](https://sklearn-utilities.readthedocs.io/en/latest/sklearn_utilities.html) for more information.
- `EstimatorWrapperBase`: base class for wrappers. Redirects all attributes which are not in the wrapper to the wrapped estimator.
- `DataFrameWrapper`: tries to convert every estimator output to a pandas DataFrame or Series.
- `FeatureUnionPandas`: a `FeatureUnion` that works with pandas DataFrames.
- `IncludedColumnTransformerPandas`, `ExcludedColumnTransformerPandas`: select columns by name.
- `AppendPredictionToX`: appends the prediction of y to X.
- `AppendXPredictionToX`: appends the prediction of X to X.
- `DropByNoisePrediction`: drops columns which have high importance in predicting noise.
- `DropMissingColumns`: drops columns with missing values above a threshold.
- `DropMissingRowsY`: drops rows with missing values in y. Use `feature_engine.DropMissingData` for X.
- `IntersectXY`: drops rows where the index of X and y do not intersect. Use with `feature_engine.DropMissingData`.
- `ReindexMissingColumns`: reindexes columns of X in `transform()` to match the columns of X in `fit()`.
- `ReportNonFinite`: reports non-finite values in X and/or y.
- `IdTransformer`: a transformer that does nothing.
- `RecursiveFitSubtractRegressor`: a regressor that recursively fits a regressor and subtracts the prediction from the target.
- `SmartMultioutputEstimator`: a `MultiOutputEstimator` that supports tuples of arrays in `predict()` and supports pandas `Series` and `DataFrame`.
- `until_event()`, `since_event()`: calculate the time until or since events (`Series[bool]`).
- `ComposeVarEstimator`: composes mean and std/var estimators.
- `DummyRegressorVar`: `DummyRegressor` that returns 1.0 for std/var.
- `TransformedTargetRegressorVar`: `TransformedTargetRegressor` with std/var support.
- `StandardScalerVar`: `StandardScaler` with std/var support.
- `EvalSetWrapper`, `CatBoostProgressBarWrapper`: wrappers that pass `eval_set` to `fit()` using `train_test_split()`, mainly for `CatBoost`. The latter also shows a progress bar (using `tqdm`). Useful for early stopping. For LightGBM, see [`lightgbm-callbacks`](https://github.com/34j/lightgbm-callbacks).
### `sklearn_utilities.dataset`
- `add_missing_values()`: adds missing values to a dataset.
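To illustrate the idea of adding missing values to a dataset, here is a minimal NumPy sketch. It does not call `add_missing_values()` and makes no claim about that function's signature or behaviour; the helper name `add_missing_values_sketch` and its `fraction` and `seed` parameters are hypothetical, chosen only for this example.

```python
import numpy as np


def add_missing_values_sketch(X, fraction, seed=0):
    """Hypothetical helper: return a float copy of X with `fraction` of its cells set to NaN."""
    rng = np.random.default_rng(seed)
    X_out = np.array(X, dtype=float)  # np.array copies, so the input is left untouched
    n_missing = int(round(fraction * X_out.size))
    # Pick distinct cell positions in the flattened array and blank them out.
    flat_idx = rng.choice(X_out.size, size=n_missing, replace=False)
    X_out.flat[flat_idx] = np.nan
    return X_out


X = np.arange(20, dtype=float).reshape(5, 4)
X_missing = add_missing_values_sketch(X, fraction=0.25)
```

A dataset corrupted this way can be useful for checking how an imputation or row-dropping step in a pipeline behaves before running it on real data.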
### `sklearn_utilities.torch`
- `PCATorch`: faster PCA using PyTorch with GPU support.
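For readers unfamiliar with what a PCA implementation computes, the following is a minimal sketch of PCA via singular value decomposition. It is written in NumPy to keep it dependency-light and is not the `PCATorch` implementation; a PyTorch version would presumably run the same steps on GPU tensors, but that is an assumption. The function name `pca_svd_sketch` is hypothetical.

```python
import numpy as np


def pca_svd_sketch(X, n_components):
    """Hypothetical helper: PCA by SVD of the centred data matrix."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each retained axis (sample variance, n - 1 denominator).
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    # Project the centred data onto the retained axes.
    scores = X_centered @ components.T
    return scores, components, explained_variance


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
scores, components, explained_variance = pca_svd_sketch(X, n_components=2)
```

The dominant cost is the SVD of the centred matrix, which is the kind of dense linear algebra that tends to benefit from GPU execution.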
#### `sklearn_utilities.torch.skorch`
- `SkorchReshaper`, `SkorchCNNReshaper`: reshapes X and y for `nn.Linear` and `nn.Conv1d/2d` respectively. (For `nn.Conv2d`, uses `np.sliding_window_view()`.)
- `AllowNaN`: wraps a loss module and assigns 0 to y and y_hat for indices where y contains NaN in `forward()`.
## See also
- [ml-tooling/best-of-ml-python](https://github.com/ml-tooling/best-of-ml-python)
## Contributors ✨
Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!