https://github.com/34j/sklearn-utilities

Utilities for scikit-learn. Append prediction to x, append prediction to x single, append x prediction to x, compose var estimator, data frame wrapper, drop by noise prediction, drop missing rows y, dummy regressor var, estimator wrapper base, excluded column transformer pandas, feature union pandas, id transformer, included column transformer pand
catboost feature-engine feature-engineering multioutput pandas pca python pytorch regression scikit-learn sklearn sklearn-compatible skorch torch tqdm

# Sklearn Utilities



Utilities for scikit-learn.

## Installation

Install this via pip (or your favourite package manager):

```shell
pip install sklearn-utilities
```

## API

See [Docs](https://sklearn-utilities.readthedocs.io/en/latest/sklearn_utilities.html) for more information.

- `EstimatorWrapperBase`: base class for wrappers. Redirects all attributes which are not in the wrapper to the wrapped estimator.
- `DataFrameWrapper`: tries to convert every estimator output to a pandas DataFrame or Series.
- `FeatureUnionPandas`: a `FeatureUnion` that works with pandas DataFrames.
- `IncludedColumnTransformerPandas`, `ExcludedColumnTransformerPandas`: select columns by name.
- `AppendPredictionToX`: appends the prediction of y to X.
- `AppendXPredictionToX`: appends the prediction of X to X.
- `DropByNoisePrediction`: drops columns that have high importance in predicting noise.
- `DropMissingColumns`: drops columns with missing values above a threshold.
- `DropMissingRowsY`: drops rows with missing values in y. Use `feature_engine.DropMissingData` for X.
- `IntersectXY`: drops rows whose index labels are not present in both X and y. Use with `feature_engine.DropMissingData`.
- `ReindexMissingColumns`: reindexes columns of X in `transform()` to match the columns of X in `fit()`.
- `ReportNonFinite`: reports non-finite values in X and/or y.
- `IdTransformer`: a transformer that does nothing.
- `RecursiveFitSubtractRegressor`: a regressor that recursively fits a regressor and subtracts the prediction from the target.
- `SmartMultioutputEstimator`: a `MultiOutputEstimator` that supports a tuple of arrays in `predict()` as well as pandas `Series` and `DataFrame`.
- `until_event()`, `since_event()`: calculate the time until or since events (`Series[bool]`).
- `ComposeVarEstimator`: composes mean and std/var estimators.
- `DummyRegressorVar`: `DummyRegressor` that returns 1.0 for std/var.
- `TransformedTargetRegressorVar`: `TransformedTargetRegressor` with std/var support.
- `StandardScalerVar`: `StandardScaler` with std/var support.
- `EvalSetWrapper`, `CatBoostProgressBarWrapper`: wrappers that pass `eval_set` to `fit()` using `train_test_split()`, mainly for `CatBoost`. The latter also shows a progress bar (using `tqdm`). Useful for early stopping. For LightGBM, see [`lightgbm-callbacks`](https://github.com/34j/lightgbm-callbacks).
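
To make two of the ideas above concrete (forwarding unknown attributes to a wrapped estimator, and appending a prediction to X), here is a minimal sketch written with plain scikit-learn and pandas. It does not use this package's classes: `ForwardingWrapper` and `append_prediction` are hypothetical names invented for illustration, and the real `EstimatorWrapperBase` and `AppendPredictionToX` may behave differently.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


class ForwardingWrapper:
    """Hypothetical stand-in, NOT the library's EstimatorWrapperBase."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so attributes
        # defined on the wrapper itself always take precedence.
        if name == "estimator":
            # Avoid infinite recursion if the wrapper is not initialised yet.
            raise AttributeError(name)
        return getattr(self.estimator, name)


def append_prediction(model, X: pd.DataFrame, column: str = "y_pred") -> pd.DataFrame:
    """Return a copy of X with the model's prediction added as a new column."""
    out = X.copy()
    out[column] = model.predict(X)
    return out


X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0]})
y = pd.Series([1.0, 3.0, 5.0, 7.0])  # y = 2 * a + 1

wrapped = ForwardingWrapper(LinearRegression())
wrapped.fit(X, y)      # forwarded to LinearRegression.fit
coef = wrapped.coef_   # fitted attribute, also forwarded
X_aug = append_prediction(wrapped, X)
```

The sketch skips everything a real scikit-learn wrapper needs for `clone()` and pipelines (`get_params()`, `set_params()`); see the linked docs for what the package actually provides.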

### `sklearn_utilities.dataset`

- `add_missing_values()`: adds missing values to a dataset.
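
As a rough illustration of what adding missing values to a dataset can look like, the sketch below masks cells of a DataFrame uniformly at random. The function name `mask_at_random` and its `rate` and `seed` parameters are assumptions made for this example. They are not the signature of `add_missing_values()`, which is documented in the linked docs.

```python
import numpy as np
import pandas as pd


def mask_at_random(X: pd.DataFrame, rate: float, seed: int = 0) -> pd.DataFrame:
    """Return a copy of X where each cell is replaced by NaN with probability `rate`.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < rate  # True marks a cell to blank out
    return X.mask(mask)                # DataFrame.mask puts NaN where mask is True


X_full = pd.DataFrame(np.ones((100, 3)), columns=["a", "b", "c"])
X_missing = mask_at_random(X_full, rate=0.2)
missing_fraction = X_missing.isna().to_numpy().mean()
```

Masking completely at random is the simplest scheme. Real data is often missing in ways that depend on the values themselves, which this sketch does not model.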

### `sklearn_utilities.torch`

- `PCATorch`: faster PCA using PyTorch with GPU support.
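
For readers unfamiliar with running PCA on a GPU, the following sketch computes PCA with a singular value decomposition in PyTorch. It only illustrates the general technique. `pca_fit_transform` is a hypothetical function, and nothing here is taken from the `PCATorch` implementation, which may use a different algorithm.

```python
import torch


def pca_fit_transform(X: torch.Tensor, n_components: int):
    """PCA via SVD of the centred data (hypothetical helper, illustration only)."""
    mean = X.mean(dim=0, keepdim=True)
    Xc = X - mean
    # Rows of Vh are the principal axes, ordered by decreasing singular value.
    U, S, Vh = torch.linalg.svd(Xc, full_matrices=False)
    components = Vh[:n_components]
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return Xc @ components.T, components, explained_variance


# Falls back to the CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
X = torch.randn(200, 5, device=device)
Z, components, explained_variance = pca_fit_transform(X, n_components=2)
```

Because the tensor lives on `device`, the decomposition runs on the GPU when one is present, which is where a speed-up over a CPU implementation would come from.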

#### `sklearn_utilities.torch.skorch`

- `SkorchReshaper`, `SkorchCNNReshaper`: reshape X and y for `nn.Linear` and `nn.Conv1d/2d` respectively. (For `nn.Conv2d`, uses `np.lib.stride_tricks.sliding_window_view()`.)
- `AllowNaN`: wraps a loss module and assigns 0 to y and y_hat at indices where y contains NaN in `forward()`.
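
The NaN handling described for `AllowNaN` can be sketched as a small loss wrapper. This is a sketch written from the one-line description above, not the package's code. The class name `NaNMaskedLoss` is hypothetical, and details such as how the result is normalised are assumptions.

```python
import torch
from torch import nn


class NaNMaskedLoss(nn.Module):
    """Hypothetical wrapper: zero out y and y_hat wherever y is NaN."""

    def __init__(self, loss: nn.Module):
        super().__init__()
        self.loss = loss

    def forward(self, y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        mask = torch.isnan(y)
        zero = torch.zeros_like(y)
        # Masked positions become 0 in both tensors, so they contribute
        # zero error and receive zero gradient.
        return self.loss(torch.where(mask, zero, y_hat), torch.where(mask, zero, y))


criterion = NaNMaskedLoss(nn.MSELoss())
y_hat = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
y = torch.tensor([1.0, float("nan"), 5.0, 4.0])
loss = criterion(y_hat, y)
loss.backward()
```

With a mean-reduced loss such as `nn.MSELoss()`, the masked positions still count in the denominator, so the value is averaged over all elements rather than only the observed ones.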

## See also

- [ml-tooling/best-of-ml-python](https://github.com/ml-tooling/best-of-ml-python)

## Contributors ✨

Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!