Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/glemaitre/sklearn_compat

Last synced: 29 days ago
JSON representation

Host: GitHub
URL: https://github.com/glemaitre/sklearn_compat
Owner: glemaitre
License: bsd-3-clause
Created: 2024-11-20T22:21:36.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2024-11-21T12:12:54.000Z (about 1 month ago)
Last Synced: 2024-11-21T13:22:26.348Z (about 1 month ago)
Language: Python
Size: 37.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        ## Ease multi-version support for scikit-learn compatible library

[![SPEC 0 — Minimum Supported Dependencies](https://img.shields.io/badge/SPEC-0-green?labelColor=%23004811&color=%235CA038)](https://scientific-python.org/specs/spec-0000/)

`sklearn_compat` is a small Python package that allows you to support new

scikit-learn features with older versions of scikit-learn.

The aim is to support a range of scikit-learn versions as specified in the

[SPEC0](https://scientific-python.org/specs/spec-0000/). It means that you will find

utility to support the last 4 released version of scikit-learn.

## How to adapt your scikit-learn code

### Upgrading to scikit-learn 1.6

#### `validate_data`

Your previous code could have looked like this:

```python

class MyEstimator(BaseEstimator):

    def fit(self, X, y=None):

        X = self._validate_data(X, force_all_finite=True)

        return self

```

There is two major changes in scikit-learn 1.6:

- `validate_data` has been moved to `sklearn.utils.validation`.

- `force_all_finite` is deprecated in favor of the `ensure_all_finite` parameter.

You can now use the following code for backward compatibility:

```python

from sklearn_compat.utils.validation import validate_data

class MyEstimator(BaseEstimator):

    def fit(self, X, y=None):

        X = validate_data(self, X=X, ensure_all_finite=True)

        return self

```

#### `_check_n_features` and `_check_feature_names`

Similarly to `validate_data`, these two functions have been moved to

`sklearn.utils.validation` instead of being methods of the estimators. So the following

code:

```python

class MyEstimator(BaseEstimator):

    def fit(self, X, y=None):

        self._check_n_features(X, reset=True)

        self._check_feature_names(X, reset=True)

        return self

```

becomes:

```python

from sklearn_compat.utils.validation import _check_n_features, _check_feature_names

class MyEstimator(BaseEstimator):

    def fit(self, X, y=None):

        _check_n_features(self, X, reset=True)

        _check_feature_names(self, X, reset=True)

        return self

```

### Upgrading to scikit-learn 1.5

#### `_get_column_indices`

The utility function `_get_column_indices` has been moved from `sklearn.utils` to

`sklearn.utils._indexing`.

So the following code:

```python

import pandas as pd

from sklearn.utils import _get_column_indices

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

_get_column_indices(df, key="b")

```

becomes:

```python

import pandas as pd

from sklearn_compat.utils._indexing import _get_column_indices

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

_get_column_indices(df, key="b")

```

#### `_print_elapsed_time`

The function `_print_elapsed_time` has been moved from `sklearn.utils` to

`sklearn.utils._user_interface`.

So the following code:

```python

from sklearn.utils import _print_elapsed_time

with _print_elapsed_time("sklearn_compat", "testing"):

    time.sleep(0.1)

```

becomes:

```python

from sklearn_compat.utils._user_interface import _print_elapsed_time

with _print_elapsed_time("sklearn_compat", "testing"):

    time.sleep(0.1)

```

### Upgrading to scikit-learn 1.4

### Support for metadata routing

TODO

### Upgrading to scikit-learn 1.3

#### Parameter validation

scikit-learn 1.3 introduced a new way to validate the parameters is a consistent manner.

One could in the past define the following class:

```python

class MyEstimator(BaseEstimator):

    def __init__(self, a=1):

        self.a = a

    def fit(self, X, y=None):

        if self.a < 0:

            raise ValueError("a must be positive")

        return self

```

becomes:

```python

from sklearn_compat.base import _ParamsValidationMixin

class MyEstimator(_ParamsValidationMixin, BaseEstimator):

    _parameter_constraints = {"a": [Interval(Integral, 0, None, closed="left")]}

    def __init__(self, a=1):

        self.a = a

    @_fit_context(prefer_skip_nested_validation=True)

    def fit(self, X, y=None):

        return self

```

The advantage is that the error raised will be more informative and consistent across

estimators. Also, we have the possibility to skip the validation of the parameters when

using this estimator as a meta-estimator.