https://github.com/sibirbil/less

Learning with Subset Stacking

classification machine-learning meta-learning regression stacking-ensemble subset-selection

Host: GitHub
URL: https://github.com/sibirbil/less
Owner: sibirbil
License: mit
Created: 2021-11-11T14:18:07.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2025-09-23T12:30:00.000Z (4 months ago)
Last Synced: 2025-10-21T19:59:23.042Z (3 months ago)
Topics: classification, machine-learning, meta-learning, regression, stacking-ensemble, subset-selection
Language: Python
Homepage:
Size: 11.9 MB
Stars: 24
Watchers: 3
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Learning with Subset Stacking (LESS)

LESS is a supervised learning algorithm that is based on training many local estimators on subsets of a given dataset, and then passing their predictions to a global estimator. You can find the details about LESS in our [manuscript](https://arxiv.org/abs/2112.06251).

![LESS](./img/LESS1Level.png)
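To make the idea of local estimators feeding a global estimator concrete, here is a minimal NumPy sketch. It is not the package's implementation: the way subsets are chosen (nearest neighbours of random anchor points), the linear local and global models, the distance-based weighting, and all function and parameter names (`fit_subset_stacking`, `n_subsets`, `subset_size`, `gamma`) are assumptions made for illustration only.

```python
import numpy as np

def weighted_local_predictions(X, anchors, local_coefs, gamma):
    """Predictions of every local model, weighted by closeness to its anchor."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    preds = design @ local_coefs.T  # shape: (n_samples, n_subsets)
    sq_dist = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Shift by the row minimum so the largest weight is exactly 1 (avoids underflow)
    sq_dist -= sq_dist.min(axis=1, keepdims=True)
    weights = np.exp(-gamma * sq_dist)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights * preds

def fit_subset_stacking(X, y, n_subsets=10, subset_size=30, gamma=1.0, seed=0):
    """Fit linear local models on neighbourhoods, then a linear global model."""
    rng = np.random.default_rng(seed)
    anchors = X[rng.choice(X.shape[0], size=n_subsets, replace=False)]
    local_coefs = []
    for anchor in anchors:
        # Subset = the samples closest to this anchor (assumed selection rule)
        idx = np.argsort(np.linalg.norm(X - anchor, axis=1))[:subset_size]
        design = np.column_stack([np.ones(len(idx)), X[idx]])
        coef, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        local_coefs.append(coef)
    local_coefs = np.array(local_coefs)
    # The global estimator sees only the weighted local predictions
    Z = weighted_local_predictions(X, anchors, local_coefs, gamma)
    G = np.column_stack([np.ones(Z.shape[0]), Z])
    global_coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    return anchors, local_coefs, global_coef, gamma

def predict_subset_stacking(model, X):
    anchors, local_coefs, global_coef, gamma = model
    Z = weighted_local_predictions(X, anchors, local_coefs, gamma)
    return global_coef[0] + Z @ global_coef[1:]

# Toy data, made up for this sketch
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

model = fit_subset_stacking(X, y)
y_hat = predict_subset_stacking(model, X)
train_mse = float(np.mean((y - y_hat) ** 2))
print(f"Training MSE of the sketch: {train_mse:.3f}")
```

The sketch only illustrates the two-level structure. For the actual algorithm and its default choices, refer to the manuscript and the package source.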

## Installation

`pip install less-learn`

or

``conda install -c conda-forge less-learn``

(see also [conda-smithy repository](https://github.com/conda-forge/less-learn-feedstock))

## Testing

Here is how you can use LESS:

```python
from sklearn.datasets import make_regression, make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score
from less import LESSRegressor, LESSClassifier

### CLASSIFICATION ###

X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_clusters_per_class=2, n_informative=10, random_state=42)

# Train and test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# LESS fit() & predict()
LESS_model = LESSClassifier(random_state=42)
LESS_model.fit(X_train, y_train)
y_pred = LESS_model.predict(X_test)
# scikit-learn metrics take (y_true, y_pred) in that order
print('Test accuracy of LESS: {0:.2f}'.format(accuracy_score(y_test, y_pred)))

### REGRESSION ###

X, y = make_regression(n_samples=1000, n_features=20, random_state=42)

# Train and test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# LESS fit() & predict()
LESS_model = LESSRegressor(random_state=42)
LESS_model.fit(X_train, y_train)
y_pred = LESS_model.predict(X_test)
print('Test error of LESS: {0:.2f}'.format(mean_squared_error(y_test, y_pred)))
```

## Tutorials

Our **two-part** [tutorial on Colab](https://colab.research.google.com/drive/183MRHH-i4XT3-HepHbIKVRPiwH7uMzrw?usp=sharing) is meant to familiarize you with LESS **regression**. If you want to run the tutorials on your own computer, you also need to install the following additional packages: `pandas`, `matplotlib`, and `seaborn`.

## Recommendation

The default implementation of LESS uses Euclidean distances with a radial basis function. Therefore, it is a good idea to scale the input data before fitting. This can be done by setting the parameter `scaling` in `LESSRegressor` or `LESSClassifier` to `True` (this is the default value) or by preprocessing the data as follows:

```python
from sklearn.preprocessing import StandardScaler

SC = StandardScaler()
# Fit the scaler on the training data only, then reuse it for the test data
X_train = SC.fit_transform(X_train)
X_test = SC.transform(X_test)
```
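To see why scaling matters for a method built on Euclidean distances, the following sketch measures how much each feature contributes to the squared pairwise distances before and after standardization. The data and the helper name `distance_share` are made up for this illustration and are not part of the package.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical features on very different scales
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])

def distance_share(X):
    """Share of the total squared pairwise Euclidean distance due to each feature."""
    diffs = X[:, None, :] - X[None, :, :]
    per_feature = (diffs ** 2).sum(axis=(0, 1))
    return per_feature / per_feature.sum()

raw_share = distance_share(X)

# Same transform as StandardScaler with default settings (population std)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
scaled_share = distance_share(X_scaled)

print("Share per feature, raw:   ", raw_share)
print("Share per feature, scaled:", scaled_share)
```

On the raw data, the large-scale feature accounts for almost all of the distance, so any distance-based weighting would effectively ignore the other feature. After standardization, both features contribute equally.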

## R Version (outdated)

R implementation of an **older version** of LESS is available in [another repository](https://github.com/sibirbil/LESS-R).

## Citation

Our software can be cited as:

````
  @misc{LESS,
    author = "Ilker Birbil & Samet Copur",
    title = "LESS: LEarning with Subset Stacking",
    year = 2025,
    url = "https://github.com/sibirbil/LESS/"
  }
````

## Changes in v.0.2.0

* Classification is added (`LESSClassifier`)

* Scaling is now done automatically by default (`scaling = True`)

* The default global estimator for regression is now `DecisionTreeRegressor` instead of `LinearRegression` (`global_estimator=DecisionTreeRegressor`)

* Warnings can be turned on or off with a flag (`warnings = True`)

## Changes in v.0.3.0

* Typos are corrected

* The hidden class for the binary classifier is now separate

* Local subsets with a single class are handled (the case of `ConstantPredictor`)

---

#### Acknowledgments

We thank Oguz Albayrak for his help with structuring our initial Python scripts.
