https://github.com/trent-b/iterative-stratification

scikit-learn cross validators for iterative stratification of multilabel data

cross-validation multilabel multilabel-classification scikit-learn stratification

Last synced: 6 months ago

Host: GitHub
URL: https://github.com/trent-b/iterative-stratification
Owner: trent-b
License: bsd-3-clause
Created: 2018-02-04T00:32:10.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2024-10-12T16:36:02.000Z (about 1 year ago)
Last Synced: 2025-05-12T06:04:39.966Z (6 months ago)
Topics: cross-validation, multilabel, multilabel-classification, scikit-learn, stratification
Language: Python
Size: 45.9 KB
Stars: 864
Watchers: 6
Forks: 74
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

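StarryDivineSky - trent-b/iterative-stratification - iterative-stratification is a scikit-learn-compatible project that provides stratified cross validators for multilabel data. It extends scikit-learn's cross validators so that they can stratify multilabel data, providing implementations such as MultilabelStratifiedKFold, RepeatedMultilabelStratifiedKFold, and MultilabelStratifiedShuffleSplit; its stratification algorithm is based on the paper by Sechidis et al. (2011). The project supports Python 3.4 through 3.9 and depends on scipy, numpy, and scikit-learn. It can be installed with pip, and its multilabel cross validators are used like any other cross validator, for example with cross_val_score or cross_val_predict. (Other - Machine Learning and Deep Learning)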
awesome-python-machine-learning-resources - GitHub - 5% open · ⏱️ 06.06.2022 (Sklearn utilities)

README

          
[![Build Status](https://travis-ci.org/vfdev-5/iterative-stratification.svg?branch=master)](https://travis-ci.org/vfdev-5/iterative-stratification)

[![Coverage Status](https://coveralls.io/repos/github/vfdev-5/iterative-stratification/badge.svg?branch=master)](https://coveralls.io/github/vfdev-5/iterative-stratification?branch=master)

# iterative-stratification

iterative-stratification is a project that provides [scikit-learn](http://scikit-learn.org/) compatible cross validators with stratification for multilabel data.

Currently, scikit-learn provides several cross validators with stratification. However, these cross validators cannot stratify _multilabel_ data. The iterative-stratification project offers implementations of MultilabelStratifiedKFold, RepeatedMultilabelStratifiedKFold, and MultilabelStratifiedShuffleSplit, built on a base algorithm for stratifying multilabel data described in the following paper:

Sechidis K., Tsoumakas G., Vlahavas I. (2011) On the Stratification of Multi-Label Data. In: Gunopulos D., Hofmann T., Malerba D., Vazirgiannis M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6913. Springer, Berlin, Heidelberg.
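The core idea of that paper can be sketched in a few lines of NumPy: repeatedly pick the label with the fewest remaining samples, then give each of its samples to the fold that most needs that label, breaking ties by which fold needs the most samples overall and then at random. This is a simplified illustration, not the code in `iterstrat`. It assumes equal-sized folds, breaks ties between rarest labels by lowest label index, and assigns samples without any label last; the function name `iterative_stratification` is hypothetical.

```python
import numpy as np


def iterative_stratification(y, n_folds, seed=None):
    """Assign each row of a binary label matrix to one of n_folds folds.

    Hypothetical, simplified sketch of iterative stratification
    (Sechidis et al., 2011) with equal-sized folds; not the iterstrat code.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=bool)
    n_samples = y.shape[0]
    # Desired number of samples per fold, overall and per label.
    desired = np.full(n_folds, n_samples / n_folds)
    desired_per_label = np.tile(y.sum(axis=0) / n_folds, (n_folds, 1))
    folds = np.full(n_samples, -1)
    remaining = np.ones(n_samples, dtype=bool)

    while remaining.any():
        counts = y[remaining].sum(axis=0)
        if counts.sum() == 0:
            # Only label-free samples are left: fill the folds needing the most samples.
            for idx in np.flatnonzero(remaining):
                fold = rng.choice(np.flatnonzero(desired == desired.max()))
                folds[idx] = fold
                desired[fold] -= 1
            break
        # Rarest label first (fewest remaining samples; ties -> lowest label index).
        label = np.argmin(np.where(counts > 0, counts, np.inf))
        for idx in np.flatnonzero(remaining & y[:, label]):
            need = desired_per_label[:, label]
            candidates = np.flatnonzero(need == need.max())
            # Tie-break on overall demand, then at random.
            overall = desired[candidates]
            candidates = candidates[overall == overall.max()]
            fold = rng.choice(candidates)
            folds[idx] = fold
            remaining[idx] = False
            # This sample satisfies one unit of demand for each of its labels.
            desired_per_label[fold, y[idx]] -= 1
            desired[fold] -= 1
    return folds


# Toy label matrix from the Toy Examples section: 8 samples x 2 labels.
y = np.array([[0, 0], [0, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, 0], [1, 0]])
folds = iterative_stratification(y, n_folds=2, seed=0)
for k in range(2):
    # Per-label counts in fold k; stratification aims for an even split.
    print("fold", k, "samples:", np.flatnonzero(folds == k),
          "label counts:", y[folds == k].sum(axis=0))
```

On this toy data each fold ends up with four samples and two occurrences of each label, whatever the random tie-breaks.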

## Requirements

iterative-stratification has been tested under Python 3.4 through 3.9 with the following dependencies:

- scipy (>=0.13.3)
- numpy (>=1.8.2)
- scikit-learn (>=0.19.0)

## Installation

iterative-stratification is available on PyPI and can be installed with pip:

```
pip install iterative-stratification
```

## Toy Examples

The multilabel cross validators that this package provides can be used with the scikit-learn API in the same manner as any other cross validator. For example, they can be passed to [cross_val_score](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html) or [cross_val_predict](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html). Below are toy examples that use the multilabel cross validators directly.

### MultilabelStratifiedKFold

```python
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
import numpy as np

# 8 samples with 2 features each
X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
# Binary label indicator matrix: 8 samples x 2 labels
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

mskf = MultilabelStratifiedKFold(n_splits=2, shuffle=True, random_state=0)

# Each iteration yields the index arrays of one train/test split
for train_index, test_index in mskf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```

Output:

```
TRAIN: [0 3 4 6] TEST: [1 2 5 7]
TRAIN: [1 2 5 7] TEST: [0 3 4 6]
```
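One way to see what stratification achieves here is to compare each label's frequency within a split against its frequency in the full dataset. The sketch below copies the train/test indices from the output above, so it runs without iterstrat installed; the helper name `label_frequency_gap` is hypothetical.

```python
import numpy as np


def label_frequency_gap(y, index):
    """Largest absolute difference between per-label frequencies in y[index] and in y."""
    return np.abs(y[index].mean(axis=0) - y.mean(axis=0)).max()


y = np.array([[0, 0], [0, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, 0], [1, 0]])

# Train/test indices copied from the MultilabelStratifiedKFold output above.
splits = [
    ([0, 3, 4, 6], [1, 2, 5, 7]),
    ([1, 2, 5, 7], [0, 3, 4, 6]),
]

# (train gap, test gap) for each split; 0 means the label rates match the full data.
gaps = [(label_frequency_gap(y, tr), label_frequency_gap(y, te)) for tr, te in splits]
for i, (train_gap, test_gap) in enumerate(gaps):
    print(f"split {i}: train gap {train_gap:.2f}, test gap {test_gap:.2f}")
```

For these splits every gap is zero: both halves keep each label at the dataset-wide rate of 0.5.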

### RepeatedMultilabelStratifiedKFold

```python
from iterstrat.ml_stratifiers import RepeatedMultilabelStratifiedKFold
import numpy as np

# 8 samples with 2 features each
X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
# Binary label indicator matrix: 8 samples x 2 labels
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

# 2 folds x 2 repeats = 4 train/test splits in total
rmskf = RepeatedMultilabelStratifiedKFold(n_splits=2, n_repeats=2, random_state=0)

for train_index, test_index in rmskf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```

Output:

```
TRAIN: [0 3 4 6] TEST: [1 2 5 7]
TRAIN: [1 2 5 7] TEST: [0 3 4 6]
TRAIN: [0 1 4 5] TEST: [2 3 6 7]
TRAIN: [2 3 6 7] TEST: [0 1 4 5]
```
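With `n_splits=2` and `n_repeats=2` the cross validator yields 2 × 2 = 4 splits. Assuming the splits are yielded one repeat at a time, as the output above suggests, each consecutive group of `n_splits` test folds should cover every sample exactly once. The sketch checks this on the printed test indices using only NumPy.

```python
import numpy as np

n_samples, n_splits, n_repeats = 8, 2, 2

# Test indices copied, in order, from the RepeatedMultilabelStratifiedKFold output above.
test_folds = [[1, 2, 5, 7], [0, 3, 4, 6], [2, 3, 6, 7], [0, 1, 4, 5]]

results = []
for r in range(n_repeats):
    # Assumption: n_splits consecutive splits make up one repeat.
    repeat = test_folds[r * n_splits:(r + 1) * n_splits]
    # Count how often each sample index appears in this repeat's test folds.
    counts = np.bincount(np.concatenate(repeat), minlength=n_samples)
    results.append(bool(np.all(counts == 1)))
    print(f"repeat {r}: every sample tested exactly once? {results[-1]}")
```

Both repeats are complete partitions of the data, but they are different partitions, so averaging over repeats reduces dependence on any single way of splitting the samples.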

### MultilabelStratifiedShuffleSplit

```python
from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit
import numpy as np

# 8 samples with 2 features each
X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
# Binary label indicator matrix: 8 samples x 2 labels
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

# 3 independent random splits, each holding out half of the samples
msss = MultilabelStratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)

for train_index, test_index in msss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```

Output:

```
TRAIN: [1 2 5 7] TEST: [0 3 4 6]
TRAIN: [2 3 6 7] TEST: [0 1 4 5]
TRAIN: [1 2 5 6] TEST: [0 3 4 7]
```
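Unlike the K-fold cross validators, a shuffle-split cross validator draws each split independently, so test sets can overlap and some samples may never be tested. Counting how often each index appears in the test sets printed above makes this visible; the indices are copied from that output, so the sketch needs only NumPy.

```python
import numpy as np

n_samples = 8

# Test indices copied from the MultilabelStratifiedShuffleSplit output above.
test_sets = [[0, 3, 4, 6], [0, 1, 4, 5], [0, 3, 4, 7]]

# How many of the 3 random splits placed each sample in the test set.
times_tested = np.bincount(np.concatenate(test_sets), minlength=n_samples)
print("times in test set per sample:", times_tested)
print("never tested:", np.flatnonzero(times_tested == 0))
```

For this output, sample 2 never appears in a test set, while samples 0 and 4 appear in all three.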
