https://github.com/inoueakimitsu/clustermil
clustering based multiple instance learning
- Host: GitHub
- URL: https://github.com/inoueakimitsu/clustermil
- Owner: inoueakimitsu
- License: mit
- Created: 2021-10-23T14:42:24.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-07-23T23:58:11.000Z (over 2 years ago)
- Last Synced: 2024-11-13T12:33:29.736Z (about 2 months ago)
- Language: Python
- Size: 13.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# clustermil
[![Build Status](https://app.travis-ci.com/inoueakimitsu/clustermil.svg?branch=main)](https://app.travis-ci.com/inoueakimitsu/clustermil)
Python package for multiple instance learning (MIL) on datasets with a large number of instances (`n_instance`).
## Features
- support count-based multiple instance assumptions (see [wikipedia](https://en.wikipedia.org/wiki/Multiple_instance_learning#:~:text=Presence-%2C%20threshold-%2C%20and%20count-based%20assumptions%5Bedit%5D))
- support multi-class setting
- support scikit-learn Clustering algorithms (such as `MiniBatchKMeans`)
- fast even if n_instance is large
## Installation
```bash
pip install clustermil
```
## Usage
```python
# Prepare the following dataset
#
# - bags ... list of np.ndarray
#   (num_instance_in_the_bag * num_features)
# - lower_threshold ... np.ndarray (num_bags * num_classes)
# - upper_threshold ... np.ndarray (num_bags * num_classes)
#
# bags[i_bag] contains at least lower_threshold[i_bag, i_class]
# instances of class i_class.

import numpy as np

# Prepare a single-instance clustering algorithm
from sklearn.cluster import MiniBatchKMeans
n_clusters = 100
clustering = MiniBatchKMeans(n_clusters=n_clusters)
clusters = clustering.fit_predict(np.vstack(bags))  # flatten bags into instances

# Prepare the one-hot encoder (OneHotEncoder expects a 2D array)
from sklearn.preprocessing import OneHotEncoder
onehot_encoder = OneHotEncoder()
onehot_encoder.fit(clusters.reshape(-1, 1))

# Generate a ClusterMilClassifier with the helper function
from clustermil import generate_mil_classifier

milclassifier = generate_mil_classifier(
    clustering,
    onehot_encoder,
    bags,
    lower_threshold,
    upper_threshold,
    n_clusters)

# After multiple instance learning,
# you can predict the class of a single instance
# (instance_feature is the feature vector of one instance)
milclassifier.predict([instance_feature])
```
See `tests/test_classification.py` for a fully working example, including the test data generation process.
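To make the count-based setting more concrete, here is a minimal pure-Python sketch of what the `lower_threshold` / `upper_threshold` rows express for each bag. It is not part of clustermil: the helper names `class_counts` and `satisfies_count_constraints` and the toy labels are hypothetical, and treating the upper bound as inclusive is an assumption made for illustration.

```python
from collections import Counter


def class_counts(instance_labels, n_classes):
    """Count how many instances of each class a bag contains."""
    counts = Counter(instance_labels)
    return [counts.get(c, 0) for c in range(n_classes)]


def satisfies_count_constraints(instance_labels, lower, upper):
    """Return True if every per-class count lies in [lower[c], upper[c]].

    Assumption: both bounds are inclusive.
    """
    counts = class_counts(instance_labels, len(lower))
    return all(lo <= n <= hi for lo, n, hi in zip(lower, counts, upper))


# Hypothetical instance labels for two bags with three classes (0, 1, 2).
# In real MIL data these labels are unknown; only the thresholds are given.
bag_labels = [
    [0, 0, 1, 2, 2, 2],
    [1, 1, 1, 0],
]
n_classes = 3

# One row per bag, one column per class, mirroring the
# (num_bags * num_classes) layout described in the usage comments.
lower_threshold = [[1, 1, 2], [0, 2, 0]]
upper_threshold = [[3, 1, 6], [1, 4, 0]]

for labels, lower, upper in zip(bag_labels, lower_threshold, upper_threshold):
    print(class_counts(labels, n_classes),
          satisfies_count_constraints(labels, lower, upper))
```

Under this reading, a presence-based assumption for a class is the special case where the lower bound is 1 and the upper bound is the bag size. In practice the two threshold tables would be converted to `np.ndarray` before being passed to the library.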
## License
clustermil is available under the MIT License.