https://github.com/caiocarneloz/masksemi
Code for converting a label list into a scikit-learn-style semi-supervised label list.
machine-learning semi-supervised semi-supervised-learning
- Host: GitHub
- URL: https://github.com/caiocarneloz/masksemi
- Owner: caiocarneloz
- License: mit
- Created: 2020-02-18T13:23:58.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-03-20T02:45:30.000Z (about 5 years ago)
- Last Synced: 2024-04-25T03:42:24.339Z (about 2 years ago)
- Topics: machine-learning, semi-supervised, semi-supervised-learning
- Language: Python
- Size: 4.88 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# masksemi
Code for converting a list of labels into scikit-learn-style semi-supervised labels.
## Getting Started
#### Dependencies
You need Python 3.7 or later to use **masksemi**. You can find it at [python.org](https://www.python.org/).
You also need the numpy package, which is available from [PyPI](https://pypi.org). If you have pip, just run:
```
pip install numpy
```
#### Installation
Clone this repo to your local machine using:
```
git clone https://github.com/caiocarneloz/masksemi.git
```
Or install it using pip:
```
pip install masksemi
```
#### Features
Given a label list and a percentage, masksemi masks the labels that should be treated as unlabeled and encodes the rest for use with scikit-learn semi-supervised models. The split is applied per class, so every class keeps the given percentage of its samples labeled.
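To make the per-class split concrete, here is a minimal sketch of how such masking could be done with numpy. It is not the masksemi source: the function name `mask_labels_sketch`, the `seed` parameter, the sorted integer encoding, and the rounding rule are all assumptions made for illustration.

```python
import numpy as np

def mask_labels_sketch(labels, labeled_fraction, seed=None):
    """Hypothetical illustration of per-class masking; not the masksemi code."""
    labels = np.asarray(labels).ravel()
    rng = np.random.default_rng(seed)

    # Encode class names as integers 0..n_classes-1 (sorted order, an assumption).
    classes, encoded = np.unique(labels, return_inverse=True)
    encoded = encoded.reshape(-1)

    # Start with everything unlabeled (-1), then reveal a fraction of each class.
    masked = np.full(labels.shape[0], -1, dtype=int)
    for class_id in range(len(classes)):
        idx = np.flatnonzero(encoded == class_id)
        # Rounding to the nearest integer is an assumption of this sketch.
        n_keep = int(round(labeled_fraction * idx.size))
        keep = rng.choice(idx, size=n_keep, replace=False)
        masked[keep] = class_id
    return masked
```

Because the sampling happens inside each class, a class with 50 samples and a fraction of 0.1 keeps 5 labeled entries regardless of how large the other classes are.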
#### Usage
Consider the labels of the Iris dataset:
```
array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
...,
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
...,
'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
'Iris-virginica', 'Iris-virginica', ...], dtype=object)
```
The `maskData` function is called with the labels and the fraction of data to keep labeled (here 0.1, i.e. 10%):
```
masked_labels = maskData(labels, 0.1)
```
The labels are encoded as integers, and the masked (unlabeled) entries are set to -1:
```
array([-1, -1, -1, 0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, -1, -1, -1,
-1, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, 1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, 1, -1, 1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, 2, -1, -1, -1, -1, 2, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 2, 2, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 2])
```
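A quick way to check a masked array like the one above is to count how many entries remain labeled in each class. The helper below is a hypothetical sketch (the name `labeled_counts` is not part of masksemi), and the small input array is made up for illustration rather than taken from the Iris output.

```python
import numpy as np

def labeled_counts(masked_labels):
    """Count labeled entries per class, ignoring the -1 (unlabeled) marker."""
    masked_labels = np.asarray(masked_labels)
    labeled = masked_labels[masked_labels != -1]
    classes, counts = np.unique(labeled, return_counts=True)
    return {int(c): int(n) for c, n in zip(classes, counts)}

# Illustrative input only: a short hand-written masked array.
example = np.array([-1, 0, -1, 1, 1, -1, 0, 2])
print(labeled_counts(example))
```

The same boolean test, `masked_labels != -1`, can also be used to select the labeled rows of a feature matrix when the labeled subset is needed separately.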