https://github.com/lucacappelletti94/sanitize_ml_labels
Python package to standardize the names of ML-related metrics, models and losses.
- Host: GitHub
- URL: https://github.com/lucacappelletti94/sanitize_ml_labels
- Owner: LucaCappelletti94
- License: mit
- Created: 2019-11-17T10:18:32.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-10-08T11:13:01.000Z (3 months ago)
- Last Synced: 2024-10-26T16:27:58.950Z (2 months ago)
- Topics: labels, machine, normalization
- Language: Python
- Homepage:
- Size: 6.25 MB
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# Sanitize ML Labels
[![PyPI](https://badge.fury.io/py/sanitize-ml-labels.svg)](https://badge.fury.io/py/sanitize-ml-labels)
[![Downloads](https://pepy.tech/badge/sanitize-ml-labels)](https://pepy.tech/badge/sanitize-ml-labels)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/LucaCappelletti94/sanitize_ml_labels/blob/master/LICENSE)
[![CI](https://github.com/LucaCappelletti94/sanitize_ml_labels/actions/workflows/python.yml/badge.svg)](https://github.com/LucaCappelletti94/sanitize_ml_labels/actions)

Sanitize ML Labels is a Python package designed to standardize and sanitize ML-related labels. It currently supports over 100 labels, including metric and model names.
If you find yourself repeatedly renaming ML-related labels by hand to give them consistent names and proper capitalization, this package ensures they are always sanitized in a standard way.
## How do I install this package?
You can install it using pip:
```bash
pip install sanitize_ml_labels
```

## Usage examples
Here are some common use cases for normalizing labels:
### Example for metrics
```python
from sanitize_ml_labels import sanitize_ml_labels

labels = [
    "acc",
    "loss",
    "auroc",
    "lr"
]

assert sanitize_ml_labels(labels) == [
    "Accuracy",
    "Loss",
    "AUROC",
    "Learning rate"
]
```

### Example for models
```python
from sanitize_ml_labels import sanitize_ml_labels

labels = [
    "mlp",
    "cnn",
    "ffNN",
    "Feed-forward neural network",
    "perceptron",
    "recurrent neural network",
    "LStM"
]

assert sanitize_ml_labels(labels) == [
    "MLP",
    "CNN",
    "FFNN",
    "FFNN",
    "Perceptron",
    "RNN",
    "LSTM"
]

# A single string is accepted as well as a list.
assert sanitize_ml_labels("vanilla mlp") == "MLP"
assert sanitize_ml_labels("vanilla cnn") == "CNN"

assert sanitize_ml_labels([
    "Large Language Model",
    "transe",
    "Generative Pre-trained Transformer",
    "Graph Convolutional Neural Network",
    "Convolutional Graph Neural Network",
    "Graph Neural Network",
    "Graph Attention Network",
    "Graph Attention Neural Network",
]) == ["LLM", "TransE", "GPT", "GCN", "GCN", "GNN", "GAT", "GAT"]
```

Sometimes all your model names carry a prefix such as "vanilla", "simple" or "basic". This package can remove these prefixes.
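As a rough illustration of the idea, and not the package's actual implementation, prefix removal can be sketched as dropping one leading generic qualifier. The function name `strip_generic_prefix` and the `GENERIC_PREFIXES` tuple are hypothetical, and the sketch leaves capitalization untouched.

```python
# Hypothetical prefix list, taken from the three examples named above.
GENERIC_PREFIXES = ("vanilla", "simple", "basic")


def strip_generic_prefix(label: str) -> str:
    """Drop one leading generic qualifier, if other words follow it."""
    words = label.split()
    # Only strip when something remains, so a bare "basic" is left alone.
    if len(words) > 1 and words[0].lower() in GENERIC_PREFIXES:
        words = words[1:]
    return " ".join(words)


print(strip_generic_prefix("vanilla mlp"))  # mlp
```

With the package itself, the same cleanup looks like this: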
```python
from sanitize_ml_labels import sanitize_ml_labels

labels = [
    "vanilla mlp",
    "vanilla cnn",
    "vanilla ffnn",
    "vanilla perceptron"
]

assert sanitize_ml_labels(labels) == ["MLP", "CNN", "FFNN", "Perceptron"]
```

## Corner cases
Sometimes, you might encounter hyphenated terms that need to be correctly identified and normalized. We use a heuristic approach based on an [extended list of over 45K hyphenated English words](https://github.com/LucaCappelletti94/sanitize_ml_labels/blob/master/hyphenations.json.gz), originally from the [Metadata consulting website](https://metadataconsulting.blogspot.com/2019/07/An-extensive-massive-near-complete-list-of-all-English-Hyphenated-words.html).
The lookup heuristic, written by [Tommaso Fontana](https://github.com/zommiommy), ensures efficient and accurate hyphenated word recognition.
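To make the idea concrete, here is a minimal sketch of a list-based hyphen heuristic. It is not the package's lookup code: the function name `split_hyphenated_label`, the greedy longest-match strategy, and the three-word `KNOWN_HYPHENATED` set (a stand-in for the 45K-word list) are all assumptions made for illustration.

```python
# Hypothetical stand-in for the package's list of over 45K hyphenated words.
KNOWN_HYPHENATED = frozenset({"non-existent", "feed-forward", "pre-trained"})


def split_hyphenated_label(label: str, known: frozenset = KNOWN_HYPHENATED) -> str:
    """Turn hyphens into spaces, except inside known hyphenated words."""
    tokens = label.split("-")
    words = []
    i = 0
    while i < len(tokens):
        # Try the longest run of at least two tokens starting at i first.
        for j in range(len(tokens), i + 1, -1):
            candidate = "-".join(tokens[i:j])
            if candidate.lower() in known:
                words.append(candidate)
                i = j
                break
        else:
            # No known hyphenated word starts here: keep the token on its own.
            words.append(tokens[i])
            i += 1
    sentence = " ".join(words)
    # Capitalize only the first character and leave the rest untouched.
    return sentence[:1].upper() + sentence[1:]


print(split_hyphenated_label("non-existent-edges-in-graph"))
# Non-existent edges in graph
```

The package's own behaviour on the same label is shown next: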
```python
from sanitize_ml_labels import sanitize_ml_labels

# The known hyphenated word "non-existent" is kept; the other hyphens become spaces.
assert sanitize_ml_labels("non-existent-edges-in-graph") == "Non-existent edges in graph"
```

## Extra utilities
In addition to label sanitization, the package provides methods to check metric normalization:
### Is normalized metric
Checks whether the named metric takes values in the range [0, 1].
```python
from sanitize_ml_labels import is_normalized_metric

assert not is_normalized_metric("MSE")
assert is_normalized_metric("acc")
assert is_normalized_metric("accuracy")
assert is_normalized_metric("AUROC")
assert is_normalized_metric("auprc")
```

### Is absolutely normalized metric
Checks whether the named metric takes values in the range [-1, 1].
```python
from sanitize_ml_labels import is_absolutely_normalized_metric

assert not is_absolutely_normalized_metric("auprc")
assert is_absolutely_normalized_metric("MCC")
assert is_absolutely_normalized_metric("Markedness")
```

### Should be maximized
Reports whether a metric should be maximized rather than minimized. Unknown metrics raise a `NotImplementedError`.

```python
from sanitize_ml_labels import should_be_maximized

assert not should_be_maximized("MSE")
assert should_be_maximized("AUROC")
assert should_be_maximized("accuracy")
```

## License
This software is licensed under the MIT license. See the [LICENSE](https://github.com/LucaCappelletti94/sanitize_ml_labels/blob/master/LICENSE) file for details.