{"id":15698634,"url":"https://github.com/lucacappelletti94/sanitize_ml_labels","last_synced_at":"2025-05-06T21:07:29.483Z","repository":{"id":47794189,"uuid":"222229982","full_name":"LucaCappelletti94/sanitize_ml_labels","owner":"LucaCappelletti94","description":"Python package to standardize the names of ML-related metrics, models and losses.","archived":false,"fork":false,"pushed_at":"2024-10-28T14:53:38.000Z","size":6561,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-06T21:07:12.686Z","etag":null,"topics":["labels","machine","normalization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LucaCappelletti94.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"LucaCappelletti94"}},"created_at":"2019-11-17T10:18:32.000Z","updated_at":"2024-10-28T14:53:43.000Z","dependencies_parsed_at":"2024-02-01T10:31:16.896Z","dependency_job_id":"ac2ea92a-209e-4e7c-a199-3501c79fc07e","html_url":"https://github.com/LucaCappelletti94/sanitize_ml_labels","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fsanitize_ml_labels","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fsanitize_ml_labels/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fsanitize_ml_label
s/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fsanitize_ml_labels/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LucaCappelletti94","download_url":"https://codeload.github.com/LucaCappelletti94/sanitize_ml_labels/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252769420,"owners_count":21801378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["labels","machine","normalization"],"created_at":"2024-10-03T19:31:30.674Z","updated_at":"2025-05-06T21:07:29.466Z","avatar_url":"https://github.com/LucaCappelletti94.png","language":"Python","readme":"# Sanitize ML Labels\n\n[![PyPI](https://badge.fury.io/py/sanitize-ml-labels.svg)](https://badge.fury.io/py/sanitize-ml-labels)\n[![Downloads](https://pepy.tech/badge/sanitize-ml-labels)](https://pepy.tech/badge/sanitize-ml-labels)\n[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/LucaCappelletti94/sanitize_ml_labels/blob/master/LICENSE)\n[![CI](https://github.com/LucaCappelletti94/sanitize_ml_labels/actions/workflows/python.yml/badge.svg)](https://github.com/LucaCappelletti94/sanitize_ml_labels/actions)\n\nSanitize ML Labels is a Python package designed to standardize and sanitize ML-related labels. 
It currently supports over 100 labels, including metric and model names.\n\nIf you have ML-related labels and find yourself repeatedly renaming them by hand to get consistent names and proper capitalization, this package ensures they are always sanitized in a standard way.\n\n## How do I install this package?\n\nYou can install it using pip:\n\n```bash\npip install sanitize_ml_labels\n```\n\n## Usage examples\n\nHere are some common use cases for normalizing labels:\n\n### Example for metrics\n\n```python\nfrom sanitize_ml_labels import sanitize_ml_labels\n\nlabels = [\n    \"acc\",\n    \"loss\",\n    \"auroc\",\n    \"lr\"\n]\n\nassert sanitize_ml_labels(labels) == [\n    \"Accuracy\",\n    \"Loss\",\n    \"AUROC\",\n    \"Learning rate\"\n]\n```\n\n### Example for models\n\n```python\nfrom sanitize_ml_labels import sanitize_ml_labels\n\nlabels = [\n    \"mlp\",\n    \"cnn\",\n    \"ffNN\",\n    \"Feed-forward neural network\",\n    \"perceptron\",\n    \"recurrent neural network\",\n    \"LStM\"\n]\n\nassert sanitize_ml_labels(labels) == [\n    \"MLP\",\n    \"CNN\",\n    \"FFNN\",\n    \"FFNN\",\n    \"Perceptron\",\n    \"RNN\",\n    \"LSTM\"\n]\n\nassert sanitize_ml_labels(\"vanilla mlp\") == \"MLP\"\nassert sanitize_ml_labels(\"vanilla cnn\") == \"CNN\"\n\nassert sanitize_ml_labels([\n    \"Large Language Model\",\n    \"transe\",\n    \"Generative Pre-trained Transformer\",\n    \"Graph Convolutional Neural Network\",\n    \"Convolutional Graph Neural Network\",\n    \"Graph Neural Network\",\n    \"Graph Attention Network\",\n    \"Graph Attention Neural Network\",\n]) == [\"LLM\",\"TransE\",\"GPT\",\"GCN\",\"GCN\",\"GNN\",\"GAT\",\"GAT\"]\n```\n\nYou may have prefixed all your model names with \"vanilla\", \"simple\", or \"basic\". This package can remove these prefixes for you.\n\n```python\nfrom sanitize_ml_labels import sanitize_ml_labels\n\nlabels = [\n    \"vanilla mlp\",\n    \"vanilla cnn\",\n    \"vanilla ffnn\",\n    \"vanilla perceptron\"\n]\n\nassert sanitize_ml_labels(labels) == [\"MLP\", \"CNN\", \"FFNN\", \"Perceptron\"]\n```\n\n## Corner cases\n\nSometimes, you might encounter hyphenated terms that need to be correctly identified and normalized. We use a heuristic approach based on an [extended list of over 45K hyphenated English words](https://github.com/LucaCappelletti94/sanitize_ml_labels/blob/master/hyphenations.json.gz), originally from the [Metadata consulting website](https://metadataconsulting.blogspot.com/2019/07/An-extensive-massive-near-complete-list-of-all-English-Hyphenated-words.html).\n\nThe lookup heuristic, written by [Tommaso Fontana](https://github.com/zommiommy), ensures efficient and accurate hyphenated word recognition.\n\n```python\nfrom sanitize_ml_labels import sanitize_ml_labels\n\n# The hyphen in the known hyphenated word \"non-existent\" is kept;\n# the remaining hyphens are replaced with spaces.\nassert sanitize_ml_labels(\"non-existent-edges-in-graph\") == \"Non-existent edges in graph\"\n```\n\n## Extra utilities\n\nIn addition to label sanitization, the package provides helper functions to query properties of metrics:\n\n### Is normalized metric\n\nChecks whether a metric's values fall within the range [0, 1].\n\n```python\nfrom sanitize_ml_labels import is_normalized_metric\n\nassert not is_normalized_metric(\"MSE\")\nassert is_normalized_metric(\"acc\")\nassert is_normalized_metric(\"accuracy\")\nassert is_normalized_metric(\"AUROC\")\nassert is_normalized_metric(\"auprc\")\n```\n\n### Is absolutely normalized metric\n\nChecks whether a metric's values fall within the range [-1, 1].\n\n```python\nfrom sanitize_ml_labels import is_absolutely_normalized_metric\n\nassert not is_absolutely_normalized_metric(\"auprc\")\nassert is_absolutely_normalized_metric(\"MCC\")\nassert is_absolutely_normalized_metric(\"Markedness\")\n```\n\n### Should be maximized\n\nReturns whether a metric should be maximized rather than minimized. Unknown metrics will raise a `NotImplementedError`.\n\n```python\nfrom sanitize_ml_labels import should_be_maximized\n\nassert not should_be_maximized(\"MSE\")\nassert should_be_maximized(\"AUROC\")\nassert should_be_maximized(\"accuracy\")\n```\n\n## License\n\nThis software is licensed under the MIT license. See the [LICENSE](https://github.com/LucaCappelletti94/sanitize_ml_labels/blob/master/LICENSE) file.\n","funding_links":["https://github.com/sponsors/LucaCappelletti94"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucacappelletti94%2Fsanitize_ml_labels","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucacappelletti94%2Fsanitize_ml_labels","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucacappelletti94%2Fsanitize_ml_labels/lists"}