https://github.com/dholzmueller/probmetrics
Classification metrics and post-hoc calibration
- Host: GitHub
- URL: https://github.com/dholzmueller/probmetrics
- Owner: dholzmueller
- License: apache-2.0
- Created: 2025-01-31T14:28:37.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-06-27T22:09:54.000Z (4 months ago)
- Last Synced: 2025-07-28T04:07:38.310Z (3 months ago)
- Topics: calibration, classification, machine-learning, metrics
- Language: Python
- Size: 29.3 KB
- Stars: 29
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
[Tests](https://github.com/dholzmueller/probmetrics/actions/workflows/testing.yml)
[Downloads](https://pypistats.org/packages/probmetrics)

# Probmetrics: Classification metrics and post-hoc calibration
This PyTorch-based package currently contains

- classification metrics, in particular metrics for assessing the quality of probabilistic predictions, and
- post-hoc calibration methods, in particular a fast and accurate implementation of temperature scaling.

It accompanies our paper
[Rethinking Early Stopping: Refine, Then Calibrate](https://arxiv.org/abs/2501.19195).
Please cite our paper if you use this repository for research purposes.
The experiments from the paper can be found here:
[vision](https://github.com/eugeneberta/RefineThenCalibrate-Vision),
[tabular](https://github.com/dholzmueller/pytabkit),
[theory](https://github.com/eugeneberta/RefineThenCalibrate-Theory).

## Installation
Probmetrics is available via
```bash
pip install probmetrics
```
To obtain all functionality, install `probmetrics[extra,dev,dirichletcal]` (see the quoting note after this list).

- `extra` installs additional packages for smooth ECE, Venn-Abers calibration, centered isotonic regression, and the temperature scaling implementation in NetCal.
- `dev` installs additional packages for development (especially documentation).
- `dirichletcal` installs Dirichlet calibration, which only works on Python 3.12 and above.
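Note that some shells (e.g. zsh) interpret square brackets, so the extras spec may need quoting; for example:

```bash
# quote the extras spec so the shell does not expand the brackets
pip install "probmetrics[extra,dev,dirichletcal]"
```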
## Using temperature scaling

We provide a highly efficient implementation of temperature scaling that, unlike some other implementations, does not suffer from optimization issues.

### NumPy interface
```python
from probmetrics.calibrators import get_calibrator
import numpy as np

probas = np.asarray([[0.1, 0.9]])  # shape = (n_samples, n_classes)
labels = np.asarray([1])  # shape = (n_samples,)
# this is the version with Laplace smoothing,
# use calibrate_with_mixture=False (the default) for no Laplace smoothing
calib = get_calibrator('temp-scaling', calibrate_with_mixture=True)
# other option: calib = MixtureCalibrator(TemperatureScalingCalibrator())
# there is also a fit_torch / predict_proba_torch interface
calib.fit(probas, labels)
calibrated_probas = calib.predict_proba(probas)
```
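For reference, temperature scaling learns a single scalar temperature T > 0 on validation data and then predicts softmax(logits / T). A minimal self-contained sketch of that transformation (an illustration of the general technique, not this package's implementation):

```python
import numpy as np

def temperature_scale(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Row-wise softmax of logits / temperature; T > 1 softens the
    predicted distribution, T < 1 sharpens it."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# example: a confident two-class prediction is softened by T = 2
print(temperature_scale(np.asarray([[0.0, 4.0]]), temperature=2.0))
```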
### PyTorch interface

The PyTorch version can be used directly with GPU tensors, but it can actually be slower than the CPU version for small validation sets (around 1K-10K samples).

```python
from probmetrics.distributions import CategoricalProbs
from probmetrics.calibrators import get_calibrator
import torch

probas = torch.as_tensor([[0.1, 0.9]])
labels = torch.as_tensor([1])

# temp-scaling with Laplace smoothing
calib = get_calibrator('ts-mix')

# if you have logits, you can use CategoricalLogits instead
calib.fit_torch(CategoricalProbs(probas), labels)
calib.predict_proba_torch(CategoricalProbs(probas))
```
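If you have logits rather than probabilities, the comment in the block above points to `CategoricalLogits` as the drop-in replacement. A minimal sketch, assuming `CategoricalLogits` is constructed from raw logits the same way `CategoricalProbs` is constructed from probabilities:

```python
import torch

from probmetrics.calibrators import get_calibrator
from probmetrics.distributions import CategoricalLogits

logits = torch.as_tensor([[-1.1, 1.1]])  # raw model outputs, shape = (n_samples, n_classes)
labels = torch.as_tensor([1])

calib = get_calibrator('ts-mix')  # temperature scaling with Laplace smoothing
calib.fit_torch(CategoricalLogits(logits), labels)
calibrated_probas = calib.predict_proba_torch(CategoricalLogits(logits))
```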
## Using our refinement and calibration metrics

We provide estimators for the refinement error (the loss after post-hoc calibration) and the calibration error (the loss improvement achieved through post-hoc calibration). They can be used as follows:

```python
import torch
from probmetrics.metrics import Metrics

# compute multiple metrics at once;
# this is more efficient than computing them individually
metrics = Metrics.from_names([
    'logloss',
    'refinement_logloss_ts-mix_all',
    'calib-err_logloss_ts-mix_all',
])
y_true = torch.tensor(...)
y_logits = torch.tensor(...)
results = metrics.compute_all_from_labels_logits(y_true, y_logits)
print(results['refinement_logloss_ts-mix_all'].item())
```
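Since the refinement error is defined as the loss after post-hoc calibration and the calibration error as the loss improvement from it, the two estimates should add up to the plain log loss. A quick sanity check under that assumption, reusing the `results` dictionary from the snippet above:

```python
# logloss ≈ refinement error + calibration error (by the definitions above)
total = results['logloss'].item()
refinement = results['refinement_logloss_ts-mix_all'].item()
calib_err = results['calib-err_logloss_ts-mix_all'].item()
print(abs(total - (refinement + calib_err)))  # should be close to zero
```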
## Using more metrics

In general, while some metrics can be flexibly configured through their corresponding classes, many metrics are also available by name. Here are some relevant classification metrics:
```python
from probmetrics.metrics import Metrics

metrics = Metrics.from_names([
    'logloss',
    'brier',  # for binary, this is 2x the Brier score from sklearn
    'accuracy', 'class-error',
    'auroc-ovr',  # one-vs-rest
    'auroc-ovo-sklearn',  # one-vs-one (can be slow!)
    # calibration metrics
    'ece-15', 'rmsce-15', 'mce-15', 'smece',
    'refinement_logloss_ts-mix_all',
    'calib-err_logloss_ts-mix_all',
    'refinement_brier_ts-mix_all',
    'calib-err_brier_ts-mix_all',
])
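
# these metrics can then be evaluated as in the previous section, e.g.:
# results = metrics.compute_all_from_labels_logits(y_true, y_logits)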
```

The following function returns a list of all available metric names:
```python
from probmetrics.metrics import Metrics, MetricType
Metrics.get_available_names(metric_type=MetricType.CLASS)
```

While there are some classes for regression metrics, they are not implemented yet.
## Releases
- v0.0.2:
  - removed the numpy<2.0 constraint
  - allow 1D vectors in CategoricalLogits / CategoricalProbs
  - add TorchCal temperature scaling
  - minor fixes in AutoGluon temperature scaling that shouldn't affect performance in practice
- v0.0.1: Initial release