[![test](https://github.com/dholzmueller/probmetrics/actions/workflows/testing.yml/badge.svg)](https://github.com/dholzmueller/probmetrics/actions/workflows/testing.yml)
[![Downloads](https://img.shields.io/pypi/dm/probmetrics)](https://pypistats.org/packages/probmetrics)

# Probmetrics: Classification metrics and post-hoc calibration

This PyTorch-based package currently contains
- classification metrics, in particular
metrics for assessing the quality of probabilistic predictions, and
- post-hoc calibration methods, in particular
a fast and accurate implementation of temperature scaling.

It accompanies our paper
[Rethinking Early Stopping: Refine, Then Calibrate](https://arxiv.org/abs/2501.19195).
Please cite our paper if you use this repository for research purposes.
The experiments from the paper can be found here:
[vision](https://github.com/eugeneberta/RefineThenCalibrate-Vision),
[tabular](https://github.com/dholzmueller/pytabkit),
[theory](https://github.com/eugeneberta/RefineThenCalibrate-Theory).

## Installation

Probmetrics is available via
```bash
pip install probmetrics
```
To obtain all functionality, install `probmetrics[extra,dev,dirichletcal]`.
- `extra` installs additional packages for smooth ECE, Venn-Abers calibration,
  centered isotonic regression, and the temperature scaling implementation in NetCal.
- `dev` installs additional packages for development (especially documentation).
- `dirichletcal` installs Dirichlet calibration, which requires Python 3.12 or newer.
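After installation, you can check which version is installed using the standard library's `importlib.metadata` (this is generic Python tooling, not part of the probmetrics API):

```python
from importlib.metadata import version

# query the installed distribution's metadata (standard library, Python 3.8+)
print(version("probmetrics"))
```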

## Using temperature scaling

We provide a highly efficient implementation of temperature scaling
that, unlike some other implementations,
does not suffer from optimization issues.

### Numpy interface

```python
from probmetrics.calibrators import get_calibrator
import numpy as np

probas = np.asarray([[0.1, 0.9]]) # shape = (n_samples, n_classes)
labels = np.asarray([1]) # shape = (n_samples,)
# this is the version with Laplace smoothing,
# use calibrate_with_mixture=False (the default) for no Laplace smoothing
calib = get_calibrator('temp-scaling', calibrate_with_mixture=True)
# other option: calib = MixtureCalibrator(TemperatureScalingCalibrator())
# there is also a fit_torch / predict_proba_torch interface
calib.fit(probas, labels)
calibrated_probas = calib.predict_proba(probas)
```
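In a typical workflow, you would fit the calibrator on held-out validation predictions and then apply it to test predictions. The sketch below only reuses the `fit` / `predict_proba` calls from above; the random validation/test arrays are placeholders for your own model outputs:

```python
import numpy as np
from probmetrics.calibrators import get_calibrator

rng = np.random.default_rng(0)

# placeholder predictions and labels; replace with your model's outputs
val_probas = rng.dirichlet([1.0, 1.0], size=1000)   # shape (n_val, n_classes), rows sum to 1
val_labels = rng.integers(0, 2, size=1000)          # shape (n_val,)
test_probas = rng.dirichlet([1.0, 1.0], size=500)   # shape (n_test, n_classes)

calib = get_calibrator('temp-scaling', calibrate_with_mixture=True)
calib.fit(val_probas, val_labels)                   # fit the temperature on validation predictions
calibrated_test = calib.predict_proba(test_probas)  # apply the fitted calibrator to test predictions
```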

### PyTorch interface

The PyTorch version can be used directly with GPU tensors, but for smaller
validation sets (around 1K-10K samples) this can actually be slower than running on the CPU.

```python
from probmetrics.distributions import CategoricalProbs
from probmetrics.calibrators import get_calibrator
import torch

probas = torch.as_tensor([[0.1, 0.9]])
labels = torch.as_tensor([1])

# temp-scaling with Laplace smoothing
calib = get_calibrator('ts-mix')

# if you have logits, you can use CategoricalLogits instead
calib.fit_torch(CategoricalProbs(probas), labels)
calibrated_probas = calib.predict_proba_torch(CategoricalProbs(probas))
```
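If your model outputs raw logits, the same calls work with `CategoricalLogits`, as noted in the comment above. A minimal sketch on random tensors, assuming `CategoricalLogits` lives in `probmetrics.distributions` alongside `CategoricalProbs`:

```python
import torch
from probmetrics.calibrators import get_calibrator
from probmetrics.distributions import CategoricalLogits

torch.manual_seed(0)
val_logits = torch.randn(1000, 2)          # placeholder validation logits
val_labels = torch.randint(0, 2, (1000,))  # placeholder validation labels

calib = get_calibrator('ts-mix')           # temperature scaling with Laplace smoothing
calib.fit_torch(CategoricalLogits(val_logits), val_labels)
calibrated = calib.predict_proba_torch(CategoricalLogits(val_logits))
```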

## Using our refinement and calibration metrics

We provide estimators for refinement error
(loss after post-hoc calibration)
and calibration error
(loss improvement through post-hoc calibration).
They can be used as follows:

```python
import torch
from probmetrics.metrics import Metrics

# compute multiple metrics at once
# this is more efficient than computing them individually
metrics = Metrics.from_names(['logloss',
                              'refinement_logloss_ts-mix_all',
                              'calib-err_logloss_ts-mix_all'])
y_true = torch.tensor(...)    # class labels, shape (n_samples,)
y_logits = torch.tensor(...)  # predicted logits, shape (n_samples, n_classes)
results = metrics.compute_all_from_labels_logits(y_true, y_logits)
print(results['refinement_logloss_ts-mix_all'].item())
```
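As a self-contained sanity check, the estimators can be run on random data. By the definitions above, refinement error and calibration error should add up to the log-loss; the random labels, logits, and shapes below are purely illustrative:

```python
import torch
from probmetrics.metrics import Metrics

torch.manual_seed(0)
y_true = torch.randint(0, 2, (1000,))  # random labels, for illustration only
y_logits = torch.randn(1000, 2)        # random logits, for illustration only

metrics = Metrics.from_names(['logloss',
                              'refinement_logloss_ts-mix_all',
                              'calib-err_logloss_ts-mix_all'])
results = metrics.compute_all_from_labels_logits(y_true, y_logits)

# by the definitions above, refinement + calibration error should add up to the log-loss
print(results['logloss'].item())
print(results['refinement_logloss_ts-mix_all'].item()
      + results['calib-err_logloss_ts-mix_all'].item())
```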

## Using more metrics

While some metrics can be configured flexibly using the corresponding classes,
many metrics are also available directly by name.
Here are some relevant classification metrics:
```python
from probmetrics.metrics import Metrics

metrics = Metrics.from_names([
    'logloss',
    'brier',  # for binary, this is 2x the brier from sklearn
    'accuracy', 'class-error',
    'auroc-ovr',  # one-vs-rest
    'auroc-ovo-sklearn',  # one-vs-one (can be slow!)
    # calibration metrics
    'ece-15', 'rmsce-15', 'mce-15', 'smece',
    'refinement_logloss_ts-mix_all',
    'calib-err_logloss_ts-mix_all',
    'refinement_brier_ts-mix_all',
    'calib-err_brier_ts-mix_all'
])
```

The following function returns a list of all metric names:
```python
from probmetrics.metrics import Metrics, MetricType
Metrics.get_available_names(metric_type=MetricType.CLASS)
```
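For example, assuming the returned names are plain strings, you can filter them to look for related metrics:

```python
from probmetrics.metrics import Metrics, MetricType

# plain string filtering over the returned names (nothing probmetrics-specific)
names = Metrics.get_available_names(metric_type=MetricType.CLASS)
print([name for name in names if 'ece' in name])
```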

Some classes for regression metrics exist, but they are not implemented yet.

## Releases

- v0.0.2:
  - removed the numpy<2.0 constraint
  - allow 1D vectors in CategoricalLogits / CategoricalProbs
  - add TorchCal temperature scaling
  - minor fixes in AutoGluon temperature scaling that shouldn't affect the performance in practice
- v0.0.1: Initial release