Repository for the NeurIPS 2023 paper "Beyond Confidence: Reliable Models Should Also Consider Atypicality"
- Host: GitHub
- URL: https://github.com/mertyg/beyond-confidence-atypicality
- Owner: mertyg
- License: mit
- Created: 2023-05-29T17:08:03.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-21T19:53:05.000Z (over 1 year ago)
- Last Synced: 2025-04-03T15:52:38.046Z (7 months ago)
- Topics: calibration, conformal-prediction, trustworthy-machine-learning, uncertainty-quantification
- Language: Python
- Homepage:
- Size: 26.4 KB
- Stars: 12
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Beyond Confidence: Reliable Models Should Also Consider Atypicality (NeurIPS 2023)
This is the repository for the paper [**Beyond Confidence: Reliable Models Should Also Consider Atypicality (NeurIPS 2023)**](https://arxiv.org/abs/2305.18262). An earlier version of the paper appeared as a Contributed Talk at the ICLR 2023 Workshop on Trustworthy Machine Learning.

Overall, we demonstrate that machine learning models should also consider atypicality (i.e., how "rare" a sample is) when making predictions. We show that simple, easy-to-implement atypicality estimators can provide significant value.
# Atypicality Estimation

## LLM Classification
For large language models, the log-likelihood provided by the model already estimates how "typical" a prompt is. Therefore, we simply use the negative log-likelihood as the atypicality score:
```python
from model_zoo import get_alpaca7b
# Assume we have the path to Alpaca7B under `model_path`
model = get_alpaca7b(model_path=model_path)
test_atypicality = -model.get_batch_loglikelihood(test_prompts)
```
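For reference, here is a minimal standalone sketch of computing a per-prompt negative log-likelihood with a Hugging Face causal language model. The model choice (`gpt2`) and the mean-over-tokens normalization are illustrative assumptions, not the repository's setup; the actual batched implementation is `get_batch_loglikelihood` in `model_zoo/llm.py`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; the repository uses Alpaca7B via `get_alpaca7b`.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def prompt_atypicality(prompt: str) -> float:
    """Negative log-likelihood of the prompt under the LM (higher = more atypical)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # With `labels` set, the model returns the mean token-level NLL as `loss`.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return loss.item()
```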
We will later use the atypicality values in the recalibration algorithm to improve both accuracy and calibration. See `model_zoo/llm.py` for further details.

**Obtaining Alpaca7B:** Please refer to https://github.com/tatsu-lab/stanford_alpaca for instructions on how to obtain Alpaca7B.
## Image Classification
For image classifiers, there is no out-of-the-box atypicality estimator. Therefore, we use a simple estimator fit on the training embeddings; a conceptual sketch comes first, followed by the full pipeline using the repository's API.
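As a rough illustration of the idea, and not the repository's `GMMAtypicalityEstimator` implementation (see the `atypicality` module for that), one can fit a Gaussian per class to the training embeddings and score atypicality as the negative maximum class-conditional log-density:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_gaussians(features, labels):
    """Fit one Gaussian per class to the training embeddings."""
    gaussians = []
    for c in np.unique(labels):
        class_feats = features[labels == c]
        mean = class_feats.mean(axis=0)
        # A small ridge keeps the covariance well-conditioned.
        cov = np.cov(class_feats, rowvar=False) + 1e-6 * np.eye(features.shape[1])
        gaussians.append(multivariate_normal(mean=mean, cov=cov))
    return gaussians

def atypicality_score(gaussians, features):
    """Atypicality = negative max class-conditional log-density (higher = rarer)."""
    log_densities = np.stack([g.logpdf(features) for g in gaussians], axis=1)
    return -log_densities.max(axis=1)
```

The full pipeline with the repository's own utilities looks like this: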
```python
from atypicality import GMMAtypicalityEstimator
from model_zoo import get_image_classifier
from data_zoo import get_image_dataset

# First, load the dataset and model of interest.
model, preprocess = get_image_classifier("resnext50_imagenet_lt", device="cuda")
train_dataset, val_dataset, test_dataset = get_image_dataset("imagenet_lt", preprocess=preprocess)
# We first extract the features for the training and test sets:
train_features, train_logits, train_labels = model.run_and_cache_outputs(train_dataset, batch_size=16, output_dir="./outputs/")
calib_features, calib_logits, calib_labels = model.run_and_cache_outputs(val_dataset, batch_size=16, output_dir="./outputs/")
test_features, test_logits, test_labels = model.run_and_cache_outputs(test_dataset, batch_size=16, output_dir="./outputs/")
# Then, we fit the estimator to the embeddings of the training set:
atypicality_estimator = GMMAtypicalityEstimator()
atypicality_estimator.fit(train_features, train_labels)
# We then use the estimator to estimate atypicality of the calibration and test sets:
calib_atypicality = atypicality_estimator.predict(calib_features)
test_atypicality = atypicality_estimator.predict(test_features)
```

# Improving Calibration (and Performance) with Atypicality-Aware Recalibration (AAR)
Once the atypicality of each input has been computed, we can perform atypicality-aware recalibration with a few simple lines:
```python
from calibration import AtypicalityAwareCalibrator
aar_calib = AtypicalityAwareCalibrator()
# Train the AAR calibrator:
aar_calib.fit(calib_logits, calib_atypicality, calib_labels, max_iters=1500)
# Use the AAR calibrator:
test_pred_probs = aar_calib.predict_proba(test_logits, test_atypicality)
```

# Reproducing Results
To reproduce our LLM classification experiments, please refer to `LLM Classification.ipynb`. Similarly, to reproduce our image classification experiments, please refer to `Imbalanced Image Classification.ipynb` and `Balanced Image Classification.ipynb`.
In each of the above notebooks, we demonstrate how atypicality-awareness improves the accuracy and the calibration of the models.
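As a quick way to quantify the calibration improvement outside the notebooks, one can compute the expected calibration error (ECE). This is a standard equal-width-binning version, not necessarily the exact metric implementation used in the notebooks:

```python
import numpy as np

def expected_calibration_error(pred_probs, labels, n_bins=15):
    """Average |accuracy - confidence| over equal-width confidence bins."""
    confidences = pred_probs.max(axis=1)
    predictions = pred_probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece

# e.g., compare the ECE of the uncalibrated softmax probabilities against
# `test_pred_probs` from the AAR calibrator, using `test_labels` as ground truth.
```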
# Citation
If you find this repository or the ideas therein useful, please consider citing our paper:
```
@article{yuksekgonul2023beyond,
title={Beyond Confidence: Reliable Models Should Also Consider Atypicality},
author={Yuksekgonul, Mert and Zhang, Linjun and Zou, James and Guestrin, Carlos},
journal={arXiv preprint arXiv:2305.18262},
year={2023}
}
```