https://github.com/vincentblot28/conformalized_gp


Host: GitHub
URL: https://github.com/vincentblot28/conformalized_gp
Owner: vincentblot28
Created: 2023-11-13T08:37:34.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-07-10T15:50:41.000Z (7 months ago)
Last Synced: 2024-12-10T05:42:09.574Z (about 2 months ago)
Language: Jupyter Notebook
Size: 20.7 MB
Stars: 46
Watchers: 2
Forks: 9
Open Issues: 1
Metadata Files:
- Readme: README.md


Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees
====================================================

**What is done in this repo** 

🔗 Requirements
===============

Python 3.7+ 

[OpenTURNS](https://openturns.github.io/www/index.html) is written in C++, so you may need to install a compiler such as gcc to run the library.

**Ubuntu**:

```

$ sudo apt update

$ sudo apt install build-essential

```

**OSX**: 

```

$ brew install gcc

```

**Windows**: Install MinGW (a Windows distribution of gcc) or Microsoft’s Visual C++.

Install the required packages:

- Via `pip`:

```

$ pip install -r requirements.txt

```

- Via conda:

```

$ conda env create -f environment.yml

```

🛠 Installation
===============

Clone the repo, then run the following command in the `conformalized_gp` directory to install the code:

```

$ pip install .

```

⚡️ Quickstart
==============

Here is a quickstart for using the Jackknife+GP method on any regression dataset. The goal is to visually compare the results given by the standard Jackknife+ method, the GP credibility intervals, and our methodology.

The notebook that inspired this quickstart can be found [here](https://github.com/vincentblot28/conformalized_gp/blob/main/notebook/conformalized_gp_quickstart.ipynb).

We start by importing the necessary packages:

```python

import matplotlib.pyplot as plt

import numpy as np

import scipy.stats  # import the submodule explicitly; scipy.stats.norm is used below

from sklearn.gaussian_process import GaussianProcessRegressor

from sklearn.model_selection import train_test_split

from mapie.conformity_scores.residual_conformity_scores import GPConformityScore

from mapie.regression import MapieRegressor

BLUE = np.array([[26, 54, 105]]) / 255

ORANGE = np.array([[223, 84, 49]]) / 255

YELLOW = np.array([[242, 188, 64]]) / 255

```

- In this example, we work with an analytical function of our own design that has visually interesting behavior:

$$g(x) = 3x\sin(x) - 2x\cos(x) + \frac{x^3}{40} - \frac{x^2}{2} - 10x$$

```python

def g(x):

    return 3 * x * np.sin(x) - 2 * x * np.cos(x) + x ** 3 / 40 - 0.5 * x ** 2 - 10 * x

x_mesh = np.linspace(-40, 60, 5000)

plt.plot(x_mesh, g(x_mesh))

plt.xlabel("$x$")

plt.ylabel("$g(x)$")

```

![toy function](https://github.com/vincentblot28/conformalized_gp/blob/main/plots/toy_function.png)

- Then we split the data into training and test sets and train a scikit-learn `GaussianProcessRegressor` with an `RBF` kernel.

```python 

X_train, X_test, y_train, y_test = train_test_split(x_mesh, g(x_mesh), test_size=.98, random_state=42)

X_train = X_train.reshape(-1, 1)

X_test = X_test.reshape(-1, 1)

gp = GaussianProcessRegressor(normalize_y=True)

gp.fit(X_train, y_train)

```

- We then define and train the two conformal methods (J+ and J+GP):

```python 

mapie_j_plus_gp = MapieRegressor(

    estimator=gp,

    cv=-1,

    method="plus",

    conformity_score=GPConformityScore(),

    model_has_std=True,

    random_state=42

)

mapie_j_plus = MapieRegressor(

    estimator=gp,

    cv=-1,

    method="plus",

    conformity_score=None,

    model_has_std=False,

    random_state=42

)

mapie_j_plus_gp.fit(X_train, y_train)

mapie_j_plus.fit(X_train, y_train)

```
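For intuition, the standard Jackknife+ procedure that `cv=-1` with `method="plus"` corresponds to can be sketched in plain Python. This is a minimal illustration, not MAPIE's implementation: `fit_line` and `jackknife_plus` are hypothetical helpers, the model is a simple least-squares line rather than a GP, and the conformity score is the absolute residual. The J+GP variant appears to additionally rely on the GP's predictive standard deviation (hence `model_has_std=True`), which this sketch does not reproduce.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def jackknife_plus(xs, ys, x_new, alpha=0.1, fit=fit_line):
    """Jackknife+ interval at x_new using absolute leave-one-out residuals."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    lower_vals, upper_vals = [], []
    for i in range(n):
        # Refit the model without the i-th training point.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residual = abs(ys[i] - model(xs[i]))  # leave-one-out residual
        pred = model(x_new)                   # leave-one-out prediction
        lower_vals.append(pred - residual)
        upper_vals.append(pred + residual)
    lower_vals.sort()
    upper_vals.sort()
    k_lo = math.floor(alpha * (n + 1))       # rank used for the lower bound
    k_hi = math.ceil((1 - alpha) * (n + 1))  # rank used for the upper bound
    # With too few points the requested level cannot be reached: unbounded side.
    lower = lower_vals[k_lo - 1] if k_lo >= 1 else -math.inf
    upper = upper_vals[k_hi - 1] if k_hi <= n else math.inf
    return lower, upper


# Toy usage on noisy linear data (illustrative only).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(49)]
ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]
print(jackknife_plus(xs, ys, x_new=5.0, alpha=0.1))
```

Each training point contributes one leave-one-out prediction shifted by its own residual, and the interval bounds are order statistics of those shifted predictions. This is why the method needs one model fit per training point.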

- Finally, we compute prediction intervals at a 90% confidence level over the full input mesh and plot the intervals of the three methods:

```python

ALPHA = .1

_, y_pss_j_plus_gp = mapie_j_plus_gp.predict(x_mesh.reshape(-1, 1), alpha=ALPHA)

_, y_pss_j_plus = mapie_j_plus.predict(x_mesh.reshape(-1, 1), alpha=ALPHA)

y_mean, y_std = gp.predict(x_mesh.reshape(-1, 1), return_std=True)

q_alpha_min = scipy.stats.norm.ppf(ALPHA / 2)

q_alpha_max = scipy.stats.norm.ppf(1 - ALPHA / 2)

f, ax = plt.subplots(1, 1, figsize=(20, 10))

# Training points, true function and GP posterior mean.
# Each artist carries its own label so the legend cannot get out of order.
ax.scatter(X_train, y_train, c=BLUE, label="Training points")
ax.plot(x_mesh, g(x_mesh), c=BLUE, label="True function")
ax.plot(x_mesh, y_mean, c=YELLOW, label="Mean of posterior GP")

# 90% credibility interval of the GP posterior (Gaussian quantiles)
ax.fill_between(
    x_mesh,
    y_mean + y_std * q_alpha_min,
    y_mean + y_std * q_alpha_max,
    alpha=0.3,
    color=YELLOW,
    label="Posterior GP credibility interval",
)

# J+GP prediction interval
ax.fill_between(
    x_mesh,
    y_pss_j_plus_gp[:, 0, 0],
    y_pss_j_plus_gp[:, 1, 0],
    alpha=0.6,
    color=ORANGE,
    label="Prediction interval J+GP",
)

# Standard J+ prediction interval
ax.fill_between(
    x_mesh,
    y_pss_j_plus[:, 0, 0],
    y_pss_j_plus[:, 1, 0],
    alpha=0.3,
    color="g",
    label="Prediction interval J+",
)

ax.legend()

ax.set_xlabel("$x$")

ax.set_ylabel("$g(x)$")

```

![toy function intervals](https://github.com/vincentblot28/conformalized_gp/blob/main/plots/intervals_toy_function.png)
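To check the intervals numerically rather than visually, one can measure how often the true value falls inside them. The helpers below are a minimal, dependency-free sketch; `gaussian_interval` and `coverage_and_width` are hypothetical names, not part of MAPIE or of this repository. The first mirrors the credibility-interval formula used in the plot (mean plus standard deviation times a normal quantile), and the second computes the empirical coverage and the mean width of any set of intervals.

```python
from statistics import NormalDist


def gaussian_interval(means, stds, alpha=0.1):
    """Central (1 - alpha) interval of N(mean, std^2) for each point."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 for alpha = 0.1
    lower = [m - z * s for m, s in zip(means, stds)]
    upper = [m + z * s for m, s in zip(means, stds)]
    return lower, upper


def coverage_and_width(y_true, lower, upper):
    """Return (fraction of y_true inside [lower, upper], mean interval width)."""
    y_true, lower, upper = list(y_true), list(lower), list(upper)
    if not y_true or not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    covered = sum(bool(lo <= y <= hi) for y, lo, hi in zip(y_true, lower, upper))
    mean_width = sum(float(hi - lo) for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width
```

With the quickstart objects, one would predict on `X_test` instead of `x_mesh`, then pass `y_test` together with the `[:, 0, 0]` and `[:, 1, 0]` slices of the returned intervals. A coverage close to `1 - ALPHA` with a small mean width is the behavior a 90% interval aims for.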

🔌 Plug OpenTURNS GP into MAPIE
===========================

If you wish to use our code with an OpenTURNS model, we have implemented a simple wrapper around the model so that it can be used very easily:

```python

from wrappers import GpOTtoSklearnStd

nu = 5/2  # Hyperparameter of the Matérn Kernel

noise = None  # Standard deviation of the nugget effect. If None, no nugget effect is applied.

gp_estimator = GpOTtoSklearnStd(scale=1, amplitude=1, nu=nu, noise=noise)

```

This estimator is now fully compatible with MAPIE, as it comes with its own `.fit` and `.predict` methods.