https://github.com/vincentblot28/conformalized_gp
- Host: GitHub
- URL: https://github.com/vincentblot28/conformalized_gp
- Owner: vincentblot28
- Created: 2023-11-13T08:37:34.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-10T15:50:41.000Z (7 months ago)
- Last Synced: 2024-12-10T05:42:09.574Z (about 2 months ago)
- Language: Jupyter Notebook
- Size: 20.7 MB
- Stars: 46
- Watchers: 2
- Forks: 9
- Open Issues: 1
Metadata Files:
- Readme: README.md
Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees
====================================================

**What is done in this repo**

🔗 Requirements
===============
Python 3.7+

[OpenTURNS](https://openturns.github.io/www/index.html) is a C++ library, so you may need to install gcc to run it.

**Ubuntu**:
```
$ sudo apt update
$ sudo apt install build-essential
```

**OSX**:
```
$ brew install gcc
```

**Windows**: Install MinGW (a Windows distribution of gcc) or Microsoft’s Visual C

Install the required packages:
- Via `pip`:

```
$ pip install -r requirements.txt
```

- Via conda:

```
$ conda env create -f environment.yml
```

🛠 Installation
===============

Clone the repo and run the following command in the `conformalized_gp` directory to install the code:

```
$ pip install .
```

⚡️ Quickstart
==============
Here is a quickstart for using the Jackknife+GP method on any regression dataset. The goal is to visually compare
the results given by the standard Jackknife+ method, the GP credibility intervals, and our methodology.
The notebook from which this quickstart is inspired can be found [here](https://github.com/vincentblot28/conformalized_gp/blob/main/notebook/conformalized_gp_quickstart.ipynb).

We first import the necessary packages:

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import train_test_split

from mapie.conformity_scores.residual_conformity_scores import GPConformityScore
from mapie.regression import MapieRegressor

BLUE = np.array([[26, 54, 105]]) / 255
ORANGE = np.array([[223, 84, 49]]) / 255
YELLOW = np.array([[242, 188, 64]]) / 255
```

- In this example, we work with an analytical function of our own design that has nice visual behavior:

$$g(x) = 3x\sin(x) - 2x\cos(x) + \frac{x^3}{40} - \frac{x^2}{2} - 10x$$
```python
def g(x):
    return (3 * x * np.sin(x) - 2 * x * np.cos(x) + (x ** 3) / 40 - .5 * x ** 2 - 10 * x)


x_mesh = np.linspace(-40, 60, 5000)
plt.plot(x_mesh, g(x_mesh))
plt.xlabel("$x$")
plt.ylabel("$g(x)$")
```
![toy function](https://github.com/vincentblot28/conformalized_gp/blob/main/plots/toy_function.png)

- Then we split our data into train and test sets and train a scikit-learn `GaussianProcessRegressor` with an `RBF` kernel (scikit-learn's default kernel when none is passed).

```python
X_train, X_test, y_train, y_test = train_test_split(x_mesh, g(x_mesh), test_size=.98, random_state=42)
X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)
gp = GaussianProcessRegressor(normalize_y=True)
gp.fit(X_train, y_train)
```

- We then define and train the two conformal methods (J+ and J+GP):

```python
# J+GP: Jackknife+ combined with the GP conformity score
mapie_j_plus_gp = MapieRegressor(
    estimator=gp,
    cv=-1,  # leave-one-out
    method="plus",
    conformity_score=GPConformityScore(),
    model_has_std=True,
    random_state=42
)

# Standard J+: MAPIE's default conformity score
mapie_j_plus = MapieRegressor(
    estimator=gp,
    cv=-1,
    method="plus",
    conformity_score=None,
    model_has_std=False,
    random_state=42
)

mapie_j_plus_gp.fit(X_train, y_train)
mapie_j_plus.fit(X_train, y_train)
```

- Finally, we compute prediction intervals with a confidence level of 90% on `x_mesh` and plot the intervals of the three methods:

```python
ALPHA = .1

_, y_pss_j_plus_gp = mapie_j_plus_gp.predict(x_mesh.reshape(-1, 1), alpha=ALPHA)
_, y_pss_j_plus = mapie_j_plus.predict(x_mesh.reshape(-1, 1), alpha=ALPHA)

y_mean, y_std = gp.predict(x_mesh.reshape(-1, 1), return_std=True)
q_alpha_min = scipy.stats.norm.ppf(ALPHA / 2)
q_alpha_max = scipy.stats.norm.ppf(1 - ALPHA / 2)

f, ax = plt.subplots(1, 1, figsize=(20, 10))
ax.scatter(X_train, y_train, c=BLUE)

ax.plot(x_mesh, g(x_mesh), c=BLUE)
ax.plot(x_mesh, y_mean, c=YELLOW)
ax.fill_between(
    x_mesh,
    y_mean + y_std * q_alpha_min,
    y_mean + y_std * q_alpha_max,
    alpha=0.3,
    color=YELLOW,
    label=r"$\pm$ 1 std. dev.",
)

ax.fill_between(
    x_mesh,
    y_pss_j_plus_gp[:, 0, 0],
    y_pss_j_plus_gp[:, 1, 0],
    alpha=.6,
    color=ORANGE,
    label=r"$\pm$ 1 std. dev.",
)

ax.fill_between(
    x_mesh,
    y_pss_j_plus[:, 0, 0],
    y_pss_j_plus[:, 1, 0],
    alpha=.3,
    color="g",
    label=r"$\pm$ 1 std. dev.",
)
ax.legend(
    [
        "Training Points",
        "True function",
        "Mean of posterior GP",
        "Posterior GP Credibility Interval",
        "Prediction Interval J+GP",
        "Prediction Interval J+",
    ]
)
ax.set_xlabel("$x$")
ax.set_ylabel("$g(x)$")
```
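
To make explicit what `cv=-1` with `method="plus"` computes, here is a minimal from-scratch sketch of the textbook Jackknife+ construction, followed by an empirical coverage check. It is not MAPIE's implementation and does not use the GP conformity score. The helpers `fit_linear`, `jackknife_plus` and `empirical_coverage` are hypothetical names, and a least-squares line on synthetic data stands in for the GP so that the block only depends on NumPy.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares line; a stand-in for any regressor (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def jackknife_plus(X, y, X_new, alpha, fit=fit_linear):
    """Textbook Jackknife+ interval at each row of X_new (target level 1 - alpha)."""
    n, m = len(y), len(X_new)
    lower_cands = np.empty((n, m))
    upper_cands = np.empty((n, m))
    for i in range(n):
        keep = np.arange(n) != i
        predict = fit(X[keep], y[keep])             # model trained without point i
        resid = abs(y[i] - predict(X[i:i + 1])[0])  # leave-one-out residual
        pred = predict(X_new)
        lower_cands[i] = pred - resid
        upper_cands[i] = pred + resid
    # Order statistics used by Jackknife+ (ranks are 1-indexed).
    k_low = int(np.floor(alpha * (n + 1)))
    k_up = int(np.ceil((1 - alpha) * (n + 1)))
    lower = np.sort(lower_cands, axis=0)[k_low - 1] if k_low >= 1 else np.full(m, -np.inf)
    upper = np.sort(upper_cands, axis=0)[k_up - 1] if k_up <= n else np.full(m, np.inf)
    return lower, upper


def empirical_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside their interval."""
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


# Synthetic data (an assumption for illustration): noisy line, 200 train / 2000 test points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=2200)
X_tr, y_tr = X[:200], y[:200]
X_te, y_te = X[200:], y[200:]

lower, upper = jackknife_plus(X_tr, y_tr, X_te, alpha=0.1)
coverage = empirical_coverage(y_te, lower, upper)
print(f"empirical coverage: {coverage:.3f}, mean width: {np.mean(upper - lower):.3f}")
```

The Jackknife+ guarantee is a marginal coverage of at least 1 - 2α, although the empirical coverage is usually close to 1 - α in practice. The same `empirical_coverage` helper could be applied to the `[:, 0, 0]` and `[:, 1, 0]` slices of the arrays returned by `MapieRegressor.predict` in the quickstart.
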
![toy function intervals](https://github.com/vincentblot28/conformalized_gp/blob/main/plots/intervals_toy_function.png)

🔌 Plug OpenTURNS GP into MAPIE
===========================

If you wish to use our code with an OpenTURNS model, we have implemented a simple wrapper around the model so that it
can be used very easily:

```python
from wrappers import GpOTtoSklearnStd

nu = 5/2  # Hyperparameter of the Matérn Kernel
noise = None  # Standard deviation of the nugget effect. If None, no nugget effect is applied.
gp_estimator = GpOTtoSklearnStd(scale=1, amplitude=1, nu=nu, noise=noise)
```

This estimator is now fully compatible with MAPIE as it comes with its `.fit` and `.predict` methods.
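
As a rough illustration of the interface such a wrapper has to expose, the sketch below defines a small scikit-learn-style estimator with `fit` and `predict(X, return_std=...)`. It is not the repo's `GpOTtoSklearnStd`: the class name `StdRegressorSketch` is hypothetical, the backend is scikit-learn's own GP rather than OpenTURNS, and the `return_std` argument is an assumption based on the quickstart, where J+GP relies on the model's standard deviation (`model_has_std=True`).

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


class StdRegressorSketch(BaseEstimator, RegressorMixin):
    """Hypothetical wrapper exposing fit / predict(return_std) around a GP backend."""

    def __init__(self, nu=2.5):
        # Storing constructor arguments unchanged lets BaseEstimator provide
        # get_params / set_params, which scikit-learn's clone() relies on.
        self.nu = nu

    def fit(self, X, y):
        # scikit-learn's GP stands in for the OpenTURNS model here (assumption).
        self.backend_ = GaussianProcessRegressor(
            kernel=Matern(nu=self.nu), normalize_y=True
        )
        self.backend_.fit(X, y)
        return self

    def predict(self, X, return_std=False):
        # Returns the posterior mean, or (mean, std) when return_std=True.
        return self.backend_.predict(X, return_std=return_std)


# Usage on a toy 1-D problem.
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = np.sin(X).ravel()
model = StdRegressorSketch().fit(X, y)
y_mean, y_std = model.predict(X, return_std=True)
```

Inheriting from `BaseEstimator` is a design choice in this sketch: it keeps the estimator cloneable, which matters for tools that refit copies of the estimator during cross-validation.
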