https://github.com/donlnz/nonconformist

Python implementation of the conformal prediction framework.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Conformal prediction\n",
"\n",
"Conformal predictors are predictive models that associate each of their predictions with a measure of statistically valid confidence. Given a test object $x_i$ and a user-specified significance level $\\epsilon \\in (0, 1)$, a conformal predictor outputs a prediction region $\\Gamma_i^{\\epsilon} \\subseteq Y$ that contains the true output value $y_i \\in Y$ with probability $1-\\epsilon$.\n",
"\n",
"# Suggested reading\n",
"\n",
"TODO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Nonconformist Usage:\n",
"## 1. Basics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 1a: Simple ICP (classification)\n",
"In this example, we construct a simple inductive conformal predictor for classification, using a support vector classifier as the underlying model."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ True False False]\n",
" [ True False False]\n",
" [ True False False]\n",
" [False True False]\n",
" [False True False]]\n"
]
}
],
"source": [
"from sklearn.datasets import load_iris\n",
"import numpy as np\n",
"from sklearn.svm import SVC\n",
"from nonconformist.cp import IcpClassifier\n",
"from nonconformist.nc import NcFactory\n",
" \n",
"iris = load_iris()\n",
"idx = np.random.permutation(iris.target.size)\n",
"\n",
"# Divide the data into proper training set, calibration set and test set\n",
"idx_train, idx_cal, idx_test = idx[:50], idx[50:100], idx[100:]\n",
"\n",
"model = SVC(probability=True)\t# Create the underlying model\n",
"nc = NcFactory.create_nc(model)\t# Create a default nonconformity function\n",
"icp = IcpClassifier(nc)\t\t\t# Create an inductive conformal classifier\n",
"\n",
"# Fit the ICP using the proper training set\n",
"icp.fit(iris.data[idx_train, :], iris.target[idx_train])\n",
"\n",
"# Calibrate the ICP using the calibration set\n",
"icp.calibrate(iris.data[idx_cal, :], iris.target[idx_cal])\n",
"\n",
"# Produce predictions for the test set, with confidence 95%\n",
"prediction = icp.predict(iris.data[idx_test, :], significance=0.05)\n",
"\n",
"# Print the first 5 predictions\n",
"print(prediction[:5, :])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is a boolean numpy.array with shape (n_test, n_classes), where each row is a boolean vector denoting the class labels included in the prediction region at the specified significance level.\n",
"\n",
"For this particular example, we might obtain, for a given test object, a boolean vector [ True True False ], meaning that the $1-\\epsilon$ confidence prediction region contains class labels 0 and 1 (i.e., with 95% probability, one of these two classes will be correct)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 1b: Simple TCP (classification)\n",
"In this example, we construct a simple transductive conformal predictor for classification, using a support vector classifier as the underlying model."
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[False True False]\n",
" [ True False False]\n",
" [False True False]\n",
" [ True False False]\n",
" [False False True]]\n"
]
}
],
"source": [
"from sklearn.datasets import load_iris\n",
"import numpy as np\n",
"from sklearn.svm import SVC\n",
"from nonconformist.cp import TcpClassifier\n",
"from nonconformist.nc import NcFactory\n",
" \n",
"iris = load_iris()\n",
"idx = np.random.permutation(iris.target.size)\n",
"\n",
"# Divide the data into training set and test set\n",
"idx_train, idx_test = idx[:100], idx[100:]\n",
"\n",
"model = SVC(probability=True)\t# Create the underlying model\n",
"nc = NcFactory.create_nc(model)\t# Create a default nonconformity function\n",
"tcp = TcpClassifier(nc)\t\t\t# Create a transductive conformal classifier\n",
"\n",
"# Fit the TCP using the proper training set\n",
"tcp.fit(iris.data[idx_train, :], iris.target[idx_train])\n",
"\n",
"# Produce predictions for the test set, with confidence 95%\n",
"prediction = tcp.predict(iris.data[idx_test, :], significance=0.05)\n",
"\n",
"# Print the first 5 predictions\n",
"print(prediction[:5, :])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We obtain a result that is conceptually identical as in the previous example (although the particular output values will differ)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 1c: Simple ICP (regression)\n",
"\n",
"In this example, we construct a simple inductive conformal predictor for regression, this time using a random forest regression model as the underlying model."
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 26.47 39.45]\n",
" [ 17.1 30.08]\n",
" [ 15.79 28.77]\n",
" [ 17.03 30.01]\n",
" [ 7.08 20.06]]\n"
]
}
],
"source": [
"from sklearn.datasets import load_boston\n",
"import numpy as np\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from nonconformist.cp import IcpRegressor\n",
"from nonconformist.nc import NcFactory\n",
" \n",
"boston = load_boston()\n",
"idx = np.random.permutation(boston.target.size)\n",
"\n",
"# Divide the data into proper training set, calibration set and test set\n",
"idx_train, idx_cal, idx_test = idx[:300], idx[300:399], idx[399:]\n",
"\n",
"model = RandomForestRegressor()\t# Create the underlying model\n",
"nc = NcFactory.create_nc(model)\t# Create a default nonconformity function\n",
"icp = IcpRegressor(nc)\t\t\t# Create an inductive conformal regressor\n",
"\n",
"# Fit the ICP using the proper training set\n",
"icp.fit(boston.data[idx_train, :], boston.target[idx_train])\n",
"\n",
"# Calibrate the ICP using the calibration set\n",
"icp.calibrate(boston.data[idx_cal, :], boston.target[idx_cal])\n",
"\n",
"# Produce predictions for the test set, with confidence 95%\n",
"prediction = icp.predict(boston.data[idx_test, :], significance=0.05)\n",
"\n",
"# Print the first 5 predictions\n",
"print(prediction[:5, :])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This time the result is a numerical numpy.array with shape (n_test, 2), where each row is a vector signifying the lower and upper bounds of an interval, denoting the prediction region at the specified significance level.\n",
"\n",
"For this particular example, we might obtain, for a given test object, a numerical vector [ 8.8 21.6 ], meaning that the $1-\\epsilon$ confidence prediction region is the interval $[8.8, 21.6]$ (i.e., with 95% probability, the correct output value lies somehwere on this interval)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Nonconformity functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 2a: Choosing your underlying model\n",
"\n",
"The simplest way of defining a nonconformity function based on a classification or regression algorithm, is to simply import the algorithm you want to use from sklearn, and create a nonconformity function using nonconformist's NcFactory.\n",
"\n",
"Note that NcFactory works only with learning algorithms (classifiers and regressors) implemented in scikit-learn. If you would like to use other kinds of underlying models (e.g., from TensorFlow or other libraries), please refer to later sections of this document."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Classification"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from nonconformist.nc import NcFactory\n",
"\n",
"nc_dt = NcFactory.create_nc(DecisionTreeClassifier(min_samples_leaf=5))\n",
"nc_rf = NcFactory.create_nc(RandomForestClassifier(n_estimators=500))\n",
"nc_knn = NcFactory.create_nc(KNeighborsClassifier(n_neighbors=11))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Regression"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.tree import DecisionTreeRegressor\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.neighbors import KNeighborsRegressor\n",
"\n",
"nc_dt = NcFactory.create_nc(DecisionTreeRegressor(min_samples_leaf=5))\n",
"nc_rf = NcFactory.create_nc(RandomForestRegressor(n_estimators=500))\n",
"nc_knn = NcFactory.create_nc(KNeighborsRegressor(n_neighbors=11))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 2b: Choosing your error function\n",
"\n",
"Nonconformist has built-in support for the most common model-agnostic nonconformity functions for both classification and regression, including:\n",
"\n",
"#### Classification\n",
"##### Inverse probability (```nonconformist.nc.InverseProbabilityErrFunc```)\n",
"$\\alpha_i = 1 - \\hat{P}\\left(y_i \\mid x_i\\right)$\n",
"\n",
"##### Margin (```nonconformist.nc.MarginErrFunc```)\n",
"$\\alpha_i = 0.5 - \\dfrac{\\hat{P}\\left(y_i \\mid x_i\\right) - max_{y \\, \\neq \\, y_i} \\hat{P}\\left(y \\mid x_i\\right)}{2}$\n",
"\n",
"#### Regression\n",
"##### Absolute error (```nonconformist.nc.AbsErrorErrFunc```)\n",
"$\\alpha_i = \\left| y_i - \\hat{y}_i \\right| $\n",
"\n",
"##### Signed error (```nonconformist.nc.SignErrorErrFunc```)\n",
"$\\alpha_i = y_i - \\hat{y}_i$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Classification"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neighbors import KNeighborsClassifier\n",
"from nonconformist.nc import NcFactory, InverseProbabilityErrFunc, MarginErrFunc\n",
"\n",
"nc_proba = NcFactory.create_nc(\n",
" KNeighborsClassifier(n_neighbors=11),\n",
" InverseProbabilityErrFunc()\n",
")\n",
"\n",
"nc_margin = NcFactory.create_nc(\n",
" KNeighborsClassifier(n_neighbors=11),\n",
" MarginErrFunc()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Regression"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neighbors import KNeighborsRegressor\n",
"from nonconformist.nc import NcFactory, AbsErrorErrFunc, SignErrorErrFunc\n",
"\n",
"nc_abs = NcFactory.create_nc(\n",
" KNeighborsRegressor(n_neighbors=11),\n",
" AbsErrorErrFunc()\n",
")\n",
"\n",
"nc_sign = NcFactory.create_nc(\n",
" KNeighborsRegressor(n_neighbors=11),\n",
" SignErrorErrFunc()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 2c: Normalization\n",
"Nonconformist support normalized nonconformity functions, i.e., nonconformity functions that leverage an additional underlying model that attempts to predict the difficulty of predicting the output of a given test pattern. This is typically used in the context of regression (in order to obtain prediction intervals whose sizes vary depending on the estimated difficulty of the test pattern), but nonconformist also supports normalized nonconformity functions for classification.\n",
"\n",
"Note that the normalization model should always be a regression model (also for classification problems)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Classification"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.neighbors import KNeighborsRegressor\n",
"from nonconformist.nc import NcFactory\n",
"\n",
"nc = NcFactory.create_nc(\n",
" RandomForestClassifier(n_estimators=500),\n",
" normalizer_model=KNeighborsRegressor(n_neighbors=11)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Regression"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.neighbors import KNeighborsRegressor\n",
"from nonconformist.nc import NcFactory\n",
"\n",
"nc = NcFactory.create_nc(\n",
" RandomForestRegressor(n_estimators=500),\n",
" normalizer_model=KNeighborsRegressor(n_neighbors=11)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2d: Creating your own error function\n",
"Nonconformist aims to be highly extendable, primarily in terms of defining new types of nonconformity functions. The simplest way of defining a new nonconformity function is to introduce a new error function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Classification\n",
"Here, we introduce a nonconformity function for classification, where\n",
"\n",
"$\\alpha_i = \\sum_{y \\, \\neq \\, y_i} \\hat{P}\\left(y \\mid x_i \\right) ^ 2$"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.svm import SVC\n",
"from nonconformist.nc import NcFactory, ClassificationErrFunc\n",
"\n",
"class MyClassErrFunc(ClassificationErrFunc):\n",
" def __init__(self):\n",
" super(MyClassErrFunc, self).__init__()\n",
" \n",
" def apply(self, prediction, y):\n",
" '''\n",
" y is a vector of labels\n",
" prediction is a matrix of class probability estimates\n",
" '''\n",
" prob = np.zeros(y.size, dtype=np.float32)\n",
" for i, y_ in enumerate(y):\n",
" if y_ >= prediction.shape[1]:\n",
" prob[i] = 0\n",
" else:\n",
" prob[i] = (prediction[i, :] ** 2).sum()\n",
" prob[i] -= prediction[i, int(y_)] ** 2\n",
" return prob\n",
" \n",
"model = SVC(probability=True)\n",
"nc = NcFactory.create_nc(model, MyClassErrFunc())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Regression\n",
"\n",
"Here, we introduce a nonconformity for regression, where\n",
"\n",
"$\\alpha_i = ln \\left(\\left| y_i - \\hat{y}_i \\right|\\right)$\n",
"\n",
"and \n",
"\n",
"$\\Gamma_i^{\\epsilon} = \\hat{y}_i \\pm e^{\\alpha_{\\epsilon}}$\n",
"\n",
"(This example is provided only for illustration, as it is practically identical to simply taking the absolute error as nonconformity.)"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.svm import SVR\n",
"from nonconformist.nc import NcFactory, RegressionErrFunc\n",
"\n",
"class MyRegErrFunc(RegressionErrFunc):\n",
" def __init__(self):\n",
" super(MyRegErrFunc, self).__init__()\n",
" \n",
" def apply(self, prediction, y):\n",
" '''\n",
" y is a vector of (true) labels\n",
" prediction is a vector of (predicted) labels\n",
" '''\n",
" return np.log(np.abs(prediction - y))\n",
"\n",
" def apply_inverse(self, nc, significance):\n",
" '''\n",
" apply_inverse() is the (partial) inverse of apply()\n",
" apply_inverse(...)[0] is subtracted from the prediction of the\n",
" underlying model to create the lower boundary of the\n",
" prediction interval\n",
" apply_inverse(...)[1] is added to the prediction of the\n",
" underlying model to create the upper boundary of the\n",
" prediction interval\n",
" '''\n",
" nc = np.sort(nc)[::-1]\n",
" border = int(np.floor(significance * (nc.size + 1))) - 1\n",
" border = min(max(border, 0), nc.size - 1)\n",
" return np.vstack([np.exp(nc[border]), np.exp(nc[border])])\n",
"\n",
" \n",
"model = SVR()\n",
"nc = NcFactory.create_nc(model, MyRegErrFunc())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2e: Using other types of underlying models\n",
"It is possible to use Nonconformist together with machine learning implementations other than those available in scikit-learn, however, it is then necessary to construct a new adapter class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Classification"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [],
"source": [
"from nonconformist.base import ClassifierAdapter\n",
"from nonconformist.nc import ClassifierNc\n",
"\n",
"class MyClassifierAdapter(ClassifierAdapter):\n",
" def __init__(self, model, fit_params=None):\n",
" super(MyClassifierAdapter, self).__init__(model, fit_params)\n",
" \n",
" def fit(self, x, y):\n",
" '''\n",
" x is a numpy.array of shape (n_train, n_features)\n",
" y is a numpy.array of shape (n_train)\n",
" \n",
" Here, do what is necessary to train the underlying model\n",
" using the supplied training data\n",
" '''\n",
" pass\n",
" \n",
" def predict(self, x):\n",
" '''\n",
" Obtain predictions from the underlying model\n",
" \n",
" Make sure this function returns an output that is compatible with\n",
" the nonconformity function used. For default nonconformity functions,\n",
" output from this function should be class probability estimates in\n",
" a numpy.array of shape (n_test, n_classes)\n",
" '''\n",
" pass\n",
" \n",
"my_classifier = None # Initialize an object of your classifier's type\n",
"model = MyClassifierAdapter(my_classifier)\n",
"nc = ClassifierNc(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Regression"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [],
"source": [
"from nonconformist.base import RegressorAdapter\n",
"from nonconformist.nc import RegressorNc\n",
"\n",
"class MyRegressorAdapter(RegressorAdapter):\n",
" def __init__(self, model, fit_params=None):\n",
" super(MyRegressorAdapter, self).__init__(model, fit_params)\n",
" \n",
" def fit(self, x, y):\n",
" '''\n",
" x is a numpy.array of shape (n_train, n_features)\n",
" y is a numpy.array of shape (n_train)\n",
" \n",
" Here, do what is necessary to train the underlying model\n",
" using the supplied training data\n",
" '''\n",
" pass\n",
" \n",
" def predict(self, x):\n",
" '''\n",
" Obtain predictions from the underlying model\n",
" \n",
" Make sure this function returns an output that is compatible with\n",
" the nonconformity function used. For default nonconformity functions,\n",
" output from this function should be predicted real values in a\n",
" numpy.array of shape (n_test)\n",
" '''\n",
" pass\n",
"\n",
"my_regressor = None # Initialize an object of your regressor's type\n",
"model = MyRegressorAdapter(my_regressor)\n",
"nc = RegressorNc(model)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
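
The inductive workflow used in the notebook above (score a calibration set, then compare each test candidate against those scores) can be sketched in plain Python without the library. This is a minimal illustration under stated assumptions, not nonconformist's actual implementation: the helper names `inverse_probability_scores`, `p_value`, `prediction_set` and `abs_error_half_width` are hypothetical, the data are made up, and the library may handle ties or smoothing differently. The classification part uses the inverse-probability nonconformity $\alpha_i = 1 - \hat{P}(y_i \mid x_i)$; the regression part reuses the border rule shown in `apply_inverse()` in Example 2d.

```python
import math


def inverse_probability_scores(probs, labels):
    """alpha_i = 1 - P_hat(y_i | x_i) for each calibration example."""
    return [1.0 - row[y] for row, y in zip(probs, labels)]


def p_value(cal_scores, test_score):
    """Share of scores (calibration plus the test object itself) that are
    at least as nonconforming as the test score."""
    n_at_least = sum(1 for a in cal_scores if a >= test_score)
    return (n_at_least + 1) / (len(cal_scores) + 1)


def prediction_set(cal_scores, test_probs, significance):
    """Boolean vector over classes: keep a label when its p-value exceeds
    the significance level."""
    return [p_value(cal_scores, 1.0 - p) > significance for p in test_probs]


def abs_error_half_width(cal_errors, significance):
    """Interval half-width for absolute-error nonconformity, using the same
    border rule as apply_inverse() in Example 2d."""
    ordered = sorted(cal_errors, reverse=True)
    border = int(math.floor(significance * (len(ordered) + 1))) - 1
    border = min(max(border, 0), len(ordered) - 1)
    return ordered[border]


# Made-up calibration data: class probability estimates and true labels.
cal_probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
cal_labels = [0, 1, 2, 1]
cal_scores = inverse_probability_scores(cal_probs, cal_labels)

# Classification: prediction region for one test object.
region = prediction_set(cal_scores, [0.8, 0.15, 0.05], significance=0.25)
print(region)  # [True, False, False]

# Regression: made-up absolute calibration errors, point prediction 10.0.
half_width = abs_error_half_width([0.5, 1.0, 2.0, 4.0, 1.5, 3.0, 2.5], 0.25)
print([10.0 - half_width, 10.0 + half_width])  # [7.0, 13.0]
```

With only four calibration examples the smallest attainable p-value is 1/5, so any significance level below 0.2 would keep every class in the region. This is why the examples above reserve a sizeable calibration set.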