## bbai
[PyPI](https://badge.fury.io/py/bbai) · [License: CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) · [Reference](https://buildingblock.ai/bbai.glm)

Deterministic, exact algorithms for objective Bayesian inference and hyperparameter optimization.
## Installation
**bbai** supports Linux and macOS on x86-64.
```
pip install bbai
```

## Usage
### Fully Bayesian Single-variable Logistic Regression with Reference Prior
https://www.objectivebayesian.com/p/election-2024

Build a fully Bayesian logistic regression model with a single unknown weight using Jeffreys
prior (equivalently, the reference prior; the two coincide for a single-parameter model).

```python
from bbai.glm import BayesianLogisticRegression1

x = [-5, 2, 8, 1]
y = [0, 1, 0, 1]

# Fit a posterior distribution for w with the logistic
# regression reference prior
model = BayesianLogisticRegression1()
model.fit(x, y)

# Print the posterior probability that w < 0.123
print(model.cdf(0.123))
```
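For intuition, in this single-parameter case the prior has a simple closed form: with p_i = sigmoid(w * x_i), the Fisher information is I(w) = sum_i x_i^2 p_i (1 - p_i), and Jeffreys prior is proportional to sqrt(I(w)). A minimal numpy sketch of the unnormalized density (illustrative only, not part of the bbai API):

```python
import numpy as np

def jeffreys_prior_unnormalized(w, x):
    # p_i = sigmoid(w * x_i)
    x = np.asarray(x, dtype=float)
    p = 1.0 / (1.0 + np.exp(-w * x))
    # pi(w) is proportional to sqrt(I(w)), I(w) = sum_i x_i^2 p_i (1 - p_i)
    return np.sqrt(np.sum(x**2 * p * (1.0 - p)))

print(jeffreys_prior_unnormalized(0.0, [-5, 2, 8, 1]))  # prints ~4.848
```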
### Hypothesis testing using Expected Encompassing Intrinsic Bayes Factors (EEIBF)

https://www.objectivebayesian.com/p/hypothesis-testing

The EEIBF method is described in the paper *Default Bayes Factors for Nonnested Hypothesis Testing*
by James Berger and Julia Mortera ([postscript](http://www2.stat.duke.edu/~berger/papers/mortera.ps)).

The Python code below shows how to test these three hypotheses for the mean of normally distributed
data with unknown variance.
```
H_equal: mean = 0
H_left: mean < 0
H_right: mean > 0
```
```python
from bbai.stat import NormalMeanHypothesis
import numpy as np

np.random.seed(0)
data = np.random.normal(0.123, 1.5, size=9)
probs = NormalMeanHypothesis().test(data)
print(probs.equal) # posterior probability for H_equal 0.235
print(probs.left) # posterior probability for H_left 0.0512
print(probs.right) # posterior probability for H_right 0.713
```
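The three probabilities form a posterior distribution over the hypotheses and sum to one, so a simple default decision rule is to report the most probable hypothesis. A short follow-up reusing `probs` from above:

```python
# pick the a posteriori most probable hypothesis
posterior = {'H_equal': probs.equal, 'H_left': probs.left, 'H_right': probs.right}
print(max(posterior, key=posterior.get))  # prints H_right
```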
See [example/19-hypothesis-first-t.ipynb](example/19-hypothesis-first-t.ipynb) for an example and
[example/18-hypothesis-eeibf-validation.ipynb](example/18-hypothesis-eeibf-validation.ipynb) for a
step-by-step validation of the method against the paper.

### Objective Bayesian inference for comparing binomial proportions
https://www.objectivebayesian.com/p/binomial-comparison

Fit a posterior distribution with a reference prior to compare binomial proportions:
```python
from bbai.model import DeltaBinomialModel

# Some example data
a1, b1, a2, b2 = 5, 3, 2, 7

# Fit a posterior distribution with likelihood function
#   L(theta, x) = (theta + x)^a1 * (1 - theta - x)^b1 * x^a2 * (1 - x)^b2
# where theta represents the difference of the two binomial
# distribution probabilities
model = DeltaBinomialModel(prior='reference')
model.fit(a1, b1, a2, b2)

# Print the probability that theta < 0.123
print(model.cdf(0.123))
# Prints 0.10907436812863071
```
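For comparison, the naive plug-in estimate of the difference of proportions ignores posterior uncertainty entirely (an illustrative aside, reusing the counts above):

```python
# maximum likelihood point estimates of the two proportions
p1_hat = a1 / (a1 + b1)  # 5/8 = 0.625
p2_hat = a2 / (a2 + b2)  # 2/9 ~ 0.222
print(p1_hat - p2_hat)   # prints 0.4027777...
```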
### Efficient approximation of multivariable functions using adaptive sparse grids at Chebyshev nodes

```python
from bbai.numeric import SparseGridInterpolator
import numpy as np

# A test function
def f(x, y, z):
    t1 = 0.68 * np.abs(x - 0.3)
    t2 = 1.25 * np.abs(y - 0.15)
    t3 = 1.86 * np.abs(z - 0.09)
    return np.exp(-t1 - t2 - t3)

# Fit a sparse grid to approximate f
ranges = [(-2, 5), (1, 3), (-2, 2)]
interp = SparseGridInterpolator(tolerance=1.0e-4, ranges=ranges)
interp.fit(f)
print('num_pts =', interp.points.shape[1])
# prints 10851

# Test the accuracy at a random point of the domain
print(interp.evaluate(1.84, 2.43, 0.41), f(1.84, 2.43, 0.41))
# prints 0.011190847391188667 0.011193746554063376

# Integrate the approximation over the range
print(interp.integral)
# prints 0.6847335267327939
```
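The one-dimensional building block of such grids is the set of Chebyshev-Gauss-Lobatto points, cos(pi * j / n) for j = 0..n. A quick illustration of the nodes themselves (not part of the bbai API):

```python
import numpy as np

# Chebyshev-Gauss-Lobatto points on [-1, 1]: cos(pi * j / n), j = 0..n.
# Doubling n reuses every existing node, so refinement is nested.
n = 8
nodes = np.cos(np.pi * np.arange(n + 1) / n)
print(np.round(nodes, 4))
```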
### Objective Bayesian Inference for Gaussian Process Models
Construct prediction distributions for Gaussian process models using full integration over the
parameter space with a noninformative, reference prior.
```python
import numpy as np
from bbai.gp import BayesianGaussianProcessRegression, RbfCovarianceFunction

# Make an example data set
def make_location_matrix(N):
    res = np.zeros((N, 1))
    step = 1.0 / (N - 1)
    for i in range(N):
        res[i, 0] = i * step
    return res

def make_covariance_matrix(S, sigma2, theta, eta):
    N = len(S)
    res = np.zeros((N, N))
    for i in range(N):
        si = S[i]
        for j in range(N):
            sj = S[j]
            d = np.linalg.norm(si - sj)
            res[i, j] = np.exp(-0.5 * (d / theta)**2)
        res[i, i] += eta
    return sigma2 * res

def make_target_vector(K):
    return np.random.multivariate_normal(np.zeros(K.shape[0]), K)
np.random.seed(0)
N = 20
sigma2 = 25
theta = 0.01
eta = 0.1
params = (sigma2, theta, eta)
S = make_location_matrix(N)
K = make_covariance_matrix(S, sigma2, theta, eta)
y = make_target_vector(K)

# Fit a Gaussian process model to the data
model = BayesianGaussianProcessRegression(kernel=RbfCovarianceFunction())
model.fit(S, y)

# Construct the prediction distribution for x=0.1
preds, pred_pdfs = model.predict([[0.1]], with_pdf=True)
high, low = pred_pdfs.ppf(0.75), pred_pdfs.ppf(0.25)

# Print the mean and the 25%-75% credible set of the prediction distribution
print(preds[0], '(%f to %f)' % (low, high))
```
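Since the sampled locations are evenly spaced on [0, 1], the observation nearest the prediction point x = 0.1 is y[2] (at s = 2/19); comparing it with the posterior mean gives a quick sanity check (illustrative only, reusing the objects above):

```python
print(preds[0], y[2])
```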
### Ridge Regression

Fit a ridge regression model with the regularization parameter *exactly* set so as to minimize mean squared error on a leave-one-out cross-validation of the training data set.
```python
# load example data set
# (note: load_boston was removed in scikit-learn 1.2; newer versions
#  will need a substitute regression data set)
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
X, y = load_boston(return_X_y=True)
X = StandardScaler().fit_transform(X)

# fit model
from bbai.glm import RidgeRegression
model = RidgeRegression()
model.fit(X, y)
```
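The objective being minimized has a closed form: for a fixed regularization strength lam, ridge regression's leave-one-out residuals satisfy e_i = (y_i - yhat_i) / (1 - h_ii), where h_ii is the i-th diagonal entry of the hat matrix H = X (X^T X + lam I)^{-1} X^T. A minimal numpy sketch of that objective (illustrative; not bbai's implementation):

```python
import numpy as np

def loocv_mse(X, y, lam):
    # hat matrix H = X (X^T X + lam I)^{-1} X^T
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    # exact leave-one-out residuals via the hat-matrix identity
    loo_residuals = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(loo_residuals**2)
```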
### Logistic Regression

Fit a logistic regression model with the regularization parameter *exactly* set so as to maximize likelihood on an approximate leave-one-out cross-validation of the training data set.
```python
# load example data set
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# fit model
from bbai.glm import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
```
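The glm models mirror scikit-learn's fit interface; assuming they also expose the usual predict method (an assumption to verify against the reference docs), a quick in-sample check looks like:

```python
# in-sample accuracy (sketch; assumes a scikit-learn-style predict method)
print((model.predict(X) == y).mean())
```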
### Bayesian Ridge Regression

Fit a Bayesian ridge regression model where the hyperparameter controlling the regularization strength is integrated over.
```python
# load example data set
# (note: load_boston was removed in scikit-learn 1.2; newer versions
#  will need a substitute regression data set)
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
X, y = load_boston(return_X_y=True)
X = StandardScaler().fit_transform(X)

# fit model
from bbai.glm import BayesianRidgeRegression
model = BayesianRidgeRegression()
model.fit(X, y)
```
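Conceptually, rather than selecting a single regularization value, the posterior predictive averages over it. A toy model-averaging sketch conveying the idea (purely illustrative, with caller-supplied weights; not bbai's deterministic algorithm):

```python
import numpy as np

def ridge_predict(X, y, lam):
    # ridge point predictions for one regularization value
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return X @ beta

def averaged_predictions(X, y, lams, weights):
    # weighted average of per-lambda predictions; in the Bayesian model the
    # weights would come from the hyperparameter's posterior
    preds = np.array([ridge_predict(X, y, lam) for lam in lams])
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ preds
```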
### Logistic Regression MAP with Jeffreys Prior

Fit a logistic regression MAP model with Jeffreys prior.
```python
# load example data set
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# fit model
from bbai.glm import LogisticRegressionMAP
model = LogisticRegressionMAP()
model.fit(X, y)
```

## How it works
* [Deterministic Objective Bayesian Inference for Spatial Models](https://buildingblock.ai/bayesian-gaussian-process.pdf)
* [Optimizing Approximate Leave-one-out Cross-validation to Tune Hyperparameters](https://arxiv.org/abs/2011.10218)
* [An Algorithm for Bayesian Ridge Regression with Full Hyperparameter Integration](https://buildingblock.ai/bayesian-ridge-regression)
* [How to Fit Logistic Regression with a Noninformative Prior](https://buildingblock.ai/logistic-regression-jeffreys)

## Examples
* [01-digits](https://buildingblock.ai/multinomial-logistic-regression-example): Fit a multinomial logistic regression model to predict digits.
* [02-iris](example/02-iris.py): Fit a multinomial logistic regression model to the Iris data set.
* [03-bayesian](example/03-bayesian.py): Fit a Bayesian ridge regression model with hyperparameter integration.
* [04-curve-fitting](example/04-curve-fitting.ipynb): Use Bayesian ridge regression with hyperparameter integration for curve fitting.
* [05-jeffreys1](example/05-jeffreys1.ipynb): Fit a logistic regression MAP model with Jeffreys prior and a single regressor.
* [06-jeffreys2](example/06-jeffreys2.ipynb): Fit a logistic regression MAP model with Jeffreys prior and two regressors.
* [07-jeffreys-breast-cancer](example/07-jeffreys-breast-cancer.py): Fit a logistic regression MAP model with Jeffreys prior to the breast cancer data set.
* [08-soil-cn](example/08-soil-cn.ipynb): Fit a Bayesian Gaussian process with a non-informative prior to a data set of soil carbon-to-nitrogen samples.
* [11-meuse-zinc](example/11-meuse-zinc.ipynb): Fit a Bayesian Gaussian process with a non-informative prior to a data set of zinc concentrations taken in a flood plain of the Meuse river.
* [13-sparse-grid](example/13-sparse-grid.ipynb): Build adaptive sparse grids for interpolation and integration.

## Documentation
* [buildingblock.ai](https://buildingblock.ai/)
* [Getting Started](https://buildingblock.ai/get-started)
* [Logistic Regression Guide](https://buildingblock.ai/logistic-regression-guide)
* [Reference](https://buildingblock.ai/bbai.glm)