{"id":24702245,"url":"https://github.com/piEsposito/blitz-bayesian-deep-learning","last_synced_at":"2025-10-09T09:30:24.647Z","repository":{"id":40996519,"uuid":"243806106","full_name":"piEsposito/blitz-bayesian-deep-learning","owner":"piEsposito","description":"A simple and extensible library to create Bayesian Neural Network layers on PyTorch.","archived":false,"fork":false,"pushed_at":"2023-09-25T13:52:21.000Z","size":332,"stargazers_count":941,"open_issues_count":25,"forks_count":106,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-01-24T21:06:25.475Z","etag":null,"topics":["bayesian-deep-learning","bayesian-layers","bayesian-neural-networks","pytorch","pytorch-implementation","pytorch-tutorial"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/piEsposito.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":"https://www.buymeacoffee.com/piEsposito"}},"created_at":"2020-02-28T16:26:38.000Z","updated_at":"2025-01-24T15:27:24.000Z","dependencies_parsed_at":"2024-06-18T18:37:12.414Z","dependency_job_id":null,"html_url":"https://github.com/piEsposito/blitz-bayesian-deep-learning","commit_stats":{"total_commits":185,"total_committers":17,"mean_commits":"10.882352941176471","dds":"0.42162162162162165","last_synced_commit":"5af11742484852c8bf69ad6fef27c230a2a0ecc2"},"previous_names":[
],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piEsposito%2Fblitz-bayesian-deep-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piEsposito%2Fblitz-bayesian-deep-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piEsposito%2Fblitz-bayesian-deep-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piEsposito%2Fblitz-bayesian-deep-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/piEsposito","download_url":"https://codeload.github.com/piEsposito/blitz-bayesian-deep-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235807662,"owners_count":19047985,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-deep-learning","bayesian-layers","bayesian-neural-networks","pytorch","pytorch-implementation","pytorch-tutorial"],"created_at":"2025-01-27T05:39:36.777Z","updated_at":"2025-10-09T09:30:19.298Z","avatar_url":"https://github.com/piEsposito.png","language":"Python","readme":"# Blitz - Bayesian Layers in Torch Zoo\n\n[![Downloads](https://pepy.tech/badge/blitz-bayesian-pytorch)](https://pepy.tech/project/blitz-bayesian-pytorch)\n\nBLiTZ is a simple and extensible library to create Bayesian Neural Network Layers (based on whats proposed in [Weight Uncertainty in Neural Networks paper](https://arxiv.org/abs/1505.05424)) on PyTorch. 
By using BLiTZ layers and utils, you can add uncertainty to your model and gather its complexity cost in a simple way that does not affect the interaction between your layers, as if you were using standard PyTorch.\n\nBy using our core weight sampler classes, you can extend and improve this library to add uncertainty to a wider range of layers, in a way that stays well integrated with PyTorch. Pull requests are welcome.\n\n \n# Index\n * [Install](#Install)\n * [Documentation](#Documentation)\n * [A simple example for regression](#A-simple-example-for-regression)\n   * [Importing the necessary modules](#Importing-the-necessary-modules)\n   * [Loading and scaling data](#Loading-and-scaling-data)\n   * [Creating our variational regressor class](#Creating-our-variational-regressor-class)\n   * [Defining a confidence interval evaluating function](#Defining-a-confidence-interval-evaluating-function)\n   * [Creating our regressor and loading data](#Creating-our-regressor-and-loading-data)\n   * [Our main training and evaluating loop](#Our-main-training-and-evaluating-loop)\n * [Bayesian Deep Learning in a Nutshell](#Bayesian-Deep-Learning-in-a-Nutshell)\n   * [First of all, a deterministic NN layer linear-transformation](#First-of-all,-a-deterministic-NN-layer-linear-transformation)\n   * [The purpose of Bayesian Layers](#The-purpose-of-Bayesian-Layers)\n   * [Weight sampling on Bayesian Layers](#Weight-sampling-on-Bayesian-Layers)\n   * [It is possible to optimize our trainable weights](#It-is-possible-to-optimize-our-trainable-weights)\n   * [It is also true that there is complexity cost function differentiable along its variables](#It-is-also-true-that-there-is-complexity-cost-function-differentiable-along-its-variables)\n   * [To get the whole cost function at the nth sample](#To-get-the-whole-cost-function-at-the-nth-sample)\n   * [Some notes and wrap up](#Some-notes-and-wrap-up)\n * [Citing](#Citing)\n * [References](#References)\n   \n   \n## Install\n\nTo 
install BLiTZ you can use pip:\n\n```\npip install blitz-bayesian-pytorch\n```\nOr, via conda:\n\n```\nconda install -c conda-forge blitz-bayesian-pytorch\n```\n\nYou can also git-clone it and pip-install it locally:\n\n```\nconda create -n blitz python=3.9\nconda activate blitz\ngit clone https://github.com/piEsposito/blitz-bayesian-deep-learning.git\ncd blitz-bayesian-deep-learning\npip install .\n```\n\n## Documentation\n\nDocumentation for our layers, weight (and prior distribution) samplers, and utils:\n * [Bayesian Layers](doc/layers.md)\n * [Weight and prior distribution samplers](doc/samplers.md)\n * [Utils (for easy integration with PyTorch)](doc/utils.md)\n * [Losses](doc/losses.md)\n\n## A simple example for regression\n\n(You can see it for yourself by running [this example](blitz/examples/bayesian_regression_boston.py) on your machine).\n\nWe will now see how Bayesian Deep Learning can be used for regression in order to gather a confidence interval for each datapoint rather than a single point prediction. Gathering a confidence interval for your prediction may be even more useful than a low-error point estimate. \n\nThe argument is that, in some contexts, a high-probability confidence interval lets you make a more reliable decision than a merely close point estimate: if you are trying to profit from a trading operation, for example, a good confidence interval may tell you whether the value at which the operation will proceed will at least be lower (or higher) than some threshold X.\n\nKnowing that a value will surely (or with high probability) fall within a given interval can support sensitive decisions better than a close estimate that, if it turns out lower or higher than some limit value, may cause a loss on a transaction. 
The point is that, sometimes, knowing whether there will be profit may be more useful than measuring it.\n\nTo demonstrate that, we will create a Bayesian Neural Network regressor for the Boston housing toy dataset, building confidence intervals (CIs) for the house prices we are trying to predict. We will perform some scaling, and the CIs will be set at about 75%. It is interesting to see that for about 90% of the predictions, the CI's upper limit lies above the true value or (inclusively) its lower limit lies below it.\n\n## Importing the necessary modules\nBesides the well-known modules, we will bring from BLiTZ the `variational_estimator` decorator, which helps us handle the `BayesianLinear` layers on the module while keeping it fully integrated with the rest of Torch, and, of course, `BayesianLinear`, which is our layer that features weight uncertainty.\n\n```python\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport numpy as np\n\nfrom blitz.modules import BayesianLinear\nfrom blitz.utils import variational_estimator\n\nfrom sklearn.datasets import load_boston\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.model_selection import train_test_split\n```\n\n## Loading and scaling data\n\nNothing new under the sun here: we are importing and standard-scaling the data to help with the training.\n\n```python\nX, y = load_boston(return_X_y=True)\nX = StandardScaler().fit_transform(X)\ny = StandardScaler().fit_transform(np.expand_dims(y, -1))\n\nX_train, X_test, y_train, y_test = train_test_split(X,\n                                                    y,\n                                                    test_size=.25,\n                                                    random_state=42)\n\n\nX_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()\nX_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()\n```\n\n## Creating our variational regressor class\n\nWe can 
create our class by inheriting from nn.Module, as we would do with any Torch network. Our decorator introduces the methods to handle the Bayesian features, such as calculating the complexity cost of the Bayesian layers and doing many feedforward passes (sampling different weights on each one) in order to sample our loss.\n\n```python\n@variational_estimator\nclass BayesianRegressor(nn.Module):\n    def __init__(self, input_dim, output_dim):\n        super().__init__()\n        #self.linear = nn.Linear(input_dim, output_dim)\n        self.blinear1 = BayesianLinear(input_dim, 512)\n        self.blinear2 = BayesianLinear(512, output_dim)\n        \n    def forward(self, x):\n        x_ = self.blinear1(x)\n        x_ = F.relu(x_)\n        return self.blinear2(x_)\n```\n\n## Defining a confidence interval evaluating function\n\nThis function creates a confidence interval for each prediction in the batch for which we are trying to sample the label value. We can then measure the accuracy of our predictions by checking how many of the prediction distributions actually include the correct label for the datapoint.\n\n\n```python\ndef evaluate_regression(regressor,\n                        X,\n                        y,\n                        samples = 100,\n                        std_multiplier = 2):\n    preds = [regressor(X) for i in range(samples)]\n    preds = torch.stack(preds)\n    means = preds.mean(axis=0)\n    stds = preds.std(axis=0)\n    ci_upper = means + (std_multiplier * stds)\n    ci_lower = means - (std_multiplier * stds)\n    ic_acc = (ci_lower \u003c= y) * (ci_upper \u003e= y)\n    ic_acc = ic_acc.float().mean()\n    return ic_acc, (ci_upper \u003e= y).float().mean(), (ci_lower \u003c= y).float().mean()\n```\n\n## Creating our regressor and loading data\n\nNotice here that we create our `BayesianRegressor` just as we would do with other neural networks.\n\n```python\nregressor = BayesianRegressor(13, 1)\noptimizer = optim.Adam(regressor.parameters(), 
lr=0.01)\ncriterion = torch.nn.MSELoss()\n\nds_train = torch.utils.data.TensorDataset(X_train, y_train)\ndataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)\n\nds_test = torch.utils.data.TensorDataset(X_test, y_test)\ndataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)\n```\n\n## Our main training and evaluating loop\n\nWe use a training loop that differs from a common Torch training loop only in that its loss is sampled by the `sample_elbo` method. Everything else can be done as usual, since our purpose with BLiTZ is to make it easy to iterate on your data with different Bayesian NNs without trouble.\n\nHere is our very simple training loop:\n\n```python\niteration = 0\nfor epoch in range(100):\n    for i, (datapoints, labels) in enumerate(dataloader_train):\n        optimizer.zero_grad()\n        \n        loss = regressor.sample_elbo(inputs=datapoints,\n                           labels=labels,\n                           criterion=criterion,\n                           sample_nbr=3)\n        loss.backward()\n        optimizer.step()\n        \n        iteration += 1\n        if iteration%100==0:\n            ic_acc, under_ci_upper, over_ci_lower = evaluate_regression(regressor,\n                                                                        X_test,\n                                                                        y_test,\n                                                                        samples=25,\n                                                                        std_multiplier=3)\n            \n            print(\"CI acc: {:.2f}, CI upper acc: {:.2f}, CI lower acc: {:.2f}\".format(ic_acc, under_ci_upper, over_ci_lower))\n            print(\"Loss: {:.4f}\".format(loss))\n```\n\n## Bayesian Deep Learning in a Nutshell\nA very quick explanation of how uncertainty is introduced in Bayesian Neural Networks and how we model its loss in order to objectively improve the 
confidence over its predictions and reduce the variance without dropout. \n\n## First of all, a deterministic NN layer linear transformation\n\nAs we know, in deterministic (non-Bayesian) neural network layers, the trainable parameters correspond directly to the weights used in the linear transformation of the previous layer's output (or of the input, if that is the case). This corresponds to the following equation:\n\n\n![equation](https://latex.codecogs.com/gif.latex?a^{(i\u0026plus;1)}\u0026space;=\u0026space;W^{(i\u0026plus;1)}\\cdot\u0026space;z^{(i)}\u0026space;\u0026plus;\u0026space;b^{(i\u0026plus;1)}) \n\n*(z corresponds to the activated output of layer i)*\n\n## The purpose of Bayesian Layers\n\nBayesian layers seek to introduce uncertainty in their weights by sampling them from a distribution parametrized by trainable variables on each feedforward operation. \n\nThis allows us not just to optimize the performance metrics of the model, but also to gather the uncertainty of the network's predictions over a specific datapoint (by sampling it many times and measuring the dispersion) and to reduce the variance of those predictions as much as possible, making it possible to know how much uncertainty we still have over the label if we try to model it as a function of our specific datapoint.\n\n## Weight sampling on Bayesian Layers\nTo do so, on each feedforward operation we sample the parameters of the linear transformation with the following equations (where **ρ** parametrizes the standard deviation and **μ** parametrizes the mean of the sampled linear-transformation parameters):\n\nFor the weights:\n\n![equation](https://latex.codecogs.com/gif.latex?W^{(i)}_{(n)}\u0026space;=\u0026space;\\mathcal{N}(0,1)\u0026space;*\u0026space;log(1\u0026space;\u0026plus;\u0026space;e^{\\rho^{(i)}}\u0026space;)\u0026space;\u0026plus;\u0026space;\\mu^{(i)})\n\n*Where the sampled W corresponds to the weights used in the linear transformation for the ith layer on the nth 
sample.*\n\nFor the biases:\n\n![equation](https://latex.codecogs.com/gif.latex?b^{(i)}_{(n)}\u0026space;=\u0026space;\\mathcal{N}(0,1)\u0026space;*\u0026space;log(1\u0026space;\u0026plus;\u0026space;e^{\\rho^{(i)}}\u0026space;)\u0026space;\u0026plus;\u0026space;\\mu^{(i)})\n\n*Where the sampled b corresponds to the biases used in the linear transformation for the ith layer on the nth sample.*\n\n## It is possible to optimize our trainable weights\n\nEven though we have a random multiplier for our weights and biases, it is possible to optimize them: given some function of the sampled weights and the trainable parameters that is differentiable (in our case, the loss), we sum the derivatives of the function with respect to both of them:\n\n1. Let ![equation](https://latex.codecogs.com/gif.latex?\\epsilon\u0026space;=\u0026space;\\mathcal{N}(0,1))\n2. Let ![equation](https://latex.codecogs.com/gif.latex?\\theta\u0026space;=\u0026space;(\\rho,\u0026space;\\mu))\n3. Let ![equation](https://latex.codecogs.com/gif.latex?w\u0026space;=\u0026space;\\mu\u0026space;\u0026plus;\u0026space;\\log({1\u0026space;\u0026plus;\u0026space;e^{\\rho}})\u0026space;*\u0026space;\\epsilon)\n4. Let ![equation](https://latex.codecogs.com/gif.latex?f(w,\u0026space;\\theta)) be differentiable relative to its variables\n\nTherefore:\n\n5. ![equation](https://latex.codecogs.com/gif.latex?\\Delta_{\\mu}\u0026space;=\u0026space;\\frac{\\delta\u0026space;f(w,\u0026space;\\theta)}{\\delta\u0026space;w}\u0026space;\u0026plus;\u0026space;\\frac{\\delta\u0026space;f(w,\u0026space;\\theta)}{\\delta\u0026space;\\mu})\n\nand\n\n\n6. 
![equation](https://latex.codecogs.com/gif.latex?\\Delta_{\\rho}\u0026space;=\u0026space;\\frac{\\delta\u0026space;f(w,\u0026space;\\theta)}{\\delta\u0026space;w}\u0026space;\\frac{\\epsilon}{1\u0026space;\u0026plus;\u0026space;e^{-\\rho}\u0026space;}\u0026space;\u0026plus;\u0026space;\\frac{\\delta\u0026space;f(w,\u0026space;\\theta)}{\\delta\u0026space;\\rho})\n\n## It is also true that there is complexity cost function differentiable along its variables\n\nIt is known that the cross-entropy loss (and MSE) are differentiable. Therefore, if we prove that there is a complexity-cost function that is differentiable, we can let our framework take the derivatives and compute the gradients in the optimization step.\n\n**The complexity cost is calculated by each Bayesian layer during the feedforward operation (using the layer's predefined, simpler a priori distribution and its empirical distribution). The complexity costs of all the layers are then summed and added to the loss.**\n\nAs proposed in the [Weight Uncertainty in Neural Networks paper](https://arxiv.org/abs/1505.05424), we can gather the complexity cost of a distribution by taking the [Kullback-Leibler Divergence](https://jhui.github.io/2017/01/05/Deep-learning-Information-theory/) from it to a much simpler distribution, and by making some approximations, we can differentiate this function relative to its variables (the distributions):\n\n1. Let ![equation](https://latex.codecogs.com/gif.latex?{P}(w)) be a low-entropy distribution pdf set by hand, which will be assumed to be the \"a priori\" distribution for the weights\n\n2. Let ![equation](https://latex.codecogs.com/gif.latex?{Q}(w\u0026space;|\u0026space;\\theta)) be the a posteriori empirical distribution pdf of our sampled weights, given its parameters.\n\n\n\n\nTherefore, for each scalar on the W sampled matrix:\n\n\n\n\n3. 
![equation](https://latex.codecogs.com/gif.latex?{D}_{KL}(\u0026space;{Q}(w\u0026space;|\u0026space;\\theta)\u0026space;\\lVert\u0026space;{P}(w)\u0026space;)\u0026space;=\u0026space;\\lim_{n\\to\\infty}1/n\\sum_{i=0}^{n}\u0026space;{Q}(w^{(i)}\u0026space;|\u0026space;\\theta)*\u0026space;(\\log{{Q}(w^{(i)}\u0026space;|\u0026space;\\theta)}\u0026space;-\u0026space;\\log{{P}(w^{(i)})}\u0026space;))\n\n\nBy assuming a very large n, we can approximate:\n\n4. ![equation](https://latex.codecogs.com/gif.latex?{D}_{KL}(\u0026space;{Q}(w\u0026space;|\u0026space;\\theta)\u0026space;\\lVert\u0026space;{P}(w)\u0026space;)\u0026space;=\u0026space;1/n\\sum_{i=0}^{n}\u0026space;{Q}(w^{(i)}\u0026space;|\u0026space;\\theta)*\u0026space;(\\log{{Q}(w^{(i)}\u0026space;|\u0026space;\\theta)}\u0026space;-\u0026space;\\log{{P}(w^{(i)})}\u0026space;))\n\n\nand therefore:\n\n\n5. ![equation](https://latex.codecogs.com/gif.latex?{D}_{KL}(\u0026space;{Q}(w\u0026space;|\u0026space;\\theta)\u0026space;\\lVert\u0026space;{P}(w)\u0026space;)\u0026space;=\u0026space;\\mu_Q\u0026space;*\\sum_{i=0}^{n}\u0026space;(\\log{{Q}(w^{(i)}\u0026space;|\u0026space;\\theta)}\u0026space;-\u0026space;\\log{{P}(w^{(i)})}\u0026space;))\n\n\nAs the expected value (mean) of the Q distribution ends up just scaling the values, we can take it out of the equation (as there will be no framework tracing of it). We then take the complexity cost of the nth sample to be:\n\n6. ![equation](https://latex.codecogs.com/gif.latex?{C^{(n)}\u0026space;(w^{(n)},\u0026space;\\theta)\u0026space;}\u0026space;=\u0026space;(\\log{{Q}(w^{(n)}\u0026space;|\u0026space;\\theta)}\u0026space;-\u0026space;\\log{{P}(w^{(n)})}\u0026space;))\n\nWhich is differentiable relative to all of its parameters. \n\n## To get the whole cost function at the nth sample:\n\n1. 
Let a performance (fit-to-data) function be: ![equation](https://latex.codecogs.com/gif.latex?{P^{(n)}\u0026space;(w^{(n)},\u0026space;\\theta)})\n\n\nTherefore the whole cost function on the nth sample of weights will be:\n\n2. ![equation](https://latex.codecogs.com/gif.latex?{L^{(n)}\u0026space;(w^{(n)},\u0026space;\\theta)\u0026space;}\u0026space;=\u0026space;{C^{(n)}\u0026space;(w^{(n)},\u0026space;\\theta)\u0026space;}\u0026space;\u0026plus;\u0026space;{P^{(n)}\u0026space;(w^{(n)},\u0026space;\\theta)\u0026space;})\n\nWe can estimate the true full cost function by Monte Carlo sampling it (feedforwarding the network X times and taking the mean over the full loss) and then backpropagating using our estimated value. It works even for a low number of samples per backprop, and even for a single sample.\n\n## Some notes and wrap up\nWe have come to the end of this Bayesian Deep Learning in a Nutshell tutorial. By knowing what is being done here, you can implement your BNN model as you wish. \n\nMaybe you can optimize by doing one optimization step per sample, or by using this Monte-Carlo-style method to gather the loss several times, take its mean, and then optimize. Your move.\n\nFYI: **Our Bayesian layers and utils help to calculate the complexity cost along the layers on each feedforward operation, so you don't need to mind it too much.**\n \n## References:\n * [Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. 
arXiv preprint arXiv:1505.05424, 2015.](https://arxiv.org/abs/1505.05424)\n \n \n## Citing\n\nIf you use `BLiTZ` in your research, you can cite it as follows:\n\n```bibtex\n@misc{esposito2020blitzbdl,\n    author = {Piero Esposito},\n    title = {BLiTZ - Bayesian Layers in Torch Zoo (a Bayesian Deep Learning library for Torch)},\n    year = {2020},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    howpublished = {\\url{https://github.com/piEsposito/blitz-bayesian-deep-learning/}},\n}\n```\n \n###### Made by Pi Esposito\n","funding_links":["https://www.buymeacoffee.com/piEsposito"],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FpiEsposito%2Fblitz-bayesian-deep-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FpiEsposito%2Fblitz-bayesian-deep-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FpiEsposito%2Fblitz-bayesian-deep-learning/lists"}