{"id":22115323,"url":"https://github.com/nolanbconaway/shabadoo","last_synced_at":"2025-08-25T00:31:49.292Z","repository":{"id":55512698,"uuid":"239387344","full_name":"nolanbconaway/shabadoo","owner":"nolanbconaway","description":"Very easy Bayesian regression.","archived":false,"fork":false,"pushed_at":"2020-12-25T15:08:07.000Z","size":1849,"stargazers_count":3,"open_issues_count":6,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-15T03:24:04.273Z","etag":null,"topics":["bayesian-inference","bayesian-statistics","jax","mcmc","numpyro","python","regression"],"latest_commit_sha":null,"homepage":"https://nolanbconaway.github.io/shabadoo/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nolanbconaway.png","metadata":{"files":{"readme":"readme.md","changelog":"changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-09T22:58:02.000Z","updated_at":"2021-04-21T20:09:34.000Z","dependencies_parsed_at":"2022-08-15T02:10:25.336Z","dependency_job_id":null,"html_url":"https://github.com/nolanbconaway/shabadoo","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nolanbconaway%2Fshabadoo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nolanbconaway%2Fshabadoo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nolanbconaway%2Fshabadoo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nolanbconaway%2Fshabadoo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nolanbconaway","download_url":"htt
ps://codeload.github.com/nolanbconaway/shabadoo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230852592,"owners_count":18290081,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-inference","bayesian-statistics","jax","mcmc","numpyro","python","regression"],"created_at":"2024-12-01T12:15:25.300Z","updated_at":"2024-12-22T15:49:11.680Z","avatar_url":"https://github.com/nolanbconaway.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Shabadoo: very easy Bayesian regression.\n\n\u003e![Imgur](https://i.imgur.com/yScWnEt.jpg)\n\u003e\n\u003e \"That's the worst name I ever heard.\"\n\n[![badge](https://github.com/nolanbconaway/shabadoo/workflows/Lint%20and%20Test/badge.svg)](https://github.com/nolanbconaway/shabadoo/actions?query=workflow%3A%22Lint+and+Test%22)\n[![badge](https://github.com/nolanbconaway/shabadoo/workflows/Scheduled%20Testing/badge.svg)](https://github.com/nolanbconaway/shabadoo/actions?query=workflow%3A%22Scheduled+Testing%22)\n[![codecov](https://codecov.io/gh/nolanbconaway/shabadoo/branch/master/graph/badge.svg?token=gIubsLSSHH)](https://codecov.io/gh/nolanbconaway/shabadoo)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/shabadoo)](https://pypi.org/project/shabadoo/)\n[![PyPI](https://img.shields.io/pypi/v/shabadoo)](https://pypi.org/project/shabadoo/)\n\nShabadoo is the worst kind of machine learning. It automates nothing; your models will not perform well and it will be your own fault. \n\n\u003e **BEWARE**. 
Shabadoo is in an open alpha phase. It is authored by someone who does not know how to manage open source projects. Things will change as the author identifies mistakes and corrects (?) them.\n\nShabadoo is for people who want to do Bayesian regression but who do not want to write probabilistic programming code. You only need to assign priors to features and pass your pandas dataframe to a `.fit()` / `.predict()` API.\n\nShabadoo runs on [numpyro](http://num.pyro.ai/) and is basically a wrapper around the [numpyro Bayesian regression tutorial](https://pyro.ai/numpyro/bayesian_regression.html).\n\n- [Quickstart](#quickstart)\n  - [Install](#install)\n  - [Specifying a Shabadoo Bayesian model](#specifying-a-shabadoo-bayesian-model)\n  - [Fitting \u0026 predicting the model](#fitting--predicting-the-model)\n  - [Inspecting the model](#inspecting-the-model)\n  - [Saving and recovering a saved model](#saving-and-recovering-a-saved-model)\n- [Development](#development)\n\n## Quickstart\n\n### Install\n\n```sh\npip install shabadoo\n```\n\nor\n\n```sh\npip install git+https://github.com/nolanbconaway/shabadoo\n```\n\n### Specifying a Shabadoo Bayesian model\n\nShabadoo was designed to make it as easy as possible to test ideas about features and their priors. Models are defined using a class which contains configuration specifying how the model should behave.\n\nYou need to define a new class which inherits from one of the Shabadoo models. 
Currently, Normal, Poisson, and Bernoulli are implemented.\n\n```python\nimport numpy as np\nimport pandas as pd\nfrom numpyro import distributions as dist\nfrom shabadoo import Normal\n\n\n# random number generator seed, to reproduce exactly.\nRNG_KEY = np.array([0, 0])\n\nclass Model(Normal):\n    dv = \"y\"\n    features = dict(\n        const=dict(transformer=1, prior=dist.Normal(0, 1)),\n        x=dict(transformer=lambda df: df.x, prior=dist.Normal(0, 1)),\n    )\n\n\ndf = pd.DataFrame(dict(x=[1, 2, 2, 3, 4, 5], y=[1, 2, 3, 4, 3, 5]))\n```\n\nThe `dv` attribute specifies the variable you are predicting. `features` is a dictionary of dictionaries, with one item per feature. Above, two features are defined (`const` and `x`). Each feature needs a `transformer` and a `prior`. \n\nThe transformer specifies how to obtain the feature given a source dataframe. The prior specifies your beliefs about the model's coefficient for that feature.\n\n### Fitting \u0026 predicting the model\n\nShabadoo models implement the well-known `.fit` / `.predict` api pattern.\n\n```python\nmodel = Model().fit(df, rng_key=RNG_KEY)\n# sample: 100%|██████████| 1500/1500 [00:04\u003c00:00, 308.01it/s, 7 steps of size 4.17e-01. acc. prob=0.89]\n\nmodel.predict(df)\n\n\"\"\"\n0    1.351874\n1    2.219510\n2    2.219510\n3    3.087146\n4    3.954782\n5    4.822418\n\"\"\"\n```\n\n#### Credible Intervals\n\nUse `model.predict(df, ci=True)` to obtain a credible interval around the model's prediction. 
This interval accounts for error in estimating the model's coefficients but does not account for the error around the model's point estimate (PRs welcome, y'all!).\n\n```python\nmodel.predict(df, ci=True)\n\n\"\"\"\n          y  ci_lower  ci_upper\n0  1.351874  0.730992  1.946659\n1  2.219510  1.753340  2.654678\n2  2.219510  1.753340  2.654678\n3  3.087146  2.663617  3.526434\n4  3.954782  3.401837  4.548420\n5  4.822418  4.047847  5.578753\n\"\"\"\n```\n\n### Inspecting the model\n\nShabadoo's model classes come with a number of model inspection methods. It should be easy to understand your model's composition, and with Shabadoo it is!\n\n#### Print the model formula\n\nThe formula shows each coefficient as the mean of its MCMC samples, with the standard deviation in parentheses, to give a rough sense of its value and uncertainty.\n\n```python\nprint(model.formula)\n\n\"\"\"\ny = (\n    const * 0.48424(+-0.64618)\n  + x * 0.86764(+-0.21281)\n)\n\"\"\"\n```\n\n#### Look at the posterior samples\n\nSamples from fitted models can be accessed using `model.samples` (for raw device arrays) and `model.samples_df` (for a tidy DataFrame).\n\n\n```python\nmodel.samples['x']\n\"\"\"\nDeviceArray([[0.9443443 , 1.0215557 , 1.0401363 , 1.1768144 , 1.1752374 ,\n...\n\"\"\"\n\nmodel.samples_df.head()\n\"\"\"\n                 const         x\nchain sample                    \n0     0       0.074572  0.944344\n      1       0.214246  1.021556\n      2      -0.172168  1.040136\n      3       0.440978  1.176814\n      4       0.454463  1.175237\n\"\"\"\n```\n\n#### Measure prediction accuracy\n\nThe `Model.metrics()` method is packed with functionality. You should not have to write a lot of code to evaluate your model's prediction accuracy!\n\nObtaining aggregate statistics is as easy as:\n\n```python\nmodel.metrics(df)\n\n{'r': 0.8646920305474705,\n 'rsq': 0.7476923076923075,\n 'mae': 0.5661819464378061,\n 'mape': 0.21729708806356265}\n```\n\nFor per-point errors, use `aggerrs=False`. 
A pandas dataframe will be returned that you can join on your source data using its index.\n\n```python\nmodel.metrics(df, aggerrs=False)\n\n\"\"\"\n   residual         pe        ape\n0 -0.351874 -35.187366  35.187366\n1 -0.219510 -10.975488  10.975488\n2  0.780490  26.016341  26.016341\n3  0.912854  22.821353  22.821353\n4 -0.954782 -31.826066  31.826066\n5  0.177582   3.551638   3.551638\n\"\"\"\n```\n\nYou can use `grouped_metrics` to understand within-group errors. Under the hood, the predicted and actual `dv` are groupby-aggregated (default sum) and metrics are computed within each group.\n\n```python\ndf[\"group\"] = [1, 1, 1, 2, 2, 2]\nmodel.grouped_metrics(df, 'group')\n\n{'r': 1.0,\n 'rsq': 1.0,\n 'mae': 0.17238043177407247,\n 'mape': 0.023077819594065668}\n```\n\n```python\nmodel.grouped_metrics(df, \"group\", aggerrs=False)\n\n\"\"\"\n       residual        pe       ape\ngroup                              \n1     -0.209107 -3.485113  3.485113\n2     -0.135654 -1.130450  1.130450\n\"\"\"\n```\n\n### Saving and recovering a saved model\n\nShabadoo models have `to_json` and `from_dict` methods which allow models to be saved and recovered exactly. 
\n\n```python\nimport json\n\n# export to a JSON string\nmodel_json = model.to_json()\n\n# recover the model\nmodel_recovered = Model.from_dict(json.loads(model_json))\n\n# check the predictions are the same\nmodel_recovered.predict(df).equals(model.predict(df))\n# True\n```\n\n## Development\n\nTo get a development installation going, set up a Python 3.6 or 3.7 virtualenv however you'd like and set up an editable installation of Shabadoo like so:\n\n```sh\n$ git clone https://github.com/nolanbconaway/shabadoo.git\n$ cd shabadoo\n$ pip install -e .[test]\n```\n\nYou should be able to run the full test suite via:\n\n```sh\n$ tox -e py36  # or py37 if that's what you installed\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnolanbconaway%2Fshabadoo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnolanbconaway%2Fshabadoo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnolanbconaway%2Fshabadoo/lists"}