{"id":32208274,"url":"https://github.com/furrer-lab/abn","last_synced_at":"2025-10-22T06:02:11.042Z","repository":{"id":210040866,"uuid":"721192865","full_name":"furrer-lab/abn","owner":"furrer-lab","description":"Bayesian network analysis in R","archived":false,"fork":false,"pushed_at":"2025-09-12T10:33:17.000Z","size":86859,"stargazers_count":8,"open_issues_count":40,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-16T20:21:44.322Z","etag":null,"topics":["bayesian-network","binomial","categorical-data","gaussian","grouped-datasets","mixed-effects","multinomial","multivariate","poisson","structure-learning"],"latest_commit_sha":null,"homepage":"https://r-bayesian-networks.org/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/furrer-lab.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-11-20T14:45:47.000Z","updated_at":"2025-09-11T16:39:58.000Z","dependencies_parsed_at":"2024-03-23T12:08:13.614Z","dependency_job_id":"2d570af3-e8cb-4e16-968f-a557d700d4d8","html_url":"https://github.com/furrer-lab/abn","commit_stats":null,"previous_names":["furrer-lab/abn"],"tags_count":28,"template":false,"template_full_name":null,"purl":"pkg:github/furrer-lab/abn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/furrer-lab%2Fabn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/furrer-lab%2Fabn/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/furrer-lab%2Fabn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/furrer-lab%2Fabn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/furrer-lab","download_url":"https://codeload.github.com/furrer-lab/abn/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/furrer-lab%2Fabn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280179444,"owners_count":26285958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-20T02:00:06.978Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-network","binomial","categorical-data","gaussian","grouped-datasets","mixed-effects","multinomial","multivariate","poisson","structure-learning"],"created_at":"2025-10-22T06:00:59.393Z","updated_at":"2025-10-22T06:02:06.315Z","avatar_url":"https://github.com/furrer-lab.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# abn: Additive Bayesian Networks \u003ca href=\"https://r-bayesian-networks.org/\"\u003e\u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"139\" /\u003e\u003c/a\u003e\n\n\u003c!-- badges: start --\u003e\n\u003c!-- WARNING: --\u003e\n\u003c!-- The ?branch=release-x.y.y is updated automatically by 
the initiate_version_release workflow --\u003e\n[![status](https://joss.theoj.org/papers/1bbc43a2be86f5d3f831cedb5cf81812/status.svg)](https://joss.theoj.org/papers/10.21105/joss.06822)\n[![On Label CRAN Checks](https://github.com/furrer-lab/abn/actions/workflows/onlabel_CRAN_checks.yml/badge.svg?branch=release-3.1.11)](https://github.com/furrer-lab/abn/actions/workflows/onlabel_CRAN_checks.yml)\n[![Codecov](https://img.shields.io/codecov/c/github/furrer-lab/abn)](https://app.codecov.io/gh/furrer-lab/abn)\n[![GitHub R package version](https://img.shields.io/github/r-package/v/furrer-lab/abn)](https://github.com/furrer-lab/abn/tags)\n![cran](https://www.r-pkg.org/badges/version-ago/abn) \n![downloads](https://cranlogs.r-pkg.org/badges/grand-total/abn) \n![LICENCE](https://img.shields.io/cran/l/abn)\n\u003c!-- badges: end --\u003e\n\nThe R package `abn` is a tool for Bayesian network analysis, a form of probabilistic graphical model.\nIt derives a directed acyclic graph (DAG) from empirical data that describes the dependency structure between random variables.\nThe package provides routines for structure learning and parameter estimation of additive Bayesian network models.  
\n\n# Installation\n[![Ubuntu Install](https://github.com/furrer-lab/abn/actions/workflows/Ubuntu_setup.yml/badge.svg?branch=main)](https://github.com/furrer-lab/abn/actions/workflows/Ubuntu_setup.yml)\n[![Fedora Install](https://github.com/furrer-lab/abn/actions/workflows/Fedora_setup.yml/badge.svg?branch=main)](https://github.com/furrer-lab/abn/actions/workflows/Fedora_setup.yml)\n[![MacOS Install](https://github.com/furrer-lab/abn/actions/workflows/Macos_setup.yml/badge.svg?branch=main)](https://github.com/furrer-lab/abn/actions/workflows/Macos_setup.yml)\n[![Windows Install](https://github.com/furrer-lab/abn/actions/workflows/Windows_setup.yml/badge.svg?branch=main)](https://github.com/furrer-lab/abn/actions/workflows/Windows_setup.yml)\n\n`abn` and its installation process rely on various software that may or may not be present on your system.\n\n## Prior to installing\n\nIn order for `abn` to work correctly on your system, some dependencies need to be installed.\nIf you are on a Linux-based system, (most of) these dependencies are installed automatically for you when following the [pak](https://pak.r-lib.org/)-based installation procedure described in the [Installing from GitHub](#installing-from-github-recommended) section.\n\nFor MacOS- and Windows-based systems, some additional preparatory steps are required.\n\nThe following paragraphs provide detailed instructions, for the most common operating systems, on the steps that need to be carried out prior to installing `abn`.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e\u003ci\u003eUbuntu\u003c/i\u003e\u003c/b\u003e\u003c/summary\u003e\n\nYou presumably have R installed already; if not, open a terminal and type:\n\n```bash\napt-get install r-base\n```\n\n_**Note:** You might need to prepend `sudo ` to this command._\n\nAll you need for the installation is to have the R-package [pak](https://pak.r-lib.org/) installed.\n`pak` is installed like any other R-package; however, it relies on `curl` being 
present on your system, so we make sure it is there:\n\n  ```bash\n  apt-get install libcurl4-openssl-dev\n  ```\n  Now, to install `pak` we start an R session and write:\n\n  ```R\n  install.packages('pak', repos=c(CRAN=\"https://cran.r-project.org\"))\n  ```\n\n  With that you should be ready to [install `abn` from GitHub](#installing-from-github-recommended).\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e\u003ci\u003eFedora\u003c/i\u003e\u003c/b\u003e\u003c/summary\u003e\n\n  You presumably have R installed already, if not, open a terminal and type:\n \n  ```bash\n  dnf install R\n  ```\n\n  _**Note:** You might need to prepend `sudo ` to this command._\n\n  For the installation you need to have the R-package [pak](https://pak.r-lib.org/) installed.\n  `pak` is installed like any other R-package, however, it relies on `curl` being installed on your system, so we make sure it is there:\n\n  ```bash\n  dnf install libcurl-devel \n  ```\n  Now, to install `pak` we start an R session and write:\n\n  ```R\n  install.packages('pak', repos=c(CRAN=\"https://cran.r-project.org\"))\n  ```\n\n  There is one more thing we need to do before we can install `abn`:\n\n  **Install JAGS from source**\n\n  [JAGS](https://mcmc-jags.sourceforge.io/), _Just Another Gibbs Sampler_, is a program for analyzing Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation. 
[rjags](https://cran.r-project.org/package=rjags) is R's interface to the `JAGS` library.\n  `JAGS` is required in some simulations `abn` can perform.\n\n  The steps needed to install `JAGS 4.3.2` are:\n  \n  ```bash\n  wget -O /tmp/jags.tar.gz https://sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Source/JAGS-4.3.2.tar.gz/download\n  cd /tmp\n  tar -xf jags.tar.gz\n  cd /tmp/JAGS-4.3.2 \n  ./configure --libdir=/usr/local/lib64\n  make\n  sudo make install\n  ```\n  \n  _**Note:**_\n  _If you are on a 64-bit system (you likely are), mind the `--libdir=/usr/local/lib64` argument when launching `./configure`._\n  _Omitting this argument will lead to `rjags` \"not seeing\" `jags`._\n  \n  On Fedora `rjags` might need some special configuration for it to link properly to the `JAGS` library.\n  You might also need to add the path to the `JAGS` library to the linker path (see [rjags INSTALL file](https://github.com/cran/rjags/blob/master/INSTALL) for further details).\n  \n  In order to add the `JAGS` library to the linker path, run the following commands:\n  \n  ```bash\n  # write via tee so the file is created with root privileges\n  echo \"/usr/local/lib64\" | sudo tee /etc/ld.so.conf.d/jags.conf\n  sudo /sbin/ldconfig\n  ``` \n\n  _**Note:**_\n  _These commands might not be needed; you might first try to install the R-package `rjags` and only run them if you encounter a `configure: error: Runtime link error`._\n\n  With that you should be ready to [install `abn` from GitHub](#installing-from-github-recommended).\n  \n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e\u003ci\u003eMacOS\u003c/i\u003e\u003c/b\u003e\u003c/summary\u003e\n\n  Most likely you have R installed already; if not, run:\n \n  ```bash\n  brew install R\n  ```\n\n  For the installation you need to have the R-package [pak](https://pak.r-lib.org/) installed.\n  `pak` is installed like any other R-package: start an R session and write:\n\n  ```R\n  install.packages('pak', repos=c(CRAN=\"https://cran.r-project.org\"))\n  
```\n\n  We will install the system dependencies with [Homebrew](https://brew.sh/).\n  Head over to their site to see the installation process or simply open a terminal and run:\n\n  ```bash\n  /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n  ```\n\n  To correctly link to installed libraries and to build them, we need `pkg-config` and `automake`:\n \n  ```bash\n  brew install pkg-config\n  brew install automake  # needed to run autoconf\n  ```\n  \n  We will use `wget` to download `JAGS` later, as well as, the development headers `openssl`:\n \n  ```bash\n  brew install wget\n  brew install openssl@1.1\n  ```\n\n  \u003cins\u003e**Dependencies**\u003c/ins\u003e\n  \n  On MacOS we need to install some system dependencies separately:\n  \n  \n  - **GSL**\n  \n    [GSL](https://www.gnu.org/software/gsl/), the _GNU Scientific Library_, is a numerical library for C/C++.\n    It is required to compile `abn`'s C/C++ code.\n  \n    With Homebrew you can install the `GSL` binaries directly:\n    \n    ```\n    brew install gsl\n    ```\n  \n  - **JAGS \u0026 rjags**\n    \n    [JAGS](https://mcmc-jags.sourceforge.io/), _Just Another Gibbs Sampler_, is a program for analyzing Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation. [rjags](https://cran.r-project.org/package=rjags) is R's interface to the `JAGS` library.\n    `JAGS` is required in some simulations `abn` can perform.\n  \n    With Homebrew you can install the `JAGS` binaries directly:\n    \n    ```\n    brew install jags\n    ```\n    \n    And now to install `rjags`, open an R session and type:\n    \n    ```R\n    install.packages(\"rjags\", type=\"source\", repos=c(CRAN=\"https://cran.r-project.org\"))\n    library(\"rjags\")\n    ```\n  \n  - **INLA**\n  \n    [INLA](https://www.r-inla.org/) is an R package that is not hosted on CRAN and thus needs to be installed separately.\n    `abn` uses `INLA` to fit some models. 
\n    \n    `INLA` relies on various other R packages and C/C++ libraries.\n    It thus needs some additional installation steps:\n    \n    ```bash\n    brew install udunits \n    brew install gdal  # installs also geos as dependency\n    brew install proj\n    ```\n  \n    Now, to install `INLA` itself, simply start an R session and run:\n    \n    ```R\n    install.packages(\"INLA\", repos = c(getOption(\"repos\"), INLA = \"https://inla.r-inla-download.org/R/stable\"), dep = TRUE)\n    ```\n    \n    If you run into trouble, please see also [INLA's installation instructions](https://www.r-inla.org/download-install) for further details.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e\u003ci\u003eWindows\u003c/i\u003e\u003c/b\u003e\u003c/summary\u003e\n\n  For the installation you need to have the R-package [pak](https://pak.r-lib.org/) installed.\n  `pak` is installed like any other R-package, we start an R session and write:\n\n  ```R\n  install.packages('pak', repos=c(CRAN=\"https://cran.r-project.org\"))\n  ```\n\n\n  \u003cins\u003e**Dependencies**\u003c/ins\u003e\n\n  On Windows we need to install some system dependencies separately:\n\n\n  - **GSL**\n  \n    [GSL](https://www.gnu.org/software/gsl/), the _GNU Scientific Library_, is a numerical library for C/C++.\n    It is required to compile `abn`'s C/C++ code.\n  \n    In Windows `GSL` is available a.o. 
through [cygwin](https://cygwin.com/index.html), which has a straight forward installation process.\n    Either head over to the website, download and install the `setup-x86_64.exe` file or use PowerShell:\n    \n    ```powershell\n    Import-Module bitstransfer\n    New-Item -ItemType Directory -Force -Path \"C:\\Program Files\\cygwin\"\n    start-bitstransfer -source https://cygwin.com/setup-x86_64.exe \"C:\\Program Files\\cygwin\\setup-x86_64.exe\"\n    Start-Process -Wait -FilePath \"C:\\Program Files\\cygwin\\setup-x86_64.exe\" -ArgumentList \"/S\" -PassThru\n    ```\n  \n  - **JAGS \u0026 rjags**\n    \n    [JAGS](https://mcmc-jags.sourceforge.io/), _Just Another Gibbs Sampler_, is a program for analyzing Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation. [rjags](https://cran.r-project.org/package=rjags) is R's interface to the `JAGS` library.\n    `JAGS` is required in some simulations `abn` can perform.\n  \n    \n    You can either head over to the [JAGS download page](https://sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Windows/), download and execute the installable, or use PowerShell.\n    The following instructions will download and install `JAGS 4.3.1` in PowerShell:\n    \n    ```powershell\n    Import-Module bitstransfer\n    New-Item -ItemType Directory -Force -Path \"C:\\Program Files\\JAGS\\JAGS-4.3.1\"\n    start-bitstransfer -source https://sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Windows/JAGS-4.3.1.exe/download  \"C:\\Program Files\\JAGS\\JAGS-4.3.1\\JAGS-4.3.1.exe\"\n    Start-Process -Wait -FilePath \"C:\\Program Files\\JAGS\\JAGS-4.3.1\\JAGS-4.3.1.exe\" -ArgumentList \"/S\" -PassThru\n    ```\n    \n    In order to make sure `rjags` finds `JAGS` we set the environment variable `JAGS_HOME` before installing `rjags`.\n    To do so, open your R session and type:\n    \n    ```R\n    Sys.setenv(JAGS_HOME=\"C:/Program Files/JAGS/JAGS-4.3.1\")\n    install.packages(\"rjags\", 
repos=c(CRAN=\"https://cran.r-project.org\"))\n    library(\"rjags\")\n    ```\n  \n  - **INLA**\n  \n    [INLA](https://www.r-inla.org/) is an R package that is not hosted on CRAN and thus needs to be installed separately.\n    `abn` uses `INLA` to fit some models. \n    \n    The installation is straightforward: simply start an R session and run:\n    \n    ```R\n    install.packages(\"INLA\", repos = c(getOption(\"repos\"), INLA = \"https://inla.r-inla-download.org/R/stable\"), dep = TRUE)\n    ```\n    \n    If you run into trouble, please see also [INLA's installation instructions](https://www.r-inla.org/download-install) for further details.\n  \n\u003c/details\u003e\n\n_Click on your operating system to see the specific installation instructions_\n\n## R version support\n\nR versions \u003e= 4.4 are officially supported.\n\n## Installing from GitHub (recommended)\n\nFrom GitHub you can install any version and/or state of the `abn` repository you want.\nWe recommend installing a specific version rather than `main` directly.\nHead over to our [version list](https://github.com/furrer-lab/abn/releases) to see which one is the latest version.\nHere we assume the version is `3.1.2`.\n\nWe use [pak](https://pak.r-lib.org/) for the installation process.\nIf you followed the [Prior to installing](#prior-to-installing) section, `pak` should already be installed.\n\u003cdetails\u003e\u003csummary\u003eIf not, install it first.\u003c/summary\u003e Open an R session and type:\n\n```R\ninstall.packages('pak', repos=c(CRAN=\"https://cran.r-project.org\"))\n```\n\u003c/details\u003e\n\nTo install `abn`, run in your R session:\n\n```R\npak::repo_add(INLA = \"https://inla.r-inla-download.org/R/stable/\")\npak::pkg_install(\"furrer-lab/abn@3.1.2\", dependencies=TRUE)\n```\n_**Note:** The first command can be skipped on MacOS or Windows._\n\n## Installing from CRAN\n\n\u003e [!NOTE]\n\u003e When installing from CRAN you might not get the latest version of `abn`.\n\u003e If you want 
the latest version follow the instructions from [Installing from GitHub](#installing-from-github-recommended).\n\nIn order to install the `abn` version on CRAN, open an R session and type:\n\n```R\npak::repo_add(INLA = \"https://inla.r-inla-download.org/R/stable/\")\npak::pkg_install(\"abn\", dependencies=TRUE)\n```\n_**Note:** The first command can be skipped on MacOS or Windows._\n\n`abn` has several dependencies that are not available on CRAN.\nThis is why we rely on [pak](https://pak.r-lib.org/) for the installation and the [Prior to installing](#prior-to-installing) section should be followed through before installing `abn` from CRAN. [^1]\n\n[^1]: The `abn` package includes certain features, such as multiprocessing and integration with the `INLA` package, which are limited or available only on specific CRAN flavors. \nWhile it is possible to relax the testing process by, e.g., excluding tests of these functionalities, we believe that rigorous testing is important for reliable software development, especially for a package like `abn` that includes complex functionalities. \nWe have implemented a rigorous testing framework similar to CRAN's to validate these functionalities in our development process. \nOur aim is to maximize the reliability of the `abn` package under various conditions, and we are dedicated to providing a robust and reliable package. \nWe appreciate your understanding as we work towards making `abn` available on CRAN soon.\n\n## Installing from source\n\nIt is also possible to clone this repository and install `abn` from source.\n\n\u003e [!NOTE]\n\u003e Also in this case you need to first prepare your system by following the [Prior to installing](#prior-to-installing) section.\n\nInstalling from source is done with the following steps:\n\n1. Clone the repository and go to the root directory of the repo:\n\n   ```bash\n   git clone https://github.com/furrer-lab/abn\n   cd abn\n   ```\n\n2. 
Deactivate `abn`'s development environment (a [renv](https://rstudio.github.io/renv/articles/renv.html) virtual environment):\n\n   ```R\n   renv::deactivate()\n   ```\n\n3. Build and install the local content with dependencies:\n\n   ```R\n   pak::repo_add(INLA = \"https://inla.r-inla-download.org/R/stable/\")\n   pak::local_install(dependencies=TRUE)\n   ```\n  _**Note:** The first command can be skipped on MacOS or Windows._\n\n# Quickstart\n\nExplore the basics of data analysis using additive Bayesian networks with the [`abn` package](https://CRAN.R-project.org/package=abn) through our [simple example](vignettes/quick_start_example.Rmd).\nThe datasets required for these examples are included within the [`abn` package](https://CRAN.R-project.org/package=abn).\n\nFor a deeper understanding, refer to the manual pages on the [`abn` homepage](https://r-bayesian-networks.org/), which include numerous examples. \nKey pages to visit are `fitAbn()`, `buildScoreCache()`, `mostProbable()`, and `searchHillClimber()`.\nAlso, see the [examples](#examples) below for a quick overview of the package's capabilities.\n\n# Features\n\nThe R package `abn` provides routines for determining optimal additive Bayesian network models for a given data set. \nThe core functionality is concerned with model selection - determining the most likely model of data from interdependent variables. \nThe model selection process can incorporate expert knowledge by specifying structural constraints, such as which arcs are banned or retained.\n\nThe general workflow with `abn` follows a three-step process:\n\n1. **Determine the model search space**: The function `buildScoreCache()` builds a cache of pre-computed scores for each possible DAG.\nFor this, it's required to specify the data types of the variables in the data set and the structural constraints of the model (e.g. which arcs are banned or retained and the maximum number of parents per node).\n\n2. 
**Structure learning**: `abn` offers different structure learning algorithms:\n    - The exact structure learning algorithm from [Koivisto and Sood (2004)](https://www.jmlr.org/papers/volume5/koivisto04a/koivisto04a.pdf) is implemented in `C` and can be called with the function `mostProbable()`, which finds the most probable DAG for a given data set.\n    - The function `searchHeuristic()` provides a set of heuristic search algorithms. These include the hill-climber, tabu search, and simulated annealing algorithms implemented in `R`.\n    - `searchHillClimber()` searches for high-scoring DAGs using a random re-start greedy hill-climber heuristic search and is implemented in `C`. It slightly deviates from the method initially presented by [Heckerman et al. 1995](https://doi.org/10.1023/A:1022623210503) (for details consult the respective help page `?abn::searchHillClimber()`).\n\n3. **Parameter estimation**: The function `fitAbn()` estimates the model's parameters based on the DAG from the previous step.\n\n`abn` allows for two different model formulations, specified with the argument `method`:\n\n- `method = \"mle\"` fits a model under the frequentist paradigm using information-theoretic criteria to select the best model.\n\n- `method = \"bayes\"` estimates the posterior distribution of the model parameters using Laplace approximations, a method for Bayesian inference and an alternative to Markov Chain Monte Carlo (MCMC). A standard Laplace approximation is implemented in the `abn` source code; in specific cases (see help page `?fitAbn`), `abn` switches to the Integrated Nested Laplace Approximation from the [INLA package](https://www.r-inla.org), which then needs to be installed.\n\nTo generate new observations from a fitted ABN model, the function `simulateAbn()` simulates data based on the DAG and the estimated parameters from the previous step. 
`simulateAbn()` is available for both `method = \"mle\"` and `method = \"bayes\"` and requires [JAGS](https://mcmc-jags.sourceforge.io) to be installed. \n\n## Supported Data Types\n\nThe `abn` package supports the following distributions for the variables in the network:\n\n- Gaussian distribution for continuous variables.\n\n- Binomial distribution for binary variables.\n\n- Poisson distribution for variables with count data.\n\n- Multinomial distribution for categorical variables (only available with `method = \"mle\"`).\n\nUnlike other packages, `abn` does not restrict the combination of parent-child distributions.\n\n## Multilevel Models for Grouped Data Structures\n\nThe analysis of \"hierarchical\" or \"grouped\" data, in which observations are nested within higher-level units, requires statistical models with parameters that vary across groups (e.g. mixed-effect models).\n\n`abn` allows controlling for one-layer clustering, where observations are grouped into a single layer of clusters that are themselves assumed to be independent, but observations within the clusters may be correlated (e.g. students nested within schools, measurements over time for each patient, etc.).\nThe argument `group.var` specifies the discrete variable that defines the group structure. The model is then fitted separately for each group, and the results are combined. \n\nFor example, when studying student test scores across different schools, a varying intercept model would allow for the possibility that average test scores (the intercept) might be higher in one school than another due to factors specific to each school. This can be modeled in `abn` by setting the argument `group.var` to the variable containing the school names. 
The model is then fitted as a varying intercept model, where the intercept is allowed to vary across schools, but the slope is assumed to be the same for all schools.\n\nUnder the frequentist paradigm (`method = \"mle\"`), `abn` relies on the `lme4` package to fit generalized linear mixed models (GLMMs) for Binomial, Poisson, and Gaussian distributed variables. For multinomial distributed variables, `abn` fits a multinomial baseline category logit model with random effects using the `mclogit` package. Currently, only one-layer clustering is supported (e.g., for `method = \"mle\"`, this corresponds to a random intercept model).\n\nWith a Bayesian approach (`method = \"bayes\"`), `abn` relies on its own implementation of the Laplace approximation and the package `INLA` to fit a single-level hierarchical model for Binomial, Poisson, and Gaussian distributed variables. Multinomial distributed variables in general (see Section [Supported Data Types](#supported-data-types)) are not yet implemented with `method = \"bayes\"`.\n\n# Basic Background\n\nBayesian network modeling is a data analysis technique ideally suited to messy, highly correlated and complex datasets. \nThis methodology is rather distinct from other forms of statistical modeling in that its focus is on structure discovery—determining an optimal graphical model that describes the interrelationships in the underlying processes that generated the data. \nIt is a **multivariate** technique and can be used for one or many dependent variables. \nThis is a data-driven approach, as opposed to relying only on subjective expert opinion to determine how variables of interest are interrelated (for example, structural equation modeling). \n\n[Below](#examples) and on the [package's website](https://r-bayesian-networks.org/), we provide some [cookbook](#examples)-type examples of how to perform Bayesian network **structure discovery** analyses with observational data. 
\nThe particular type of Bayesian network models considered here are **additive Bayesian networks**. \nThese are rather different, mathematically speaking, from the standard form of Bayesian network models (for binary or categorical data) presented in the academic literature, which typically use an analytically elegant but arguably interpretation-wise opaque contingency table parametrization. \nAn additive Bayesian network model is simply a **multidimensional regression model**, e.g., directly analogous to generalized linear modeling but with all variables potentially dependent. \n\nAn example can be found in the [American Journal of Epidemiology](https://doi.org/10.1093/aje/kws183), where this approach was used to investigate risk factors for child diarrhea. \nA special issue of **Preventive Veterinary Medicine** on graphical modeling features several articles that use [abn](https://CRAN.R-project.org/package=abn) to fit epidemiological data (e.g., [Ludwig et al., 2013](https://doi.org/10.1016/j.prevetmed.2013.02.005)). \nIntroductions to this methodology can be found in [Emerging Themes in Epidemiology](https://link.springer.com/journal/12982) and in [Computers in Biology and Medicine](https://doi.org/10.1016/j.compbiomed.2022.105740) where it is compared to other approaches.\n\n## What is an additive Bayesian network?\n\nAdditive Bayesian network (ABN) models are statistical models that use the principles of Bayesian statistics and graph theory. \nThey provide a framework for representing data with multiple variables, known as multivariate data.\n\nABN models are a graphical representation of (Bayesian) multivariate regression. 
\nThis form of statistical analysis enables the prediction of multiple outcomes from a given set of predictors while simultaneously accounting for the relationships between these outcomes.\n\nIn other words, additive Bayesian network models extend the concept of generalized linear models (GLMs), which are typically used to predict a single outcome, to scenarios with multiple dependent variables. \nThis makes them a powerful tool for understanding complex, multivariate datasets.\n\n## The term Bayesian network is interpreted differently across various fields.\n\nBayesian network models often involve binary nodes, arguably the most frequently used type of Bayesian network. \nThese models typically use a contingency table instead of an additive parameter formulation. \nThis approach allows for mathematical elegance and enables key metrics like model goodness of fit and marginal posterior parameters to be estimated analytically (i.e., from a formula) rather than numerically (an approximation). \nHowever, this parametrization may not be parsimonious, and the interpretation of the model parameters is less straightforward than the usual Generalized Linear Model (GLM) type models, which are prevalent across all scientific disciplines.\n\nWhile this is a crucial practical distinction, it’s a relatively low-level technical one, as the primary aspect of BN modeling is that it’s a form of graphical modeling – a model of the data’s joint probability distribution. \nThis joint – multidimensional – aspect makes this methodology highly attractive for complex data analysis and sets it apart from more standard regression techniques, such as GLMs, GLMMs, etc., which are only one-dimensional as they assume all covariates are independent. 
\nWhile this assumption is entirely reasonable in a classical experimental design scenario, it’s unrealistic for many observational studies in fields like medicine, veterinary science, ecology, and biology.\n\n# Examples\n\n- [Example 1: Basic usage](#example-1-basic-usage)\n- [Example 2: Restrict model search space](#example-2-restrict-model-search-space)\n- [Example 3: Grouped Data Structures](#example-3-grouped-data-structures)\n- [Example 4: Using INLA vs internal Laplace approximation](#example-4-using-inla-vs-internal-laplace-approximation)\n\n## Example 1: Basic Usage\n\nThis is a basic example which shows the basic workflow:\n\n``` r\nlibrary(abn)\n\n# Built-in toy dataset with two Gaussian variables G1 and G2, two Binomial variables B1 and B2, and one multinomial variable C\nstr(g2b2c_data)\n\n# Define the distributions of the variables\ndists \u003c- list(G1 = \"gaussian\",\n              B1 = \"binomial\",\n              B2 = \"binomial\",\n              C = \"multinomial\",\n              G2 = \"gaussian\")\n\n\n# Build the score cache\ncacheMLE \u003c- buildScoreCache(data.df = g2b2c_data,\n                         data.dists = dists,\n                         method = \"mle\",\n                         max.parents = 2)\n\n# Find the most probable DAG\ndagMP \u003c- mostProbable(score.cache = cacheMLE)\n\n# Print the most probable DAG\nprint(dagMP)\n\n# Plot the most probable DAG\nplot(dagMP)\n\n# Fit the most probable DAG\nmyfit \u003c- fitAbn(object = dagMP,\n                method = \"mle\")\n\n# Print the fitted DAG\nprint(myfit)\n```\n\n## Example 2: Restrict Model Search Space\n\nBased on [example 1](#example-1-basic-usage), we may know that the arc G1-\u003eG2 is not possible and that the arc from C -\u003e G2 must be present.\nThis \"expert knowledge\" can be included in the model by banning the arc from G1 to G2 and retaining the arc from C to G2.\n\nThe retain and ban matrices are specified as an adjacency matrix of 0 and 1 entries, where 1 
indicates that the arc is banned or retained, respectively. \nRow and column names must match the variable names in the data set. \nRows are the child nodes and columns are their parents: a 1 in a given row and column bans (or retains) the arc from the column variable to the row variable.\nFor example, to ban G1 as a parent of G2, set the entry in row G2 and column G1 of the ban matrix to 1.\n\nFurther, we can restrict the maximum number of parents per node to 2.\n\n```r\n\n# Ban the edge G1 -\u003e G2 (row = child G2, column = parent G1)\nbanmat \u003c- matrix(0, nrow = 5, ncol = 5, dimnames = list(names(dists), names(dists)))\nbanmat[5, 1] \u003c- 1\n\n# Always retain the edge C -\u003e G2 (row = child G2, column = parent C)\nretainmat \u003c- matrix(0, nrow = 5, ncol = 5, dimnames = list(names(dists), names(dists)))\nretainmat[5, 4] \u003c- 1\n\n# Limit the maximum number of parents to 2\nmax.par \u003c- 2\n\n# Build the score cache\ncacheMLE_small \u003c- buildScoreCache(data.df = g2b2c_data,\n                            data.dists = dists,\n                            method = \"mle\",\n                            dag.banned = banmat,\n                            dag.retained = retainmat,\n                            max.parents = max.par)\nprint(paste(\"Without restrictions from example 1: \", nrow(cacheMLE$node.defn)))\nprint(paste(\"With restrictions as in example 2: \", nrow(cacheMLE_small$node.defn)))\n\n```\n\n## Example 3: Grouped Data Structures\n\nDepending on the data structure, we may want to control for one-layer clustering, where observations are grouped into a single layer of clusters that are themselves assumed to be independent, but observations within the clusters may be correlated (e.g., students nested within schools, measurements over time for each patient, etc.).\n\nCurrently, `abn` supports only one-layer clustering. 
\n\n```r\n\n# Built-in toy data set\nstr(g2pbcgrp)\n\n# Define the distributions of the variables\ndists \u003c- list(G1 = \"gaussian\",\n              P = \"poisson\",\n              B = \"binomial\",\n              C = \"multinomial\",\n              G2 = \"gaussian\") # group is not among the list of variable distributions\n\n# Ban arcs such that C has only B and P as parents\nban.mat \u003c- matrix(0, nrow = 5, ncol = 5, dimnames = list(names(dists), names(dists)))\nban.mat[4, 1] \u003c- 1\nban.mat[4, 4] \u003c- 1\nban.mat[4, 5] \u003c- 1\n\n# Build the score cache\ncache \u003c- buildScoreCache(data.df = g2pbcgrp,\n                         data.dists = dists,\n                         group.var = \"group\",\n                         dag.banned = ban.mat,\n                         method = \"mle\",\n                         max.parents = 2)\n\n# Find the most probable DAG\ndag \u003c- mostProbable(score.cache = cache)\n\n# Plot the most probable DAG\nplot(dag)\n\n# Fit the most probable DAG\nfit \u003c- fitAbn(object = dag,\n              method = \"mle\")\n\n# Plot the fitted DAG\nplot(fit)\n\n# Print the fitted DAG\nprint(fit)\n\n```\n\n\n## Example 4: Using INLA vs internal Laplace approximation\n\nUnder a Bayesian approach, `abn` automatically switches to the Integrated Nested Laplace Approximation from the [INLA package](https://www.r-inla.org) if the internal Laplace approximation fails to converge. \nHowever, we can also force the use of INLA by setting the argument `control=list(max.mode.error=100)`.\n\nThe following example shows that the results are very similar. 
It also shows how to specify arc constraints as formulas and how to set a separate parent limit for each node.\n\n``` r\nlibrary(abn)\n\n# Subset of the built-in dataset, see ?ex0.dag.data\nmydat \u003c- ex0.dag.data[,c(\"b1\",\"b2\",\"g1\",\"g2\",\"b3\",\"g3\")] ## take a subset of cols\n\n# set up the distribution list for each node\nmydists \u003c- list(b1=\"binomial\", b2=\"binomial\", g1=\"gaussian\",\n                g2=\"gaussian\", b3=\"binomial\", g3=\"gaussian\")\n\n# Structural constraints\n## ban arc from b2 to b1\n## always retain arc from g2 to g1\n## parent limits - can be specified for each node separately\nmax.par \u003c- list(\"b1\"=2, \"b2\"=2, \"g1\"=2, \"g2\"=2, \"b3\"=2, \"g3\"=2)\n\n# now build the cache of pre-computed scores according to the structural constraints\nres.c \u003c- buildScoreCache(data.df=mydat, data.dists=mydists,\n                         dag.banned= ~b1|b2, \n                         dag.retained= ~g1|g2, \n                         max.parents=max.par)\n\n\n# repeat using R-INLA; the marginal likelihoods (mlik) should be virtually identical\nif(requireNamespace(\"INLA\", quietly = TRUE)){\n  res.inla \u003c- buildScoreCache(data.df=mydat, data.dists=mydists,\n                              dag.banned= ~b1|b2, # ban arc from b2 to b1\n                              dag.retained= ~g1|g2, # always retain arc from g2 to g1\n                              max.parents=max.par,\n                              control=list(max.mode.error=100)) # force the use of INLA\n  \n  ## comparison - very similar\n  difference \u003c- res.c$mlik - res.inla$mlik\n  summary(difference)\n}\n```\n\n# Contributing\n\nWe greatly appreciate contributions from the community and are excited to welcome you to the development process of the `abn` package. \nHere are some guidelines to help you get started:\n\n1. **Seeking Support:** \nIf you need help with using the `abn` package, you can seek support by creating a new issue on our GitHub repository. 
\nPlease describe your problem in detail and include a minimal reproducible example if possible.\n\n2. **Reporting Issues or Problems:** \nIf you encounter any issues or problems with the software, please report them by creating a new issue on our GitHub repository.\nWhen reporting an issue, try to include as much detail as possible, including steps to reproduce the issue, your operating system and R version, and any error messages you received.\n\n3. **Software Contributions:**\nWe encourage contributions directly via pull requests on our GitHub repository.\nBefore starting your work, please first create an issue describing the contribution you wish to make. \nThis allows us to discuss and agree on the best way to integrate your contribution into the package.\n\nBy participating in this project, you agree to abide by our [code of conduct](#code-of-conduct). \nWe are committed to making participation in this project a respectful and harassment-free experience for everyone.\n\n# Citation\n\nIf you use `abn` in your research, please cite it as follows:\n\n``` r\n\u003e citation(\"abn\")\nTo cite the software implementation of the R package 'abn' use:\n\n  Delucchi M, Furrer R, Kratzer G, Lewis F, Liechti J, Pittavino M, Cherneva K (2024). _abn: Modelling Multivariate Data with Additive Bayesian Networks_. R package version 3.1.3, \u003chttps://CRAN.R-project.org/package=abn\u003e.\n\nTo cite the methodology of the R package 'abn' use:\n\n  Kratzer G, Lewis F, Comin A, Pittavino M, Furrer R (2023). “Additive Bayesian Network Modeling with the R Package abn.” _Journal of Statistical Software_, *105*(8), 1-41. doi:10.18637/jss.v105.i08 \u003chttps://doi.org/10.18637/jss.v105.i08\u003e.\n\nTo cite the application of mixed-effects ABN use:\n\n  Delucchi M, Liechti J, Spinner G, Furrer R (2024). “abn: Additive Bayesian Networks.” _Journal of Open Source Software_, *9*(101), 6822. 
R package version 3.1.3, \u003chttps://doi.org/10.21105/joss.06822\u003e.\n\nTo cite an example of a typical ABN analysis use:\n\n  Kratzer, G., Lewis, F.I., Willi, B., Meli, M.L., Boretti, F.S., Hofmann-Lehmann, R., Torgerson, P., Furrer, R. and Hartnack, S. (2020). Bayesian Network Modeling Applied to Feline Calicivirus Infection Among Cats in Switzerland. Frontiers\n  in Veterinary Science, 7, 73\n```\n\n# License\n\nThe `abn` package is licensed under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).\n\n# Code of Conduct\n\nPlease note that the `abn` project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.\n\n# Applications\n\nThe [abn website](https://r-bayesian-networks.org/) provides a comprehensive set of documented case studies, numerical accuracy/quality assurance exercises, and additional documentation.\n\n## Technical articles\n\n- Delucchi et al. (2024): [Additive Bayesian Networks. Journal of Open Source Software, 9(101), 6822](https://doi.org/10.21105/joss.06822)\n\n- Kratzer et al. (2023): [Additive Bayesian Network Modeling with the R Package abn](https://doi.org/10.18637/jss.v105.i08)\n\n- Kratzer et al. (2020): [Bayesian Networks modeling applied to Feline Calicivirus infection among cats in Switzerland](https://doi.org/10.3389/fvets.2020.00073)\n\n- Kratzer et al. (2018): [Comparison between Suitable Priors for Additive Bayesian Networks](https://arxiv.org/pdf/1809.06636)\n\n- Koivisto et al. (2004): [Exact Bayesian structure discovery in Bayesian networks](https://www.jmlr.org/papers/volume5/koivisto04a/koivisto04a.pdf)\n\n- Friedman et al. (2003): [Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks](https://doi.org/10.1023/A:1020249912095)\n\n- Friedman et al. 
(1999): [Data analysis with Bayesian networks: A bootstrap approach](https://arxiv.org/abs/1301.6695)\n\n- Heckerman et al. (1995): [Learning Bayesian Networks – The Combination of Knowledge And Statistical-Data](http://maxchickering.com/publications/ml95.pdf)\n\n## Application articles\n\n- Delucchi et al. (2022): [Bayesian network analysis reveals the interplay of intracranial aneurysm rupture risk factors](https://doi.org/10.1016/j.compbiomed.2022.105740)\n\n- Guinat et al. (2020): [Biosecurity risk factors for highly pathogenic avian influenza (H5N8) virus infection in duck farms, France](https://onlinelibrary.wiley.com/doi/10.1111/tbed.13672)\n\n- Hartnack et al. (2019): [Additive Bayesian networks for antimicrobial resistance and potential risk factors in non-typhoidal Salmonella isolates from layer hens in Uganda](https://doi.org/10.1186/s12917-019-1965-y)\n\n- Ruchti et al. (2019): [Progression and risk factors of pododermatitis in part-time group housed rabbit does in Switzerland](https://doi.org/10.1016/j.prevetmed.2019.01.013)\n\n- Comin et al. (2019): [Revealing the structure of the associations between housing system, facilities, management and welfare of commercial laying hens using Additive Bayesian Networks](https://doi.org/10.1016/j.prevetmed.2019.01.004)\n\n- Ruchti et al. (2018): [Pododermatitis in group housed rabbit does in Switzerland – prevalence, severity and risk factors](https://doi.org/10.1016/j.prevetmed.2018.06.011)\n\n- Pittavino et al. (2017): [Comparison between generalised linear modelling and additive Bayesian network; identification of factors associated with the incidence of antibodies against Leptospira interrogans sv Pomona in meat workers in New Zealand](https://doi.org/10.1016/j.actatropica.2017.04.034)\n\n- Hartnack et al. (2017): [Attitudes of Austrian veterinarians towards euthanasia in small animal practice: impacts of age and gender on views on euthanasia](https://doi.org/10.1186/s12917-016-0649-0)\n\n- Lewis et al. 
(2012): [Revealing the Complexity of Health Determinants in Resource-poor Settings](https://doi.org/10.1093/aje/kws183)\n\n- Lewis et al. (2011): [Structure discovery in Bayesian networks: An analytical tool for analysing complex animal health data](https://doi.org/10.1016/j.prevetmed.2011.02.003)\n\n## Workshops\n\n### Causality\n\n- 4 December 2018, **talk** by Beate Sick \u0026 Gilles Kratzer at the [1st Causality workshop](https://bsick.github.io/causality_workshop/): Bayesian Networks meet Observational data. (UZH, Switzerland)\n\n### ABN modeling\n\n- 7 July 2021, **workshop** at the [UseR!](https://user2021.r-project.org/) Conference on [Additive Bayesian Networks Modeling](https://gilleskratzer.github.io/ABN-UseR-2021/). (Online)\n\n- 29 March 2019, **workshop** at the SVEPM conference on [Multivariate analysis using Additive Bayesian Networks](https://gilleskratzer.github.io/SVEPM2019/). (Utrecht, Netherlands)\n\n## Presentations\n\n- 4 October 2018, **talk** at Nutricia (Danone): Multivariable analysis: variable and model selection in system epidemiology. (Utrecht, Netherlands)\n\n- 30 May 2018, **presentation** at the [Brown Bag Seminar](https://tensorchiefs.github.io/bbs/) at ZHAW: Bayesian Networks Learning in a Nutshell. (Winterthur, Switzerland)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffurrer-lab%2Fabn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffurrer-lab%2Fabn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffurrer-lab%2Fabn/lists"}