https://github.com/itsrainingdata/ccdr

Legacy repo for "Concave penalized estimation of sparse Gaussian Bayesian networks".
https://github.com/itsrainingdata/ccdr

Last synced: about 1 year ago
JSON representation

Legacy repo for "Concave penalized estimation of sparse Gaussian Bayesian networks".

Host: GitHub
URL: https://github.com/itsrainingdata/ccdr
Owner: itsrainingdata
Created: 2015-07-14T18:15:41.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2016-09-14T21:44:29.000Z (over 9 years ago)
Last Synced: 2025-01-29T18:11:27.438Z (over 1 year ago)
Language: C++
Homepage:
Size: 156 KB
Stars: 0
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.Rmd

Awesome Lists containing this project

README

          ---

output:

  md_document:

    variant: markdown_github

---

```{r, echo = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "README-"

)

```

# ccdr

`ccdr` implements the CCDr structure learning algorithm described in \[[1](#references)\]. Based on observational data, this algorithm estimates the structure of a Bayesian network (aka edges in a DAG) using penalized maximum likelihood based on L1 or concave (MCP) regularization.

**Important note:** This legacy package consists of a single method that implements the main algorithm and is not actively maintained. The main purpose of this repo is to serve as a reproducible snapshot for the simulations in \[[1](#references)\].

# Installation

Please note that this package is currently in a development state. In order to install the package, you will need both `devtools` and `Rcpp`. We are still working out the kinks of getting the package to compile on different systems, so please [let us know](https://github.com/itsrainingdata/ccdr/issues) if you have any issues. 

If you have never installed packages using `Rcpp` before, we recommend checking out the following resources which should get you started:

- [Rcpp for Seamless R and C++ Integration](http://www.rcpp.org)

- [Rcpp: Seamless R and C++ Integration](http://www.jstatsoft.org/v40/i08/), Dirk Eddelbuettel, Romain Francois, Journal of Statistical Software, Vol. 40, Issue 8, Apr 2011.

- [High performance functions with Rcpp](http://adv-r.had.co.nz/Rcpp.html)

The code is hosted on github and can be installed using `devtools`:

    

```R

if (packageVersion("devtools") < 1.6) {

  install.packages("devtools")

}

if(devtools::find_rtools()) devtools::install_github("itsrainingdata/ccdr")

```

**NOTE:** Windows users will need to make sure that Rtools is installed in order to build this package. To check if you have Rtools installed, you can use `devtools::find_rtools()`. For more details: [http://cran.r-project.org/bin/windows/Rtools/](http://cran.r-project.org/bin/windows/Rtools/).

If you find any bugs, please report them here on [github](https://github.com/itsrainingdata/ccdr/issues).

# Examples

The basic usage is as follows:

```R

### Specify dimensions

nn <- 20

pp <- 100

### Generate a random Gaussian matrix

dat <- matrix(rnorm(nn * pp), nrow = nn)

### Run the ccdr algorithm

ccdr.path <- ccdr.run(data = dat, lambdas.length = 20)

### Display the results

print(ccdr.path)

```

The output of `ccdr.run` is an S3 object `ccdrPath`, which is essentially a list of estimates, one for each value of lambda in the solution path. Each estimate is an S3 object `ccdrFit`. The DAG itself is stored as an edge list (see documentation for `ccdrFit-class` and `edgeList-class` for more details).

This trivial example uses uncorrelated normal data, which is not very interesting. In order to do some interesting calculations, we first need to generate data according to some pre-specified DAG structure. The `ccdr` package does not provide this functionality: To generate data from a given Bayesian network and/or simulate random networks, the following R packages are recommended:

- `bnlearn`: [bnlearn on CRAN](http://cran.r-project.org/web/packages/bnlearn/index.html), [www.bnlearn.com](http://www.bnlearn.com)

- `pcalg`: [pcalg on CRAN](http://cran.r-project.org/web/packages/pcalg/index.html)

- `igraph`: [igraph on CRAN](http://cran.r-project.org/web/packages/igraph/index.html), [http://igraph.org/r/](http://igraph.org/r/)

## Example using `pcalg`

The `pcalg` package provides two useful functions: `randomDAG` for generating a random _ordered_ DAG and `rmvDAG` for generating random data according to the structural equation model implied by the DAG.

**NOTE:** Since `randomDAG` produces an ordered DAG, we should permute the columns in the simulated dataset in order to obfuscate this information. If we know this ordering in advance, there are much better methods for estimating the DAG structure using autoregressive models.

```R

library("pcalg")

### Set up the model parameters

nn <- 20                # How many samples to draw?

pp <- 100               # How many nodes in the DAG?

num.edges <- 100        # How many *expected* edges in the DAG?

ss <- num.edges / pp    # This is the expected number of parents *per node*

### Generate a random DAG using the pcalg method randomDAG

beta.min <- 0.5

beta.max <- 2

edge.pr <- 2 * ss / (pp - 1)

g <- randomDAG(n = pp, prob = edge.pr, lB = beta.min, uB = beta.max) # Note that the edge weights are selected at random here!

### Generate random data according to this DAG using the method rmvDAG

dat <- rmvDAG(n = nn, dag = g, errDist = "normal")

dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering

### Run the algorithm

ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)

```

## Example using `bnlearn`

The `bnlearn` package provides methods for reading in data from the [Bayesian Network Repository](http://www.bnlearn.com/bnrepository/). We can use these methods along with the `pcalg` function `rmvDAG` to generate random data from these structures:

```R

library("bnlearn")

library("graph")

library("pcalg")

### Download the RDA file containing bnlearn-compatible data from the repository

con <- url("http://www.bnlearn.com/bnrepository/hailfinder/hailfinder.rda")

load(con)

close(con)

### Convert the bnlearn data to a graph object

this.adj <- amat(bn)

this.graph <- as(this.adj, "graphNEL")

### Generate random data according to this DAG using the method rmvDAG

pp <- numNodes(this.graph)

nn <- 100

dat <- rmvDAG(n = nn, dag = this.graph, errDist = "normal")

dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering

### Run the algorithm

ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)

```

# References

\[1\] B. Aragam and Q. Zhou. [Concave penalized estimation of sparse Gaussian Bayesian networks.](http://arxiv.org/abs/1401.0852) _The Journal of Machine Learning Research_, In press, 2015.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/itsrainingdata/ccdr

Awesome Lists containing this project

README