https://github.com/itsrainingdata/ccdr
Legacy repo for "Concave penalized estimation of sparse Gaussian Bayesian networks".
https://github.com/itsrainingdata/ccdr
Last synced: 11 months ago
JSON representation
Legacy repo for "Concave penalized estimation of sparse Gaussian Bayesian networks".
- Host: GitHub
- URL: https://github.com/itsrainingdata/ccdr
- Owner: itsrainingdata
- Created: 2015-07-14T18:15:41.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-09-14T21:44:29.000Z (over 9 years ago)
- Last Synced: 2025-01-29T18:11:27.438Z (about 1 year ago)
- Language: C++
- Homepage:
- Size: 156 KB
- Stars: 0
- Watchers: 1
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project
README
---
output:
md_document:
variant: markdown_github
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# ccdr
`ccdr` implements the CCDr structure learning algorithm described in \[[1](#references)\]. Based on observational data, this algorithm estimates the structure of a Bayesian network (aka edges in a DAG) using penalized maximum likelihood based on L1 or concave (MCP) regularization.
**Important note:** This legacy package consists of a single method that implements the main algorithm and is not actively maintained. The main purpose of this repo is to serve as a reproducible snapshot for the simulations in \[[1](#references)\].
# Installation
Please note that this package is currently in a development state. In order to install the package, you will need both `devtools` and `Rcpp`. We are still working out the kinks of getting the package to compile on different systems, so please [let us know](https://github.com/itsrainingdata/ccdr/issues) if you have any issues.
If you have never installed packages using `Rcpp` before, we recommend checking out the following resources which should get you started:
- [Rcpp for Seamless R and C++ Integration](http://www.rcpp.org)
- [Rcpp: Seamless R and C++ Integration](http://www.jstatsoft.org/v40/i08/), Dirk Eddelbuettel, Romain Francois, Journal of Statistical Software, Vol. 40, Issue 8, Apr 2011.
- [High performance functions with Rcpp](http://adv-r.had.co.nz/Rcpp.html)
The code is hosted on github and can be installed using `devtools`:
```R
if (packageVersion("devtools") < 1.6) {
install.packages("devtools")
}
if(devtools::find_rtools()) devtools::install_github("itsrainingdata/ccdr")
```
**NOTE:** Windows users will need to make sure that Rtools is installed in order to build this package. To check if you have Rtools installed, you can use `devtools::find_rtools()`. For more details: [http://cran.r-project.org/bin/windows/Rtools/](http://cran.r-project.org/bin/windows/Rtools/).
If you find any bugs, please report them here on [github](https://github.com/itsrainingdata/ccdr/issues).
# Examples
The basic usage is as follows:
```R
### Specify dimensions
nn <- 20
pp <- 100
### Generate a random Gaussian matrix
dat <- matrix(rnorm(nn * pp), nrow = nn)
### Run the ccdr algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20)
### Display the results
print(ccdr.path)
```
The output of `ccdr.run` is an S3 object `ccdrPath`, which is essentially a list of estimates, one for each value of lambda in the solution path. Each estimate is an S3 object `ccdrFit`. The DAG itself is stored as an edge list (see documentation for `ccdrFit-class` and `edgeList-class` for more details).
This trivial example uses uncorrelated normal data, which is not very interesting. In order to do some interesting calculations, we first need to generate data according to some pre-specified DAG structure. The `ccdr` package does not provide this functionality: To generate data from a given Bayesian network and/or simulate random networks, the following R packages are recommended:
- `bnlearn`: [bnlearn on CRAN](http://cran.r-project.org/web/packages/bnlearn/index.html), [www.bnlearn.com](http://www.bnlearn.com)
- `pcalg`: [pcalg on CRAN](http://cran.r-project.org/web/packages/pcalg/index.html)
- `igraph`: [igraph on CRAN](http://cran.r-project.org/web/packages/igraph/index.html), [http://igraph.org/r/](http://igraph.org/r/)
## Example using `pcalg`
The `pcalg` package provides two useful functions: `randomDAG` for generating a random _ordered_ DAG and `rmvDAG` for generating random data according to the structural equation model implied by the DAG.
**NOTE:** Since `randomDAG` produces an ordered DAG, we should permute the columns in the simulated dataset in order to obfuscate this information. If we know this ordering in advance, there are much better methods for estimating the DAG structure using autoregressive models.
```R
library("pcalg")
### Set up the model parameters
nn <- 20 # How many samples to draw?
pp <- 100 # How many nodes in the DAG?
num.edges <- 100 # How many *expected* edges in the DAG?
ss <- num.edges / pp # This is the expected number of parents *per node*
### Generate a random DAG using the pcalg method randomDAG
beta.min <- 0.5
beta.max <- 2
edge.pr <- 2 * ss / (pp - 1)
g <- randomDAG(n = pp, prob = edge.pr, lB = beta.min, uB = beta.max) # Note that the edge weights are selected at random here!
### Generate random data according to this DAG using the method rmvDAG
dat <- rmvDAG(n = nn, dag = g, errDist = "normal")
dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering
### Run the algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)
```
## Example using `bnlearn`
The `bnlearn` package provides methods for reading in data from the [Bayesian Network Repository](http://www.bnlearn.com/bnrepository/). We can use these methods along with the `pcalg` function `rmvDAG` to generate random data from these structures:
```R
library("bnlearn")
library("graph")
library("pcalg")
### Download the RDA file containing bnlearn-compatible data from the repository
con <- url("http://www.bnlearn.com/bnrepository/hailfinder/hailfinder.rda")
load(con)
close(con)
### Convert the bnlearn data to a graph object
this.adj <- amat(bn)
this.graph <- as(this.adj, "graphNEL")
### Generate random data according to this DAG using the method rmvDAG
pp <- numNodes(this.graph)
nn <- 100
dat <- rmvDAG(n = nn, dag = this.graph, errDist = "normal")
dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering
### Run the algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)
```
# References
\[1\] B. Aragam and Q. Zhou. [Concave penalized estimation of sparse Gaussian Bayesian networks.](http://arxiv.org/abs/1401.0852) _The Journal of Machine Learning Research_, In press, 2015.