Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/leeper/make-example

An example of using make for a data analysis project
https://github.com/leeper/make-example

data-analysis make manuscript reproducible-research

Last synced: about 2 months ago
JSON representation

An example of using make for a data analysis project

Lists

README

        

# Example of `make` for data analysis

This repository shows how to use `make` for a data analysis project.

The paper (PDF) is generated from a LaTeX source file, a table, and a figure.

`make` uses a set of instructions in `makefile` to generate the table and figure using R code and a datafile, and then generate the PDF from the manuscript.

`make` is clever because it deconstructs each part of the analysis so that only parts that have changed need to be rerun. If the data change, everything is rerun. If figure-generating code changes, only that code and the manuscript are rerun. If only the manuscript changes, only `pdflatex` is rerun. It's smart like that.

Basically it works on a directed acyclic graph (DAG) model, represented by this network graph:

```{r setup, include=FALSE}
library("svglite")
knitr::opts_chunk$set(
dev = "svglite",
fig.path="fig-",
fig.ext = "svg"
)
```

```{r dag, echo = FALSE, fig.height=4, fig.width=6}
requireNamespace("igraph", quietly = TRUE)
requireNamespace("ggraph", quietly = TRUE)

parse_makefile <-
function(
file = "makefile",
...
) {
con <- file(file, open = "rb")
on.exit(close(con))
m <- character()
# handle continuation characters
while (length(con)) {
this_line <- readLines(con, n = 1L)
if (!length(this_line)) {
break
}
m[length(m) + 1] <- this_line
while (grepl("\\\\$", m[length(m)]) && length(con)) {
m[length(m)] <- sub("\\\\$", " ", m[length(m)])
m[length(m)] <- paste0(m[length(m)], readLines(con, n = 1L))
}
}
return(m)
}

makefile_to_network <-
function(
lines = parse_makefile(),
exclude = NULL,
...
) {
rules <- lines[grepl("[ A-Za-z0-9./]+ ?[:] ?[ A-Za-z0-9./]+", lines)]
rules_list <- strsplit(rules, " ?: ?")
edges <- lapply(rules_list, function(x) {
# cleanup target
target <- x[1L]
if (target %in% exclude) {
return(NULL)
}
## handle targets with multiple outputs
targets <- strsplit(target, " ")[[1L]]

# cleanup prereqs
prereqs <- strsplit(x[2L], "[ \t]+")[[1L]]
prereqs <- prereqs[!is.na(prereqs)]

# return
if (!length(prereqs) || all(prereqs == "")) {
return(NULL)
}
as.vector(t(as.matrix((expand.grid(prereqs, targets, stringsAsFactors = FALSE)))))
})
igraph::make_graph(unlist(edges))
}

net <- makefile_to_network(parse_makefile("makefile"), exclude = "all")
ggraph::ggraph(net, layout = "kk") +
ggraph::geom_edge_link(ggplot2::aes(start_cap = ggraph::label_rect(node1.name),
end_cap = ggraph::label_rect(node2.name)),
arrow = grid::arrow(length = grid::unit(2, 'mm'))) +
ggraph::geom_node_text(ggplot2::aes(label = name), size = 4) +
ggplot2::theme_void()
```

The R file `analysis.R` shows what is going on in `makefile` using possibly more familiar R syntax. The `README.Rmd` file contains the code to construct the above graph from an arbitrary makefile.

Zach Jones has [a good tutorial](http://zmjones.com/make/) about all of this.