Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/deohs/folders

Supports the use of standardized folder names within R projects.
https://github.com/deohs/folders

Last synced: 8 days ago
JSON representation

Supports the use of standardized folder names within R projects.

Awesome Lists containing this project

README

        

---
title: "folders"
output:
html_document:
keep_md: yes
---

# folders

This R package supports the use of standardized folder names in R projects. The
idea is to provide some functions to allow you to avoid using hardcoded paths
and `setwd()` in your R scripts.

Instead, you can use variables like `folders$data` to refer to folder paths.
These paths can be standardized between projects. The folders can be created
for you under the parent folder of your R project.

Using the defaults, or some other standardized list of folder names,
all of your projects can have the same general folder structure. This can help
you write cleaner, more portable, and more reproducible code.

## What does this package do?

Without this package, you could include some code like this in your scripts:

```{r, eval=FALSE}
# Create some standard folders in my project
library(here)
folders <- list(data = "data", figures = "figures", results = "results")
folders <- lapply(folders, here)
result <- lapply(folders, dir.create, showWarnings = FALSE, recursive = TRUE)

# Save some data to a file in the "data" folder
write.csv(iris, file = here(folders$data, "iris.csv"))

# Cleanup unused folders
result <- lapply(folders, function(x) {
if (length(dir(x)) == 0) unlink(x, recursive = TRUE)
})

# Confirm that the data file still exists
file.exists(here(folders$data, "iris.csv"))
```

There is no persistence of the standard folder names, however, and if you
needed to use an alternate folder path, you would need to modify this script
and any other that used that alternate path.

However, using *folders*, you can specify a project-wide configuration file
which will store these standard folder names. Any customization can be
made to that file, and the other scripts in your project can use that file
with no extra changes needed to them. Collaborators can have their own custom
folder paths without modifying the shared scripts and affecting the other
collaborators. Plus there is less code needed in each script to make use of
these standard or customized folder paths, as the *folders* package takes
care of the details.

## Default Folders

The package defaults provide "code", "conf", "data", "doc", "figures" and
"results" folders. You can specify alternatives in a YAML configuration file,
which this package will read and use instead. See "Configuration file" below
for more details.

You will note there is a "code" folder. If your scripts are in the "code"
folder, your code will still be able to find the other folders, thanks
to the *here* package.

## RStudio Projects

This package is intended to be used with RStudio Projects.

A benefit of using RStudio Projects is, once you open the project in RStudio,
you will be placed in the parent folder of your project (aka. "project root").
All of your work in the project will be relative to that location, especially if
your project only uses files and subfolders within that parent folder. This the
most portable way to work. Further, if you are working with a git repository, you
will most likely want to clone this repository into an RStudio Project.

## Other Supported Environments

This package will also work outside of RStudio Projects. For example,
if you are working in a folder tracked by git, then the top level of the git
repository will be identified as the "project root" folder. This behavior is
determined by the *here* package.

If you are neither working in an RStudio project, nor in a folder tracked by a
version control system (git or Subversion), nor an R package development
folder, then the current working directory at the time the *here* package was
loaded will be treated as the "project root" folder.

Or you can force a folder to be the "project root" with a `.here` file. You
can create one with the `here::set_here()` function. See the *here* package
documentation for more information. However, if your goal is to write more
reproducible code and follow best practices, you should really ask yourself why
you are not using RStudio Projects or version control.

## Installation

You can install the stable version from *CRAN* with:

```{r, eval=FALSE}
install.packages("folders")
```

You can install the development version from *GitHub* with:

```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("deohs/folders")
```

Or, if you prefer using *pacman*:

```{r, eval=FALSE, message=FALSE}
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_install_gh("deohs/folders")
```

## Dependencies

When you install this package, the following dependencies should be installed
for you: [config](https://CRAN.R-project.org/package=config),
[here](https://CRAN.R-project.org/package=here),
[yaml](https://CRAN.R-project.org/package=yaml).

You will need to load the *here* package with your scripts to make the most use
of the *folders* package, as seen in the Basic Usage examples below.

## Basic Usage

The following code chunk can be used at the beginning of your scripts to make
use of standardized folders in your projects.

```{r}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, folders)

# Get the list of standard folders and create any folders which are missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
result <- create_folders(folders)
```

Then, later in your scripts, you can refer to folders like this:

```{r}
dir.exists(here(folders$data))
```

Or you can add to the standard folder paths like this:

```{r}
file_path <- here(folders$data, "data.csv")
```

## Basic Usage Scenario

Here is an example of a script which will initialize the folders and then write
a data file to the `folders$data` folder. Aside from setting the path to the
configuration file, there are no hardcoded paths and there is no `setwd()`.

```{r, message=FALSE}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, folders)

# Get the list of standard folders and create any folders which are missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
result <- create_folders(folders)

# Check to see that the data folder has been created
dir.exists(here(folders$data))

# Create a dataset to use for writing a CSV file to the data folder
df <- data.frame(x = letters[1:3], y = 1:3)

# Confirm that the CSV file does not yet exist
file_path <- here(folders$data, "data.csv")
file.exists(file_path)

# Write the CSV file
write.csv(df, file_path, row.names = FALSE)

# Verify that the file was written
file.exists(file_path)

# Cleanup unused (empty) folders (Optional, as you may prefer to keep them)
result <- cleanup_folders(folders)

# Verify that the data folder and CSV file still exist after cleanup
file.exists(file_path)

# Verify that the configuration file still exists after cleanup
file.exists(conf_file)
```

## Working with subfolders

You can refer to subfolders relative to the paths in your `folders` list using
`here()`. For example, if you had a folder called "raw" under your data folder,
just refer to that folder with `here(folders$data, "raw")`:

```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
raw_df <- here(folders$data, "raw", "file.csv")
```

If you want to create a subfolder hierarchy under all of your main folders,
you can use `lapply()` or `purrr::map()` to create that hierarchy. For example,
we can create a "phase" folder under each folder in `folders` and then a "01"
folder under each "phase" folder:

```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- lapply(get_folders(conf_file), here, "phase", "01")
res <- create_folders(folders)
```

You can place that near the top of each of your scripts, adjusting for the
project phase the script is used for, then you can then use `folders$data` to
refer to a path like `data/phase/01` within the parent folder. This way, your
scripts can always refer to the appropriate data, results, etc., folder for
that project phase using the same variables, e.g., `folders$data`,
`folders$results`, etc.

```{r, eval=FALSE}
df <- read.csv(here(folders$data, "data.csv"))
```

If you want to remove empty folders recursively, you can include this code at
the end of your script:

```{r, eval=FALSE}
# Cleanup empty folders recursively
dir_lst <- sort(list.dirs(unlist(lapply(folders, here))), decreasing = TRUE)
result <- sapply(dir_lst, cleanup_folders, conf_file = conf_file)
```

Or, if you are using the latest development version of the package from GitHub,
then it's as simple as:

```{r, eval=FALSE}
# Cleanup empty folders recursively (using development version of package)
result <- cleanup_folders(folders, recursive = TRUE)
```

## Configuration file

The configuration file, if not already present, will be written by `get_folders()`
to a YAML file with a path and filename that you provide. Usually this would be
named something like `folders.yml`, as in the examples above, and usually you will
want this file stored in either the parent folder or the "conf" subfolder of your
R project. This file will be read by `config::get()` on subsequent executions of
`get_folders()`. This behavior can be modified by function parameters.

The default configuration file looks like:

```
default:
code: code
conf: conf
data: data
doc: doc
figures: figures
results: results
```

Once this file has been created, you can edit it to modify the default
folder paths. However, we advise you to stick to the defaults to maintain
maximum consistency between your projects. If you do wish to edit it, you can
do so with any text editor or with R as shown below.

```{r, eval=FALSE}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, yaml, folders)

# Get the list of standard folders, creating the configuration file if missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)

# Replace a default with a custom folder path
folders$data <- "data_folder"

# Edit the default configuration file to save the modification
write_yaml(list(default = folders), file = conf_file)
```

### Handling platform-dependent paths

You can add other configuration names besides "default". For example, if you
had a path that would be operating system dependent, you could edit your
configuration file like this (abbreviated here to show only the data path):

```
default:
data: data
Windows:
data: //server/path/to/data
Linux:
data: /path/to/data
Darwin:
data: /Volumes/path/to/data
```

And then you can read in the appropriate paths for the system you are using:

```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file, conf_name = Sys.info()[['sysname']])
data_folder <- folders$data
```

... so that your script can still be platform independent even though the data
path is not. If your specified `conf_name` does not exist in the configuration
file, then the defaults are used instead.

```{r cleanup, include=FALSE}
data_file <- here(folders$data, "data.csv")
if(file.exists(data_file)) unlink(data_file, recursive = TRUE)
result <- cleanup_folders(folders, conf_file, keep_conf = FALSE)
```

```{r build_notes, include=FALSE, eval=FALSE}
# 1. After rendering this Rmd to HTML, remove the html file. Only the md output
# from this is needed.
# 2. Make sure you have "R_BUILD_TAR=tar" in your ~/.Renviron.
# 3. Make sure you have "tidy", "pandoc", and a LaTeX environment installed.
# 4. In Terminal, go to the parent folder of "folders" and run these commands:
# R CMD build folders
# R CMD check --as-cran folders_X.Y.Z.tar.gz
# ... where X.Y.Z is the version number in DESCRIPTION.
```