https://github.com/deohs/folders

Supports the use of standardized folder names within R projects.
Host: GitHub
URL: https://github.com/deohs/folders
Owner: deohs
License: other
Created: 2020-09-21T23:49:56.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2024-03-15T21:32:23.000Z (over 1 year ago)
Last Synced: 2024-11-16T09:46:55.290Z (7 months ago)
Language: R
Size: 98.6 KB
Stars: 5
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

---
title: "folders"
output: 
  html_document: 
    keep_md: yes
---

# folders

This R package supports the use of standardized folder names in R projects. It
provides functions that let you avoid hardcoded paths and `setwd()` in your R
scripts.

Instead, you can use variables like `folders$data` to refer to folder paths.
These paths can be standardized between projects. The folders can be created
for you under the parent folder of your R project.

Using the defaults, or some other standardized list of folder names,
all of your projects can have the same general folder structure. This can help
you write cleaner, more portable, and more reproducible code.

## What does this package do?

Without this package, you could include some code like this in your scripts:

```{r, eval=FALSE}
# Create some standard folders in my project
library(here)
folders <- list(data = "data", figures = "figures", results = "results")
folders <- lapply(folders, here)
result <- lapply(folders, dir.create, showWarnings = FALSE, recursive = TRUE)

# Save some data to a file in the "data" folder
write.csv(iris, file = here(folders$data, "iris.csv"))

# Clean up unused (empty) folders
result <- lapply(folders, function(x) {
  if (length(dir(x)) == 0) unlink(x, recursive = TRUE)
})

# Confirm that the data file still exists
file.exists(here(folders$data, "iris.csv"))
```

This approach does not persist the standard folder names, however. If you
needed to use an alternate folder path, you would need to modify this script
and any other script that used that path.

With *folders*, you can instead specify a project-wide configuration file
which stores these standard folder names. Any customization can be
made to that file, and the other scripts in your project can use that file
with no extra changes needed to them. Collaborators can have their own custom
folder paths without modifying the shared scripts and affecting the other
collaborators. There is also less code needed in each script to make use of
these standard or customized folder paths, as the *folders* package takes
care of the details.

## Default Folders

The package defaults provide "code", "conf", "data", "doc", "figures" and
"results" folders. You can specify alternatives in a YAML configuration file,
which this package will read and use instead. See "Configuration file" below
for more details.

You will note there is a "code" folder. If your scripts are in the "code"
folder, your code will still be able to find the other folders, thanks
to the *here* package.

## RStudio Projects

This package is intended to be used with RStudio Projects.

A benefit of using RStudio Projects is that, once you open the project in
RStudio, you will be placed in the parent folder of your project (a.k.a. the
"project root"). All of your work in the project will be relative to that
location, especially if your project only uses files and subfolders within
that parent folder. This is the most portable way to work. Further, if you are
working with a git repository, you will most likely want to clone this
repository into an RStudio Project.

## Other Supported Environments

This package will also work outside of RStudio Projects. For example,
if you are working in a folder tracked by git, then the top level of the git
repository will be identified as the "project root" folder. This behavior is
determined by the *here* package.

If you are neither working in an RStudio project, nor in a folder tracked by a
version control system (git or Subversion), nor an R package development
folder, then the current working directory at the time the *here* package was
loaded will be treated as the "project root" folder.

Or you can force a folder to be the "project root" with a `.here` file. You
can create one with the `here::set_here()` function. See the *here* package
documentation for more information. However, if your goal is to write more
reproducible code and follow best practices, you should really ask yourself why
you are not using RStudio Projects or version control.
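
To make the idea of a "project root" more concrete, here is a rough base R
sketch of marker-based root detection. It is only an illustration, not how the
*here* package is implemented (which applies more criteria than this). The
helper name `find_project_root()` and the short list of marker files are
assumptions made for this example.

```r
# Hypothetical helper: walk upward from `start` until a folder containing a
# marker file (.here, .git, or *.Rproj) is found. If no marker is found,
# fall back to `start`, similar to the fallback described above.
find_project_root <- function(start = getwd()) {
  start <- normalizePath(start, winslash = "/", mustWork = TRUE)
  dir <- start
  repeat {
    entries <- list.files(dir, all.files = TRUE, no.. = TRUE)
    has_marker <- any(entries %in% c(".here", ".git")) ||
      any(grepl("\\.Rproj$", entries))
    if (has_marker) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) return(start)  # reached the filesystem root
    dir <- parent
  }
}

# Demonstration in a throwaway folder tree with a ".here" marker at the top
root <- tempfile("proj_")
dir.create(file.path(root, "code", "analysis"), recursive = TRUE)
invisible(file.create(file.path(root, ".here")))

# Starting two levels down, the search should land on the marked folder
found <- find_project_root(file.path(root, "code", "analysis"))
print(found)
```

This is why a script stored in a subfolder such as "code" can still locate the
other project folders: paths are built from the detected root rather than from
the script's own location.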

## Installation

You can install the stable version from *CRAN* with:

```{r, eval=FALSE}
install.packages("folders")
```

You can install the development version from *GitHub* with:

```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("deohs/folders")
```

Or, if you prefer using *pacman*:

```{r, eval=FALSE, message=FALSE}
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_install_gh("deohs/folders")
```

## Dependencies

When you install this package, the following dependencies should be installed
for you: [config](https://CRAN.R-project.org/package=config),
[here](https://CRAN.R-project.org/package=here),
[yaml](https://CRAN.R-project.org/package=yaml).

You will need to load the *here* package in your scripts to make the most use
of the *folders* package, as seen in the Basic Usage examples below.

## Basic Usage

The following code chunk can be used at the beginning of your scripts to make
use of standardized folders in your projects.

```{r}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, folders)

# Get the list of standard folders and create any folders which are missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
result <- create_folders(folders)
```

Then, later in your scripts, you can refer to folders like this:

```{r}
dir.exists(here(folders$data))
```

Or you can add to the standard folder paths like this:

```{r}
file_path <- here(folders$data, "data.csv")
```

## Basic Usage Scenario

Here is an example of a script which will initialize the folders and then write
a data file to the `folders$data` folder. Aside from setting the path to the
configuration file, there are no hardcoded paths and there is no `setwd()`.

```{r, message=FALSE}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, folders)

# Get the list of standard folders and create any folders which are missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)
result <- create_folders(folders)

# Check to see that the data folder has been created
dir.exists(here(folders$data))

# Create a dataset to use for writing a CSV file to the data folder
df <- data.frame(x = letters[1:3], y = 1:3)

# Confirm that the CSV file does not yet exist
file_path <- here(folders$data, "data.csv")
file.exists(file_path)

# Write the CSV file
write.csv(df, file_path, row.names = FALSE)

# Verify that the file was written
file.exists(file_path)

# Clean up unused (empty) folders (Optional, as you may prefer to keep them)
result <- cleanup_folders(folders)

# Verify that the data folder and CSV file still exist after cleanup
file.exists(file_path)

# Verify that the configuration file still exists after cleanup
file.exists(conf_file)
```

## Working with subfolders

You can refer to subfolders relative to the paths in your `folders` list using
`here()`. For example, if you had a folder called "raw" under your data folder,
just refer to that folder with `here(folders$data, "raw")`:

```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)

# Path to a file in the "raw" subfolder of the data folder
raw_file <- here(folders$data, "raw", "file.csv")
```

If you want to create a subfolder hierarchy under all of your main folders,
you can use `lapply()` or `purrr::map()` to create that hierarchy. For example,
we can create a "phase" folder under each folder in `folders` and then a "01"
folder under each "phase" folder:

```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- lapply(get_folders(conf_file), here, "phase", "01")
res <- create_folders(folders)
```

You can place that near the top of each of your scripts, adjusting for the
project phase the script is used for. You can then use `folders$data` to
refer to a path like `data/phase/01` within the parent folder. This way, your
scripts can always refer to the appropriate data, results, etc., folder for
that project phase using the same variables, e.g., `folders$data`,
`folders$results`, etc.

```{r, eval=FALSE}
df <- read.csv(here(folders$data, "data.csv"))
```
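
To see what the `lapply()` call above does to each list element, here is a
small base R sketch. It uses `file.path()` in place of `here()` so that the
resulting paths stay relative and easy to read. The list `std_folders` is a
stand-in made up for this example, not output from `get_folders()`.

```r
# A stand-in for the list returned by get_folders() (assumed for illustration)
std_folders <- list(data = "data", results = "results")

# Extra arguments to lapply() are passed on to file.path(), so "phase" and
# "01" are appended to every folder path while the list names are preserved
phase_folders <- lapply(std_folders, file.path, "phase", "01")

str(phase_folders)
```

Because the names are preserved, code written against `folders$data` keeps
working after the list has been remapped to a phase-specific hierarchy.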

If you want to remove empty folders recursively, you can include this code at
the end of your script:

```{r, eval=FALSE}
# Clean up empty folders recursively
dir_lst <- sort(list.dirs(unlist(lapply(folders, here))), decreasing = TRUE)
result <- sapply(dir_lst, cleanup_folders, conf_file = conf_file)
```

Or, if you are using the latest development version of the package from GitHub,
then it's as simple as:

```{r, eval=FALSE}
# Clean up empty folders recursively (using development version of package)
result <- cleanup_folders(folders, recursive = TRUE)
```
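
The `sort(..., decreasing = TRUE)` call above arranges for nested folders to be
visited before their parents, so that a parent which only contained empty
subfolders can itself be removed. The base R sketch below illustrates that
ordering idea without using this package. The helper `remove_empty_dirs()` is
hypothetical, and it orders by path length rather than alphabetically, which
serves the same purpose because a subfolder's path is always longer than its
parent's.

```r
# Hypothetical helper: remove empty folders under `path`, deepest first
remove_empty_dirs <- function(path) {
  dirs <- list.dirs(path, full.names = TRUE, recursive = TRUE)
  # Longest paths first, so children are always checked before their parents
  dirs <- dirs[order(nchar(dirs), decreasing = TRUE)]
  removed <- character(0)
  for (d in dirs) {
    if (length(list.files(d, all.files = TRUE, no.. = TRUE)) == 0) {
      unlink(d, recursive = TRUE)
      removed <- c(removed, d)
    }
  }
  invisible(removed)
}

# Demonstration in a throwaway folder tree
base <- tempfile("cleanup_")
dir.create(file.path(base, "a", "b", "c"), recursive = TRUE)  # all empty
dir.create(file.path(base, "keep"))
writeLines("x", file.path(base, "keep", "file.txt"))          # not empty

removed <- remove_empty_dirs(base)
print(basename(removed))
```

Folders that contain at least one file, such as "keep" in this demonstration,
are left untouched.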

## Configuration file

The configuration file, if not already present, will be written by `get_folders()`
to a YAML file with a path and filename that you provide. Usually this would be
named something like `folders.yml`, as in the examples above, and usually you will
want this file stored in either the parent folder or the "conf" subfolder of your
R project. This file will be read by `config::get()` on subsequent executions of
`get_folders()`. This behavior can be modified by function parameters.

The default configuration file looks like:

```
default:
  code: code
  conf: conf
  data: data
  doc: doc
  figures: figures
  results: results
```

Once this file has been created, you can edit it to modify the default
folder paths. However, we advise you to stick to the defaults to maintain
maximum consistency between your projects. If you do wish to edit it, you can
do so with any text editor or with R as shown below.

```{r, eval=FALSE}
# Load packages, installing as needed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages('pacman')
pacman::p_load(here, yaml, folders)

# Get the list of standard folders, creating the configuration file if missing
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file)

# Replace a default with a custom folder path
folders$data <- "data_folder"

# Edit the default configuration file to save the modification
write_yaml(list(default = folders), file = conf_file)
```

### Handling platform-dependent paths

You can add other configuration names besides "default". For example, if you
had a path that would be operating system dependent, you could edit your
configuration file like this (abbreviated here to show only the data path):

```
default:
  data: data
Windows:
  data: //server/path/to/data
Linux:
  data: /path/to/data
Darwin:
  data: /Volumes/path/to/data
```

And then you can read in the appropriate paths for the system you are using:

```{r, eval=FALSE}
conf_file <- here('conf', 'folders.yml')
folders <- get_folders(conf_file, conf_name = Sys.info()[['sysname']])
data_folder <- folders$data
```

... so that your script can still be platform independent even though the data
path is not. If your specified `conf_name` does not exist in the configuration
file, then the defaults are used instead.
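
The fallback rule can be sketched in base R as follows. This is not the
package's code: `pick_conf()` is a hypothetical helper, and the nested list
stands in for a parsed configuration file (with a "results" entry added for
illustration). It also assumes that entries missing from a named configuration
are inherited from "default", which is how the *config* package describes its
own behavior.

```r
# A parsed configuration, written out as a nested list (assumed for
# illustration; a real one would be read from the YAML file)
conf <- list(
  default = list(data = "data", results = "results"),
  Linux   = list(data = "/path/to/data")
)

# Hypothetical helper: select a named configuration, falling back to
# "default" when the name is absent, and inheriting any missing entries
pick_conf <- function(conf, conf_name = "default") {
  if (!conf_name %in% names(conf)) return(conf[["default"]])
  utils::modifyList(conf[["default"]], conf[[conf_name]])
}

linux_folders   <- pick_conf(conf, "Linux")    # overrides only the data path
unknown_folders <- pick_conf(conf, "Solaris")  # not in the file: defaults

str(linux_folders)
str(unknown_folders)
```

In this sketch, only the entries that differ per platform need to appear under
a platform name, and everything else comes from "default".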

```{r cleanup, include=FALSE}
data_file <- here(folders$data, "data.csv")
if(file.exists(data_file)) unlink(data_file, recursive = TRUE)
result <- cleanup_folders(folders, conf_file, keep_conf = FALSE)
```

```{r build_notes, include=FALSE, eval=FALSE}
# 1. After rendering this Rmd to HTML, remove the html file. Only the md output
#    from this is needed.
# 2. Make sure you have "R_BUILD_TAR=tar" in your ~/.Renviron.
# 3. Make sure you have "tidy", "pandoc", and a LaTeX environment installed.
# 4. In Terminal, go to the parent folder of "folders" and run these commands:
#      R CMD build folders
#      R CMD check --as-cran folders_X.Y.Z.tar.gz
#    ... where X.Y.Z is the version number in DESCRIPTION.
```