Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/trinker/pathr


https://github.com/trinker/pathr

Last synced: about 1 month ago
JSON representation

Awesome Lists containing this project

README

        

---
title: "pathr"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
md_document:
toc: true
---

```{r, echo=FALSE}
desc <- suppressWarnings(readLines("DESCRIPTION"))
regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"
loc <- grep(regex, desc)
ver <- gsub(regex, "\\2", desc[loc])
verbadge <- sprintf('Version', ver, ver)
````

```{r, echo=FALSE}
library(knitr)
knit_hooks$set(htmlcap = function(before, options, envir) {
if(!before) {
paste('

',options$htmlcap,"

",sep="")
}
})
library(pathr)
knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)
knitr::opts_chunk$set(fig.path = "tools/figure/")
curwd <- getwd()
parsedwd <- parse_path(curwd)
```

[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/0.1.0/active.svg)](http://www.repostatus.org/#active)
[![Build Status](https://travis-ci.org/trinker/pathr.svg?branch=master)](https://travis-ci.org/trinker/pathr)
[![Coverage Status](https://coveralls.io/repos/trinker/pathr/badge.svg?branch=master)](https://coveralls.io/r/trinker/pathr?branch=master)
`r verbadge`

![](tools/pathr_logo/r_pathr.png)

**pathr** is a collection of tools to extract, examine, and reconfigure elements of file paths. The package is born out of a frustration with finding the right base **R** tools to grab certain parts of a path. Often these functions are located in packages that are base install but not loaded by default (e.g., `tools::file_ext`). Additionally, many names of path manipulation functions in base **R** are longer and thus often difficult to remember and require more time to type. Still, other path manipulation tasks had me building my own custom manipulation tools via `strsplit` and `file.path`. **pathr** is designed to be a consistent set of tools that allow the user to solve most path related needs simply by remembering 7 basic sets of parsing and manipulation tools (the first seven rows in the table of function usage found in the [Function Usage](#function-usage) section). The package is designed to be pipeable (easily used within a **magrittr**/**pipeR** pipeline) but is not required.

# Functions

Functions typically fall into the task category of (1) parsing, (2) manipulating, (3) examining, & (4) action. The main functions, task category, & descriptions are summarized in the table below:

| Function | Task | Description |
|---------------------------|------------|-----------------------------------------------------------|
| `parse_path` | parsing | Parse path into elements (sub-directories & files) |
| `front`/`back` | manipulate | Get first/last n elements of a path |
| `index` | manipulate | Get indexed elements of a path |
| `before`/`after` | manipulate | Get n elements before/after a regex occurrence |
| `swap`/`swap_index`/`swap_regex` | manipulate | Replace elements of a path |
| `file_path` | manipulate | Combine file paths |
| `expand_path` | manipulate | Expand tilde prefixed file path |
| `normalize` | manipulate | Make all path separators forward slashes |
| `win_fix` | manipulate | Replace single backslash with a forward slash |
| `file_ext`/`no_file_ext` | manipulate | Get/remove file extensions |
| `tree` | examine | View path structure as an ASCII style tree (experimental) |
| `indent_path` | examine | View path hierarchy as an indented list |
| `copy_path` | action | Copy path(s) to clipboard |
| `open_path` | action | Open path(s) (directories & file) |

# Installation

To download the development version of **pathr**:

Download the [zip ball](https://github.com/trinker/pathr/zipball/master) or [tar ball](https://github.com/trinker/pathr/tarball/master), decompress and run `R CMD INSTALL` on it, or use the **pacman** package to install the development version:

```r
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/pathr")
```

# Contact

You are welcome to:
* submit suggestions and bug-reports at:
* send a pull request on:
* compose a friendly e-mail to:

# Demonstration

## Load the Packages/Data

```{r, message=FALSE}
if (!require("pacman")) install.packages("pacman")

pacman::p_load(pathr, magrittr)

data(files)
set.seed(11); (myfiles <- sample(files, 10))
```

## Parsing

The `parse_path` function simply splits an atomic vector of paths into a list of paths split on the slash separator. For example, my current working directory, `r curwd`, becomes:

```{r}
getwd() %>%
parse_path()
```

While this isn't earth shattering it allows the **pathr** manipulation functions to extract, replace, and recombine parts of the *path* *elements* into a *sub-path*. Here I use *path* to mean the original path, `r curwd`. A path is simply a slash separated mapping of the location of a file or directory within a hierarchical order of sub-directories. These sub-directories are the *elements* of the path. The final output from one of the manipulation functions is a *sub-path* of the original at most the same number of elements as the original.

In this example I parse a multi-path vector:

```{r}
myfiles %>%
parse_path()
```

## Manipulating

Once the path has been parsed the individual elements can be extracted and/or replaced to form sub-paths. In this section I break the manipulation functions into (1) extracting (2) replacing, (3) combining, and (4) expanding types. There are a few miscellaneous **pathr** functions that are not an extracting, replacing, combining, or expanding tool which will be discussed at the end of the Manipulating section.

### Extracting

Extracting can replace path elements by their numeric index or by their content relative to a matched regular expression. There are three sets of extracting functions (1) `front`/`back`, (2) `index`, and (3) `before`/`after`. The first two rely on matching elements to their numeric position while the latter set uses extraction relative to a regular expression match.

#### Front & Back

The `front`/`back` set of functions works like `head`/`tail` (in fact these functions are used under the hood). The user can select the first `n` elements using `front` or the last `n` elements using `back`. These functions require that users want either the first or last element of the path in their sub-path.

Here I replicate base **R**'s `dirname` & `basename` functions using the default settings of `front` & `back` as `dirname` is taking `head(x, -1)` elements (or all but the last element) and `basename` is taking `tail(x, 1)` (or the last element).

```{r}
myfiles %>%
parse_path() %>%
front()

myfiles %>%
dirname()
```

```{r}
myfiles %>%
parse_path() %>%
back()

myfiles %>%
basename()
```

But the `front`/`back` set is more versatile still as demonstrated below:

```{r}
myfiles %>%
parse_path() %>%
front(3)

myfiles %>%
parse_path() %>%
back(3)
```

#### Indices

The `index` function compliments the `front`/`back` set by allowing the user to select the middle elements of a path. Unlike `front`/`back`, the `index` function does not require elements to be sequential. The user will get a sub-path equal in length to the length of the `inds` argument.

```{r}
myfiles %>%
parse_path() %>%
index(4)

myfiles %>%
parse_path() %>%
index(2:4)

myfiles %>%
parse_path() %>%
index(c(2, 4))
```

#### Before & After Regex

The `before`/`after` differ from the previous sets of manipulation functions in that it allows the user to select elements based on their content rather than numeric position. The user provides a regular expression to match against. All elements `before` or `after` this regex match will be selected for use in the sub-path. The user may include the regex matched element by setting `include = TRUE`.

Here I extract all elements after the element containing the regex `"^qdap$"`.

```{r}
myfiles %>%
parse_path() %>%
after("^qdap$")
```

The user can include the element that matched the regex as well using `include = TRUE`:

```{r}
myfiles %>%
parse_path() %>%
after("^qdap$", include = TRUE)
```

Here I use `before` to extract all elements before the regex match to paths that contain an element with a file that ends in `.R`. If a path does not contain that element match `NA` is returned.

```{r}
myfiles %>%
parse_path() %>%
before("\\.R$")
```

### Replacing

Often the user will want to replace elements of a path with another. The `swap` function allows the user to match with a numeric index or a regular expression to determine the element locations to be replaced. The `swap_index` & `swap_regex` functions are less flexible than the more inclusive function but are also more explicit, transparent and pipeable. Preference is typically given to the later `swap_xxx` functions in chained usage.

#### `swap`

In this scenario I replace the root tilde with `MyRoot`:

```{r}
myfiles %>%
parse_path() %>%
swap(1, "MyRoot")
```

#### `swap_index` & `swap_regex`

In the next use I replace `qdap` with `textMining` by referencing the third element:

```{r}
myfiles %>%
parse_path() %>%
swap_index(3, "textMining")
```

When the element position is unknown `swap_regex` provides a means to replace elements:

```{r}
myfiles %>%
parse_path() %>%
swap_regex("^qdap$", "textMining")

myfiles %>%
parse_path() %>%
swap_regex("\\.R$", "function.R")
```

### Combining

While the above tools work to produce sub-paths with an equal or less length of elements `file_path` is a means to combine/construct file paths that may be greater in length than the original path/elements supplied. `file_path` is wrapper for `base::file.path` that uses the underscore naming convention and normalizes the separator to be a single forward slash.

```{r}
file_path("root", "mydir", paste0("file", 1:2, ".pdf"))
```

This is especially useful when combined with extraction/replacement techniques to form new paths as shown below:

```{r}
myfiles %>%
parse_path() %>%
after("R$", include = TRUE) %>%
na.omit() %>%
file_path("Root", "newPackage", .)
```

### Expanding

Like `file_path` above, `expand_path` produces a path that is longer than the input path. `expand_path` is wrapper for `base::path.expand` used to expand tilde prefixed paths by replacing the leading tilde with the user's home directory.

```{r}
expand_path("~/mydir/subdir/myfile.pdf")
```

The user may have noticed that in the example [above](#extracting), demonstrating `front`'s ability to mimic `dirname`, is incomplete. That is the outputs from `front` and `dirname` are not identical. This is because `dirname`, by default, expands the tilde in the example `myfiles` whereas `front` does not. Simply adding `expand_path` on the end of the chain replicates `dirname` exactly.

```{r}
myfiles %>%
parse_path() %>%
front() %>%
expand_path()
```

### Miscellaneous

As noted above, **pathr** contains a few functions that are not an extracting, replacing, combining, or extracting tool.

#### Normalizing

`normalize` replaces all path slash separators with an **R** friendly forward slash.

```{r}
c("C:\\Users\\Tyler\\AppData\\Local\\Temp\\Rtmp2Ll9d9", "C:/R/R-3.2.2") %>%
parse_path() %>%
normalize()
```

#### Windows Fix (single backslashes)

`win_fix` reads an **R** unfriendly Windows path (single backslashes) and replaces with friendly forward slashes. This functionality can't be demonstrated within a knitr document because the single backslashes can't be parsed or copied to the clipboard from within R.

If the user were to copy the following path, `~\Packages\qdap\R\cm_code.overlap.R`, to the clipboard and run `win_fix()` the result would be: `"~/Packages/qdap/R/cm_code.overlap.R"`

## Examination

Functions of this class are designed to allow the user to examine structural aspects of a path and contents.

### Tree

`tree` allows the user to see the hierarchical structure of a path's contents (all the sub-directories and files contained within a parent directory) as a tree. This function is (with default `use.data.tree = FALSE`) OS dependent, and requires that the tree program ([tree for Windows](https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/tree.mspx?mfr=true) or [tree for Unix](http://www.computerhope.com/unix/tree.htm)) be installed.

On my Windows system this is the tree structure for the **pathr** package as installed in **R** library:

```{r}
file_path(.libPaths(), "pathr") %>%
tree()
```

Users concerned with OS dependence can use a **data.tree** implementation of `tree`. This version is slower but is uniform and requires no outside dependencies to be installed. Additionally, the output is a **data.tree** `"Node"` class and can be manipulated accordingly.

```{r}
file_path(.libPaths(), "pathr") %>%
tree(use.data.tree = TRUE)
```

### Indented Elements

`indent_path` on the other hand, works on individual paths (not contents) to visualize the hierarchical structure of a path's elements.

```{r}
file_path(.libPaths(), "pathr/DESCRIPTION") %>%
indent_path()
```

## Action

The last type of functions in **pathr** use the operating system to do some sort of action.

### Opening

`open_path` uses the operating system defaults to open directories and files. This function is operating system and setting dependent. Results may not be consistent across operating systems. Depending upon the default programs for file types the results may vary as well. Some files may not be able to be opened.

#### Opening Directories

```{r, eval=FALSE}
open_path()

file_path(.libPaths(), "pathr") %>%
open_path()
```

#### Opening Files

```{r, eval=FALSE}
file_path(R.home(), "doc/html/about.html") %>%
open_path()
```

### Copying

`copy_path` uses the [**clipr**]() package's `write_clip` function to write the current vector, `x`, to the clipboard but still returns `x`. This makes the copying pipeable, allowing the contents to be copied yet be passed along in the chain.

```{r}
pacman::p_load(clipr)

R.home() %>%
list.files(full.names = TRUE) %>%
copy_path() %>%
parse_path() %>%
back(1)

## What was copied to the clipboard
clipr::read_clip()
```