Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/mpadge/typetracer

Trace Function Parameter Types in R
https://github.com/mpadge/typetracer

r

Last synced: about 2 months ago
JSON representation

Trace Function Parameter Types in R

Awesome Lists containing this project

README

        

---
title: typetracer
output: md_document
---

[![R-CMD-check](https://github.com/mpadge/typetracer/workflows/R-CMD-check/badge.svg)](https://github.com/mpadge/typetracer/actions)
[![codecov](https://codecov.io/gh/mpadge/typetracer/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mpadge/typetracer)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/typetracer)](https://cran.r-project.org/package=typetracer/)
[![CRAN Downloads](https://cranlogs.r-pkg.org/badges/grand-total/typetracer?color=orange)](https://cran.r-project.org/package=typetracer)

```{r setup, include=FALSE}
knitr::opts_chunk$set (echo = TRUE)
```

# typetracer

`typetracer` is an R package to trace function parameter types. The R language
includes [a set of defined
types](https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Basic-types),
but the language itself is ["absurdly
dynamic"](https://dl.acm.org/doi/pdf/10.1145/3340670.3342426)[^1], and lacks
any way to specify which types are expected by any expression. The `typetracer`
package enables code to be traced to extract detailed information on the
properties of parameters passed to R functions. `typetracer` can trace
individual functions or entire packages, as demonstrated below.

[^1]: Alexi Turcotte & Jan Vitek (2019), *Towards a Type System for R*,
ICOOOLPS '19: Proceedings of the 14th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems.
Article No. 4, Pages 1–5, https://doi.org/10.1145/3340670.3342426

## Installation

The stable version of the package can be installed with one of the following commands:

```{r remotes, eval = FALSE}
# Stable version from CRAN:
install.packages ("typetracer")
# Current development version from r-universe:
install.packages (
"typetracer",
repos = c ("https://mpadge.r-universe.dev", "https://cloud.r-project.org")
)
```

Alternatively, for those who prefer to use other source code platforms, the
package can also be installed by running any one of the following lines:

```{r remotes-alt, eval = FALSE}
remotes::install_git ("https://git.sr.ht/~mpadge/dodgr")
remotes::install_git ("https://codeberg.org/UrbanAnalyst/dodgr")
remotes::install_bitbucket ("UrbanAnalyst/dodgr")
remotes::install_gitlab ("UrbanAnalyst/dodgr")
```

The package can then loaded for use by calling `library`:

```{r}
library (typetracer)
```

## Example #1 - A Single Function

`typetracer` works by "injecting" tracing code into the body of a function
using [the `inject_tracer()`
function](https://mpadge.github.io/typetracer/reference/inject_tracer.html).
Locally-defined functions can be traced by simply passing the functions
directly to `inject_tracer()`. The following example includes four parameters,
including `...` to allow passing of additional and entirely arbitrary parameter
types and values.

```{r inject}
f <- function (x, y, z, ...) {
x * x + y * y
}
inject_tracer (f)
```

After injecting the `typetracer` code, calls to the function, `f`, will "trace"
each parameter of the function, by capturing both unevaluated and evaluated
representations at the point at which the function is first called. These
values can be accessed with [the `load_traces`
function](https://mpadge.github.io/typetracer/reference/load_traces.html),
which returns a `data.frame` object (in [`tibble`
format](https://tibble.tidyverse.org)) with one row for each parameter from
each function call.

```{r trace1}
val <- f (
x = 1:2,
y = 3:4 + 0.,
a = "blah",
b = list (a = 1, b = "b"),
f = a ~ b
)
x <- load_traces ()
x
```

Each row of the result returned by `load_traces()` represents one parameter
passed to one function call. Each function call itself represents a single
"trace" as enumerated by the `trace_number` column, and also uniquely
identified by an arbitrary function call hash (`fn_call_hash`). The remaining
columns of the trace data define the properties of each parameter, `p`, as:

1. `par_name`: Name of parameter.
2. `class`: List of classes of parameter.
3. `typeof`: Result of `typeof(p)`.
4. `mode`: Result of `mode(p)`.
5. `storage_mode`: Result of `storage.mode(p)`.
6. `length`: Result of `length(p)`.
7. `formal`: Result of `formals(f)[["p"]]`, as named list item with default
value where specified.
8. `uneval`: Parameters as passed to the function call prior to evaluation
within function environment.
9. `eval`: Evaluated version of parameter.

The results above show that all parameters of the function, `f()`, were
successfully traced, including the additional parameters, `a`, `b`, and `f`,
passed as part of the `...` argument. Such additional parameters can be
identified through having a `"formal"` entry of `NULL`, indicating that they
are not part of the formal arguments to the function.

That result can also be used to demonstrate the difference between the
unevaluated and evaluated forms of parameters:

```{r uneval-eval}
x$uneval [x$par_name %in% c ("b", "f")]
x$eval [x$par_name %in% c ("b", "f")]
```

Unevaluated parameters are generally converted to equivalent character
expressions.

The `typeof`, `mode`, and `storage_mode` columns are similar, yet may hold
distinct information for certain types of parameters. The conditions under
which these values differ are complex, and depend among other things on the
version of R itself. `typeof` alone should generally provide sufficient
information, although [this list of
differences](https://stackoverflow.com/a/37469255) may provide further insight
into whether the other columns may provide useful additional information.

Traces themselves are saved in the temporary directory of the current R
session, and [the `load_traces()`
function](https://mpadge.github.io/typetracer/reference/load_traces.html)
simply loads all traces created in that session. [The function
`clear_traces()`](https://mpadge.github.io/typetracer/reference/clear_traces.html)
removes all traces, so that
[`load_traces()`](https://mpadge.github.io/typetracer/reference/load_traces.html)
will only load new traces produced after that time.

### Uninjecting Traces

It is important after applying [the `inject_tracer()`
function](https://mpadge.github.io/typetracer/reference/inject_tracer.html) to
restore the functions back to their original form through calling [the obverse
`uninject_tracer()`
function](https://mpadge.github.io/typetracer/reference/uninject_tracer.html).
For the function, `r`, above, this simply requires,

```{r}
uninject_tracer (f)
```

All traces can also be removed with this functions:

```{r}
clear_traces ()
```

Because `typetracer` modifies the internal code of functions as defined within
a current R session, we strongly recommend restarting your R session after
using `typetracer`, to ensure expected function behaviour is restored.

## Example #2 - Recursion into lists

R has extensive support for list structures, notably including all
`data.frame`-like objects in which each column is actually a list item.
`typetracer` also offers the ability to recurse into the list structures of
individual parameters, to recursively trace the properties of each list item.
To do this, the traces themselves have to be injected with the additional
parameter, `trace_lists = TRUE`.

The final call above included an additional parameter passed as a list. The
following code re-injects a tracer with the ability to traverse into list
structures:

```{r}
inject_tracer (f, trace_lists = TRUE)
val <- f (
x = 1:2,
y = 3:4 + 0.,
a = "blah",
b = list (a = 1, b = "b"),
f = a ~ b
)
x_lists <- load_traces ()
print (x_lists)
```

And that result now has `r nrow(x_lists)` rows, or
`r nrow(x_lists) - nrow(x)` more than the previous example, reflecting the two
items passed as a `list` to the parameter, `b`. List-parameter items are
identifiable in typetracer output through the "dollar-notation" in the
`par_name` field. The final two values in the above table are `b$a` and `b$b`,
representing the two elements of the list passed as the parameter, `b`.

## Example #3 - Tracing a Package

This section presents a more complex example tracing all function calls from
[the `rematch` package](https://github.com/MangoTheCat/rematch), chosen because
it has less code than almost any other package on CRAN. The following single
line traces function calls in all examples for the nominated package. [The
`trace_package()`
function](https://mpadge.github.io/typetracer/reference/trace_package.html)
automatically injects tracing code into every
function within the package, so there is no need to explicitly call [the
`inject_tracer()`
function](https://mpadge.github.io/typetracer/reference/inject_tracer).

(This function also includes a `trace_lists` parameter, as demonstrated above,
with a default of `FALSE` to not recurse into tracing list structures.)

```{r trace-rematch, message = FALSE}
res <- trace_package ("rematch")
res
```

The `data.frame` returned by the `trace_package()` function includes three
more columns than the result directly returned by `load_traces()`. These
columns identify the sources and calling environments of each function call
being traces. The "call_env" column identifies the calling environment which
generated each trace, while "source_file_name" identifies the file.

```{r call_env}
unique (res$call_env)
unique (res$source_file_name)
```

Although the "call_env" columns contains no useful information for that
package, it includes information on the full environment in which each function
was called. These "environments" include such things as `tryCatch` calls
expected to generate errors, or the various `expect_` functions of the
["testthat" package](https://testthat.r-lib.org/). The above case of racing an
installed package generally only extracts traces from example code, as
documented in help, or `.Rd`, files. These are identified by the "rd_" prefix
on the "source_file_name", with the `rematch` package including only one `.Rd`
file.

[The `trace_package()`
function](https://mpadge.github.io/typetracer/reference/trace_package.html)
also includes an additional parameter, `types`, which defaults to `c
("examples", "tests")`, so that traces are also by default generated for all
tests included with local source packages (or for packages installed to include
test files). The "source" column for test files identifies the names of each
test, prefixed with "test_".

The other two additional columns of "trace_file" and "call_env" respectively
specify the source file and calling environment of each trace. These will
generally only retain information from test files, in which case the source
file will generally be the file name identified in the "source" column, and
"call_env" will specify the environment from which that function call
originated. Environments may, for example, include various types of expectation
from the ["testthat" package](https://testthat.r-lib.org). These calling
environments are useful to discern whether, for example, a call was made with
an expectation that it should error.

### Example #3(a) - Specifying Functions to Trace

[The `trace_package()`
function](https://mpadge.github.io/typetracer/reference/trace_package.html)
also accepts an argument, `functions`, specifying which functions from a
package should be traced. For example,

```{r trace-stats, eval = FALSE}
x <- trace_package ("stats", functions = "sd")
```
```{r stats-output, echo = FALSE}
# Create an empty list for formal params. "empty" means an empty name or symbol
# object, which can be conveniently constructed with 'substitute()':
formal <- pairlist (
x = substitute (),
na.rm = FALSE
)
types <- c ("integer", "logical")

x <- tibble::tibble (
trace_number = 0L,
trace_source = "examples",
fn_name = "sd",
fn_call_hash = "EzasZOKV",
trace_file = NA_character_,
call_env = NA_character_,
par_name = c ("x", "na.rm"),
class = I (as.list (types)),
typeof = types,
mode = c ("numeric", "logical"),
storage_mode = types,
length = 2:1,
formal = I (as.list (formal)),
uneval = I (list (x = 1:2, na.rm = "NULL")),
eval = I (list (x = 1:2, na.rm = FALSE)),
source = "rd_sd"
)
x
```

## Prior Art

This package extends on concepts previously developed in other R packages,
notably including:

- The [`typed` package](https://github.com/moodymudskipper/typed) by
[@moodymudskipper](https://github.com/moodymudskipper)
- The [`contractr` package](https://github.com/PRL-PRG/contractr) by
[@aviralg](https://github.com/aviralg) &
[@fikovnik](https://github.com/fikovnik)

Plus work explained in detail in this footnote: