
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# Rtinycc

Builds the `TinyCC` CLI and library for `C` scripting in `R`

[![R-CMD-check](https://github.com/sounkou-bioinfo/Rtinycc/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/sounkou-bioinfo/Rtinycc/actions/workflows/R-CMD-check.yaml)[![Rtinycc status badge](https://sounkou-bioinfo.r-universe.dev/Rtinycc/badges/version)](https://sounkou-bioinfo.r-universe.dev/Rtinycc)

## Abstract

Rtinycc is an R interface to [TinyCC](https://github.com/TinyCC/tinycc), providing both CLI access and a libtcc-backed in-memory compiler. It includes an experimental FFI inspired by [Bun's FFI](https://bun.com/docs/runtime/ffi) for binding C symbols with predictable type conversions and pointer utilities. The package targets Unix-alike systems and focuses on embedding TinyCC and enabling JIT-compiled bindings directly from R. Combined with [treesitter.c](https://github.com/sounkou-bioinfo/treesitter.c), which provides C header parsers, it can be used to rapidly generate declarative bindings.

## How it works

When you call `tcc_compile()`, Rtinycc generates C wrapper functions whose
signature follows the `.Call` convention (`SEXP` in, `SEXP` out). These wrappers
convert R types to C, call the target function, and convert the result back.
TCC compiles them in-memory -- no shared library is written to disk and no
`R_init_*` registration is needed.

After `tcc_relocate()`, wrapper pointers are retrieved via `tcc_get_symbol()`,
which internally calls `RC_libtcc_get_symbol()`. That function converts TCC's
raw `void*` into a `DL_FUNC` wrapped with `R_MakeExternalPtrFn` (tagged
`"native symbol"`). On the R side, [`make_callable()`](R/ffi.R) creates a
closure that passes this external pointer to `.Call` (aliased as `.RtinyccCall`
to keep `R CMD check` happy).

The design follows [CFFI's](https://cffi.readthedocs.io/) API-mode pattern:
instead of computing struct layouts and calling conventions in R (ABI-mode,
like Python's ctypes), the generated C code lets TCC handle `sizeof`,
`offsetof`, and argument passing. Rtinycc never replicates platform-specific
layout rules. The wrappers can also link against external shared libraries
whose symbols TCC resolves at relocation time. For background on how this
compares to a libffi approach, see the
[`RSimpleFFI` README](https://github.com/sounkou-bioinfo/RSimpleFFI#readme).
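
As a rough illustration of the API-mode idea, the plain C sketch below lets the compiler supply `sizeof` and `offsetof` and contrasts a compiled accessor with an offset-based read. The helper names (`point_size`, `point_offset_y`, `point_get_y`, `read_f64_at`) are hypothetical and are not the symbols Rtinycc generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct point { double x; double y; };

/* Hypothetical helpers: layout facts come from the compiler,
   not from hand-written platform rules. */
size_t point_size(void)     { return sizeof(struct point); }
size_t point_offset_y(void) { return offsetof(struct point, y); }

/* API-mode style accessor: works on an opaque pointer and lets the
   compiler locate the field. */
double point_get_y(const void *p) {
    assert(p != NULL);
    return ((const struct point *)p)->y;
}

/* ABI-mode style read: the caller has to know the byte offset itself. */
double read_f64_at(const void *base, size_t offset) {
    double out;
    assert(base != NULL);
    memcpy(&out, (const unsigned char *)base + offset, sizeof out);
    return out;
}
```

Both reads return the same value only when the offset handed to `read_f64_at()` matches the compiler's layout, which is the bookkeeping that API mode avoids.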

On macOS the configure script strips `-flat_namespace` from TCC's build to
avoid SIGSEGV issues. Without that flag, TCC cannot resolve host symbols (e.g.
`RC_free_finalizer`) through the dynamic linker. Rtinycc works around this
with `RC_libtcc_add_host_symbols()`, which registers package-internal C
functions via `tcc_add_symbol()` before relocation. Any new C function
referenced by generated TCC code must be added there.

Ownership semantics are explicit. Pointers from `tcc_malloc()` are tagged
`rtinycc_owned` and can be released with `tcc_free()` (or by their R
finalizer). Generated struct constructors use a struct-specific tag
(`struct_<name>`) with an `RC_free_finalizer`; free them with
`struct_<name>_free()`, not `tcc_free()`. Pointers from `tcc_data_ptr()` are
tagged `rtinycc_borrowed` and are never freed by Rtinycc. Array returns are
copied into a fresh R vector; set `free = TRUE` only when the C function
returns a `malloc`-owned buffer.
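
The owned/borrowed distinction can be pictured with a small C sketch. Everything here (`struct tagged_ptr`, `tagged_malloc`, `tagged_borrow`, `tagged_free`) is a hypothetical stand-in that only mirrors the rule described above; it is not how Rtinycc stores its tags.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ownership tags mirroring the owned/borrowed rule. */
enum ownership { OWNED, BORROWED };

struct tagged_ptr {
    void *addr;
    enum ownership tag;
};

/* Allocate memory that this layer is responsible for releasing. */
struct tagged_ptr tagged_malloc(size_t n) {
    struct tagged_ptr p = { malloc(n), OWNED };
    return p;
}

/* Wrap memory that belongs to someone else; it is never freed here. */
struct tagged_ptr tagged_borrow(void *addr) {
    struct tagged_ptr p = { addr, BORROWED };
    return p;
}

/* Returns 1 if memory was released, 0 if the request was refused
   (borrowed pointer) or there was nothing left to free. */
int tagged_free(struct tagged_ptr *p) {
    assert(p != NULL);
    if (p->tag != OWNED || p->addr == NULL) return 0;
    free(p->addr);
    p->addr = NULL; /* guards against a double free */
    return 1;
}
```

Clearing the address after release makes a second `tagged_free()` a harmless no-op, which is the behavior one would want when both an explicit free and a finalizer may run.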

## Installation

``` r
install.packages('Rtinycc', repos = c('https://sounkou-bioinfo.r-universe.dev', 'https://cloud.r-project.org'))
```

## Usage

### CLI

The CLI interface compiles C source files to standalone executables using the bundled TinyCC toolchain.

```{r cli}
library(Rtinycc)

src <- system.file("c_examples", "forty_two.c", package = "Rtinycc")
exe <- tempfile()
tcc_run_cli(c(
"-B", tcc_prefix(),
paste0("-I", tcc_include_paths()),
paste0("-L", tcc_lib_paths()),
src, "-o", exe
))
Sys.chmod(exe, mode = "0755")
system2(exe, stdout = TRUE)
```

For in-memory workflows, prefer libtcc instead.

### In-memory compilation with libtcc

We can compile and call C functions entirely in memory. This is the simplest path for quick JIT compilation.

```{r in-memory}
state <- tcc_state(output = "memory")
tcc_compile_string(state, "int forty_two(){ return 42; }")
tcc_relocate(state)
tcc_call_symbol(state, "forty_two", return = "int")
```

The lower-level API gives full control over include paths, libraries, and the R C API. Using `#define _Complex` as a workaround for TCC's lack of [complex type support](https://mail.gnu.org/archive/html/tinycc-devel/2022-04/msg00020.html), we can link against R's headers and call into `libR`.

```{r call-R-C-API}
state <- tcc_state(output = "memory")
tcc_add_include_path(state, R.home("include"))
tcc_add_library_path(state, R.home("lib"))

code <- '
#define _Complex
#include <R.h>
#include <Rinternals.h>

double call_r_sqrt(void) {
SEXP fn = PROTECT(Rf_findFun(Rf_install("sqrt"), R_BaseEnv));
SEXP val = PROTECT(Rf_ScalarReal(16.0));
SEXP call = PROTECT(Rf_lang2(fn, val));
SEXP out = PROTECT(Rf_eval(call, R_GlobalEnv));
double res = REAL(out)[0];
UNPROTECT(4);
return res;
}
'
tcc_compile_string(state, code)
tcc_relocate(state)
tcc_call_symbol(state, "call_r_sqrt", return = "double")
```

### Pointer utilities

Rtinycc ships a set of typed memory access functions similar to what the [ctypesio](https://cran.r-project.org/package=ctypesio) package offers, but designed around our FFI pointer model. Every scalar C type has a corresponding `tcc_read_*` / `tcc_write_*` pair that operates at a byte offset into any external pointer, so you can walk structs, arrays, and output parameters without writing C helpers.

```{r ffi-utils}
ptr <- tcc_cstring("hello")
tcc_read_cstring(ptr)
tcc_read_bytes(ptr, 5)
tcc_ptr_addr(ptr, hex = TRUE)
tcc_ptr_is_null(ptr)
tcc_free(ptr)
```

Typed reads and writes cover the full scalar range (`i8`/`u8`, `i16`/`u16`, `i32`/`u32`, `i64`/`u64`, `f32`/`f64`) plus pointer dereferencing via `tcc_read_ptr` / `tcc_write_ptr`. All operations use a byte offset and `memcpy` internally for alignment safety.

```{r ffi-typed-rw}
buf <- tcc_malloc(32)
tcc_write_i32(buf, 0L, 42L)
tcc_write_f64(buf, 8L, pi)
tcc_read_i32(buf, offset = 0L)
tcc_read_f64(buf, offset = 8L)
tcc_free(buf)
```
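
To show why `memcpy` makes byte-offset access safe, here is a minimal C sketch of offset-based typed reads and writes. The function names are stand-ins chosen to echo the R helpers; this is an illustration of the technique, not the package's actual source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the value byte-wise to base + offset; no aligned store is needed. */
void write_i32(void *base, size_t offset, int32_t value) {
    assert(base != NULL);
    memcpy((unsigned char *)base + offset, &value, sizeof value);
}

int32_t read_i32(const void *base, size_t offset) {
    int32_t out;
    assert(base != NULL);
    memcpy(&out, (const unsigned char *)base + offset, sizeof out);
    return out;
}

void write_f64(void *base, size_t offset, double value) {
    assert(base != NULL);
    memcpy((unsigned char *)base + offset, &value, sizeof value);
}

double read_f64(const void *base, size_t offset) {
    double out;
    assert(base != NULL);
    memcpy(&out, (const unsigned char *)base + offset, sizeof out);
    return out;
}
```

Casting `base + offset` to `int32_t *` and dereferencing it would be undefined behavior at an odd offset such as 17; the `memcpy` form is well defined at any offset that stays inside the buffer.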

Pointer-to-pointer workflows are supported for C APIs that return values through output parameters.

```{r ptr-to-ptr}
ptr_ref <- tcc_malloc(.Machine$sizeof.pointer %||% 8L)
target <- tcc_malloc(8)
tcc_ptr_set(ptr_ref, target)
tcc_data_ptr(ptr_ref)
tcc_ptr_set(ptr_ref, tcc_null_ptr())
tcc_free(target)
tcc_free(ptr_ref)
```
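
For readers less familiar with output parameters, the C sketch below shows the pattern from the callee's and the caller's side. `make_greeting()` and `call_make_greeting()` are hypothetical examples, not part of any library referenced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C API: returns a status code and hands the result back
   through the pointer-to-pointer argument `out`. */
int make_greeting(char **out) {
    const char *text = "hello";
    char *buf;
    assert(out != NULL);
    buf = malloc(strlen(text) + 1);
    if (buf == NULL) { *out = NULL; return -1; }
    strcpy(buf, text);
    *out = buf;
    return 0;
}

/* Caller side: reserve one pointer-sized slot, pass its address,
   then read the slot once the call has filled it in. */
char *call_make_greeting(void) {
    char *slot = NULL; /* plays the role of the pointer-sized buffer */
    if (make_greeting(&slot) != 0) return NULL;
    return slot;       /* the caller must free() the result */
}
```

From R, the pointer-sized slot is the small buffer allocated first, and the value written into it by the C function is what gets read back afterwards.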

## Declarative FFI

A declarative interface inspired by [Bun's FFI](https://bun.com/docs/runtime/ffi) sits on top of the lower-level API. We define types explicitly and Rtinycc generates the binding code, compiling it in memory with TCC.

### Type system

The FFI exposes a small set of type mappings between R and C. Conversions are explicit and predictable so callers know when data is shared versus copied.

Scalar types map one-to-one: `i8`, `i16`, `i32`, `i64` (signed integers); `u8`, `u16`, `u32`, `u64` (unsigned integers); `f32`, `f64` (floating point); `bool` (logical); `cstring` (NUL-terminated string).

Array arguments pass R vectors to C with zero copy: `raw` maps to `uint8_t*`, `integer_array` to `int32_t*`, `numeric_array` to `double*`.

Pointer types include `ptr` (opaque external pointer), `sexp` (pass a `SEXP` directly), and callback signatures like `callback:double(double)`.

Array returns use `returns = list(type = "integer_array", length_arg = 2, free = TRUE)` to copy the result into a new R vector. The `length_arg` is the 1-based index of the C argument that carries the array length. Set `free = TRUE` when the C function returns a `malloc`-owned buffer.
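
The copy-then-free step behind array returns can be sketched in plain C. `make_squares()` stands in for a bound function that returns a `malloc`-owned buffer, and `copy_array_return()` is a hypothetical helper; in the real package the destination would be a fresh R vector.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a bound C function that returns a malloc-owned buffer. */
int32_t *make_squares(int32_t n) {
    int32_t *out = malloc(sizeof(int32_t) * (size_t)n);
    if (out == NULL) return NULL;
    for (int32_t i = 0; i < n; i++) out[i] = i * i;
    return out;
}

/* Hypothetical wrapper step: copy n elements into caller-owned storage,
   then release the C buffer only when asked to (free = TRUE in R).
   Returns 0 on success and -1 when the C function returned NULL. */
int copy_array_return(int32_t *dest, int32_t *src, int32_t n, int free_src) {
    assert(dest != NULL && n >= 0);
    if (src == NULL) return -1;
    memcpy(dest, src, sizeof(int32_t) * (size_t)n);
    if (free_src) free(src);
    return 0;
}
```

Passing `free_src = 1` for a buffer that was not allocated with `malloc` (for example a static array) would be a bug, which is why the flag is opt-in.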

### Simple functions

```{r ffi-simple}
ffi <- tcc_ffi() |>
tcc_source("
int add(int a, int b) { return a + b; }
") |>
tcc_bind(add = list(args = list("i32", "i32"), returns = "i32")) |>
tcc_compile()

ffi$add(5L, 3L)
```

### Linking external libraries

We can bind directly to symbols in shared libraries. Here we link against `libm`.

```{r ffi-link}
math <- tcc_ffi() |>
tcc_library("m") |>
tcc_bind(
sqrt = list(args = list("f64"), returns = "f64"),
sin = list(args = list("f64"), returns = "f64"),
floor = list(args = list("f64"), returns = "f64")
) |>
tcc_compile()

math$sqrt(16.0)
math$sin(pi / 2)
math$floor(3.7)
```

### Working with arrays

R vectors are passed to C with zero copy. Mutations in C are visible in R.

```{r ffi-arrays}
ffi <- tcc_ffi() |>
tcc_source("
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int64_t sum_array(int32_t* arr, int32_t n) {
int64_t s = 0;
for (int i = 0; i < n; i++) s += arr[i];
return s;
}

void bump_first(int32_t* arr) { arr[0] += 10; }

int32_t* dup_array(int32_t* arr, int32_t n) {
int32_t* out = malloc(sizeof(int32_t) * n);
memcpy(out, arr, sizeof(int32_t) * n);
return out;
}
") |>
tcc_bind(
sum_array = list(args = list("integer_array", "i32"), returns = "i64"),
bump_first = list(args = list("integer_array"), returns = "void"),
dup_array = list(
args = list("integer_array", "i32"),
returns = list(type = "integer_array", length_arg = 2, free = TRUE)
)
) |>
tcc_compile()

x <- as.integer(1:100) # to avoid ALTREP
.Internal(inspect(x))
ffi$sum_array(x, length(x))

# Zero-copy: C mutation reflects in R
ffi$bump_first(x)
x[1]

# Array return: copied into a new R vector, C buffer freed
y <- ffi$dup_array(x, length(x))
y[1]

.Internal(inspect(x))
```

### Structs and unions

Complex C types are supported declaratively. Use `tcc_struct()` to generate allocation and accessor helpers. Free instances when done.

```{r struct-example}
ffi <- tcc_ffi() |>
tcc_source('
#include <math.h>
struct point { double x; double y; };
double distance(struct point* a, struct point* b) {
double dx = a->x - b->x, dy = a->y - b->y;
return sqrt(dx * dx + dy * dy);
}
') |>
tcc_library("m") |>
tcc_struct("point", accessors = c(x = "f64", y = "f64")) |>
tcc_bind(distance = list(args = list("ptr", "ptr"), returns = "f64")) |>
tcc_compile()

p1 <- ffi$struct_point_new()
ffi$struct_point_set_x(p1, 0.0)
ffi$struct_point_set_y(p1, 0.0)

p2 <- ffi$struct_point_new()
ffi$struct_point_set_x(p2, 3.0)
ffi$struct_point_set_y(p2, 4.0)

ffi$distance(p1, p2)

ffi$struct_point_free(p1)
ffi$struct_point_free(p2)
```

### Enums

Enums are exposed as helper functions that return integer constants.

```{r enum-example}
ffi <- tcc_ffi() |>
tcc_source("enum color { RED = 0, GREEN = 1, BLUE = 2 };") |>
tcc_enum("color", constants = c("RED", "GREEN", "BLUE")) |>
tcc_compile()

ffi$enum_color_RED()
ffi$enum_color_BLUE()
```

### Bitfields

Bitfields are handled by TCC. Accessors read and write them like normal fields.

```{r bitfield-example}
ffi <- tcc_ffi() |>
tcc_source("
struct flags {
unsigned int active : 1;
unsigned int level : 4;
};
") |>
tcc_struct("flags", accessors = c(active = "u8", level = "u8")) |>
tcc_compile()

s <- ffi$struct_flags_new()
ffi$struct_flags_set_active(s, 1L)
ffi$struct_flags_set_level(s, 9L)
ffi$struct_flags_get_active(s)
ffi$struct_flags_get_level(s)
ffi$struct_flags_free(s)
```

### Global getters and setters

C globals can be exposed with explicit getter/setter helpers.

```{r globals-example}
ffi <- tcc_ffi() |>
tcc_source("
int counter = 7;
double pi_approx = 3.14159;
") |>
tcc_global("counter", "i32") |>
tcc_global("pi_approx", "f64") |>
tcc_compile()

ffi$global_counter_get()
ffi$global_pi_approx_get()
ffi$global_counter_set(42L)
ffi$global_counter_get()
```

### Callbacks

R functions can be registered as C function pointers via `tcc_callback()` and passed to compiled code. Specify a `callback:` argument in `tcc_bind()` so the trampoline is generated automatically. Always close callbacks when done.

```{r callback-example}
cb <- tcc_callback(function(x) x * x, signature = "double (*)(double)")

code <- '
double apply_fn(double (*fn)(void* ctx, double), void* ctx, double x) {
return fn(ctx, x);
}
'

ffi <- tcc_ffi() |>
tcc_source(code) |>
tcc_bind(
apply_fn = list(
args = list("callback:double(double)", "ptr", "f64"),
returns = "f64"
)
) |>
tcc_compile()

ffi$apply_fn(cb, tcc_callback_ptr(cb), 7.0)
tcc_callback_close(cb)
```

### Callback errors

If a callback throws an R error, the trampoline catches it, emits a warning, and returns a type-appropriate default (0 for numeric, `FALSE` for logical, `NULL` for pointer). This prevents C code from seeing an unwound stack.

```{r callback-error}
cb_err <- tcc_callback(
function(x) stop("boom"),
signature = "double (*)(double)"
)

ffi_err <- tcc_ffi() |>
tcc_source('
double call_cb_err(double (*cb)(void* ctx, double), void* ctx, double x) {
return cb(ctx, x);
}
') |>
tcc_bind(
call_cb_err = list(
args = list("callback:double(double)", "ptr", "f64"),
returns = "f64"
)
) |>
tcc_compile()

warned <- FALSE
res <- withCallingHandlers(
ffi_err$call_cb_err(cb_err, tcc_callback_ptr(cb_err), 1.0),
warning = function(w) {
warned <<- TRUE
invokeRestart("muffleWarning")
}
)
list(warned = warned, result = res)
tcc_callback_close(cb_err)
```
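
The catch-and-default behavior can be approximated in plain C with `setjmp`/`longjmp`. This is only an analogy under stated assumptions: `guarded_call()`, `raise_error()`, and the handler variable are hypothetical, and the real trampoline relies on R's own error handling rather than this code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical error channel: the innermost active guard, if any. */
static jmp_buf *active_handler = NULL;

/* Stand-in for an R error: unwinds to the innermost guard. */
void raise_error(void) {
    assert(active_handler != NULL);
    longjmp(*active_handler, 1);
}

double ok_square(double x) { return x * x; }

double always_fails(double x) {
    (void)x;
    raise_error();
    return -1.0; /* not reached */
}

/* Calls fn(x); if fn raises, reports it through *failed and returns the
   type-appropriate default (0.0) instead of unwinding into the caller. */
double guarded_call(double (*fn)(double), double x, int *failed) {
    jmp_buf env;
    jmp_buf *previous = active_handler; /* allows nested guarded calls */
    double result = 0.0;
    active_handler = &env;
    if (setjmp(env) == 0) {
        result = fn(x);
        *failed = 0;
    } else {
        result = 0.0;
        *failed = 1;
    }
    active_handler = previous;
    return result;
}
```

The caller of `guarded_call()` always gets a normal return, which is the property that keeps C frames from being skipped by a non-local jump.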

### Async callbacks

For thread-safe scheduling from worker threads, use `callback_async:` in `tcc_bind()`. The callback is enqueued from any thread and executed on the main R thread when you call `tcc_callback_async_drain()`. Call `tcc_callback_async_enable()` once before use.

```{r callback-async}
tcc_callback_async_enable()

hits <- 0L
cb_async <- tcc_callback(
function(x) { hits <<- hits + x; NULL },
signature = "void (*)(int)"
)

code_async <- '
#include <pthread.h>

struct task { void (*cb)(void* ctx, int); void* ctx; int value; };

static void* worker(void* data) {
struct task* t = (struct task*) data;
t->cb(t->ctx, t->value);
return NULL;
}

int spawn_async(void (*cb)(void* ctx, int), void* ctx, int value) {
if (!cb || !ctx) return -1;
const int n = 100;
struct task tasks[100];
pthread_t th[100];
for (int i = 0; i < n; i++) {
tasks[i].cb = cb;
tasks[i].ctx = ctx;
tasks[i].value = value;
if (pthread_create(&th[i], NULL, worker, &tasks[i]) != 0) {
for (int j = 0; j < i; j++) pthread_join(th[j], NULL);
return -2;
}
}
for (int i = 0; i < n; i++) pthread_join(th[i], NULL);
return 0;
}
'

ffi_async <- tcc_ffi() |>
tcc_source(code_async) |>
tcc_library("pthread") |>
tcc_bind(
spawn_async = list(
args = list("callback_async:void(int)", "ptr", "i32"),
returns = "i32"
)
) |>
tcc_compile()

rc <- ffi_async$spawn_async(cb_async, tcc_callback_ptr(cb_async), 2L)
tcc_callback_async_drain()
hits
tcc_callback_close(cb_async)
```

### SQLite: a complete example

This example ties together external library linking, callbacks, and pointer dereferencing. We open an in-memory SQLite database, execute queries, and collect rows through an R callback that reads `char**` arrays using `tcc_read_ptr` and `tcc_read_cstring`.

```{r ffi-sqlite, eval=TRUE}
ptr_size <- .Machine$sizeof.pointer

read_string_array <- function(ptr, n) {
vapply(seq_len(n), function(i) {
tcc_read_cstring(tcc_read_ptr(ptr, (i - 1L) * ptr_size))
}, "")
}

cb <- tcc_callback(
function(argc, argv, cols) {
values <- read_string_array(argv, argc)
names <- read_string_array(cols, argc)
cat(paste(names, values, sep = " = ", collapse = ", "), "\n")
0L
},
signature = "int (*)(int, char **, char **)"
)

sqlite <- tcc_ffi() |>
tcc_header("#include <sqlite3.h>") |>
tcc_library("sqlite3") |>
tcc_source('
void* open_db() {
sqlite3* db = NULL;
sqlite3_open(":memory:", &db);
return db;
}
int close_db(void* db) {
return sqlite3_close((sqlite3*)db);
}
') |>
tcc_bind(
open_db = list(args = list(), returns = "ptr"),
close_db = list(args = list("ptr"), returns = "i32"),
sqlite3_libversion = list(args = list(), returns = "cstring"),
sqlite3_exec = list(
args = list("ptr", "cstring", "callback:int(int, char **, char **)", "ptr", "ptr"),
returns = "i32"
)
) |>
tcc_compile()

sqlite$sqlite3_libversion()

db <- sqlite$open_db()
sqlite$sqlite3_exec(db, "CREATE TABLE t (id INTEGER, name TEXT);", cb, tcc_callback_ptr(cb), tcc_null_ptr())
sqlite$sqlite3_exec(db, "INSERT INTO t VALUES (1, 'hello'), (2, 'world');", cb, tcc_callback_ptr(cb), tcc_null_ptr())
sqlite$sqlite3_exec(db, "SELECT * FROM t;", cb, tcc_callback_ptr(cb), tcc_null_ptr())
sqlite$close_db(db)
tcc_callback_close(cb)
```

## Header parsing with treesitter.c

For header-driven bindings, we use `treesitter.c` to parse function signatures and generate binding specifications automatically. For struct, enum, and global helpers, `tcc_generate_bindings()` handles the code generation.

```{r ffi-treesitter}
header <- '
double sqrt(double x);
double sin(double x);
struct point { double x; double y; };
enum status { OK = 0, ERROR = 1 };
int global_counter;
'

tcc_treesitter_functions(header)
tcc_treesitter_structs(header)
tcc_treesitter_enums(header)
tcc_treesitter_globals(header)

# Bind parsed functions to libm
symbols <- tcc_treesitter_bindings(header)
math <- tcc_link("m", symbols = symbols)
math$sqrt(16.0)

# Generate struct/enum/global helpers
ffi <- tcc_ffi() |>
tcc_source(header) |>
tcc_generate_bindings(
header,
functions = FALSE, structs = TRUE,
enums = TRUE, globals = TRUE
) |>
tcc_compile()

ffi$struct_point_new()
ffi$enum_status_OK()
ffi$global_global_counter_get()
```

## Known limitations

### `_Complex` types

TCC does not support C99 `_Complex` types. Generated code works around this with `#define _Complex`, which suppresses the keyword. Apply the same workaround in your own `tcc_source()` code when headers pull in complex types.

### 64-bit integer precision

R represents `i64` and `u64` values as `double`, which loses precision beyond $2^{53}$. Values that differ only past that threshold become indistinguishable.

```{r limits-int64}
sprintf("2^53: %.0f", 2^53)
sprintf("2^53 + 1: %.0f", 2^53 + 1)
identical(2^53, 2^53 + 1)
```

For exact 64-bit arithmetic, keep values in C-allocated storage and manipulate them through pointers.
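
The same precision limit can be checked from the C side. The sketch below round-trips an `int64_t` through `double`, which is what happens when R holds the value as a numeric; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trip an int64_t through double. The range check keeps the
   conversion back to int64_t well defined. */
int64_t roundtrip_via_double(int64_t v) {
    double d;
    assert(v > -(INT64_C(1) << 62) && v < (INT64_C(1) << 62));
    d = (double)v;
    return (int64_t)d;
}

/* Returns 1 when the value survives the trip unchanged, 0 otherwise. */
int is_exact_in_double(int64_t v) {
    return roundtrip_via_double(v) == v;
}
```

With IEEE 754 doubles, $2^{53}$ itself survives the round trip while $2^{53} + 1$ comes back as $2^{53}$, so a check like `is_exact_in_double()` is one way to decide whether a value can be handed to R safely.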

### Nested structs

The accessor generator does not handle nested structs by value. Use pointer fields instead and reach inner structs with `tcc_field_addr()`.

```{r limits-nested}
ffi <- tcc_ffi() |>
tcc_source('
struct inner { int a; };
struct outer { struct inner* in; };
') |>
tcc_struct("inner", accessors = c(a = "i32")) |>
tcc_struct("outer", accessors = c(`in` = "ptr")) |>
tcc_field_addr("outer", "in") |>
tcc_compile()

o <- ffi$struct_outer_new()
i <- ffi$struct_inner_new()
ffi$struct_inner_set_a(i, 42L)

# Write the inner pointer into the outer struct
ffi$struct_outer_in_addr(o) |> tcc_ptr_set(i)

# Read it back through indirection
ffi$struct_outer_in_addr(o) |>
tcc_data_ptr() |>
ffi$struct_inner_get_a()

ffi$struct_inner_free(i)
ffi$struct_outer_free(o)
```

### Array fields in structs

Array fields require the `list(type = ..., size = N, array = TRUE)` syntax in `tcc_struct()`, which generates element-wise accessors.

```{r limits-arrays}
ffi <- tcc_ffi() |>
tcc_source('struct buf { unsigned char data[16]; };') |>
tcc_struct("buf", accessors = list(
data = list(type = "u8", size = 16, array = TRUE)
)) |>
tcc_compile()

b <- ffi$struct_buf_new()
ffi$struct_buf_set_data_elt(b, 0L, 0xCAL)
ffi$struct_buf_set_data_elt(b, 1L, 0xFEL)
ffi$struct_buf_get_data_elt(b, 0L)
ffi$struct_buf_get_data_elt(b, 1L)
ffi$struct_buf_free(b)
```

## Serialization and fork safety

Compiled FFI objects are fork-safe: `parallel::mclapply()` and other `fork()`-based parallelism work out of the box because TCC's compiled code lives in memory mappings that survive `fork()` via copy-on-write.
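
The underlying mechanism is ordinary POSIX behavior, which the C sketch below demonstrates with a statically compiled function rather than TCC-generated code: a function pointer obtained before `fork()` is still callable in the child because the child inherits the parent's memory mappings. `call_in_child()` is a hypothetical helper and assumes a Unix-alike system.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int square(int x) { return x * x; }

/* Calls fn(arg) in a forked child and returns the low 8 bits of the
   result via the child's exit status, or -1 if anything fails. */
int call_in_child(int (*fn)(int), int arg) {
    pid_t pid;
    int status = 0;
    assert(fn != NULL);
    pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(fn(arg) & 0xFF); /* child: same code mapping */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

The exit status only carries 8 bits, so this is a demonstration of the mapping surviving `fork()`, not a way to return real results from workers.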

Serialization is also supported. Each `tcc_compiled` object stores its FFI recipe internally, so after `saveRDS()` / `readRDS()` (or `serialize()` / `unserialize()`), the first `$` access detects the dead TCC state pointer and recompiles transparently.

```{r serialize-example}
ffi <- tcc_ffi() |>
tcc_source("int square(int x) { return x * x; }") |>
tcc_bind(square = list(args = list("i32"), returns = "i32")) |>
tcc_compile()

ffi$square(7L)

tmp <- tempfile(fileext = ".rds")
saveRDS(ffi, tmp)
ffi2 <- readRDS(tmp)
unlink(tmp)

# Auto-recompiles on first access
ffi2$square(7L)
```

For explicit control, use `tcc_recompile()`. Note that raw `tcc_state` objects and bare pointers from `tcc_malloc()` do not carry a recipe and remain dead after deserialization.

## License

GPL-3

## References

- [TinyCC](https://github.com/TinyCC/tinycc)
- [Bun's FFI](https://bun.com/docs/runtime/ffi)
- [CFFI](https://cffi.readthedocs.io/)
- [RSimpleFFI](https://github.com/sounkou-bioinfo/RSimpleFFI#readme)
- [CSlug](https://cslug.readthedocs.io/en/latest/)