An open API service indexing awesome lists of open source software.

https://github.com/sounkou-bioinfo/rtinycc

Builds 'TinyCC' 'Cli' and Library for 'C' Scripting in 'R'
https://github.com/sounkou-bioinfo/rtinycc

c ffi r rstats tinycc

Last synced: 26 days ago
JSON representation

Builds 'TinyCC' 'Cli' and Library for 'C' Scripting in 'R'

Awesome Lists containing this project

README

          

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
has_callme <- requireNamespace("callme", quietly = TRUE)
has_bench <- requireNamespace("bench", quietly = TRUE)
```

# Rtinycc

Builds `TinyCC` `Cli` and Library For `C` Scripting in `R`

[![R-CMD-check](https://github.com/sounkou-bioinfo/Rtinycc/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/sounkou-bioinfo/Rtinycc/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/Rtinycc)](https://CRAN.R-project.org/package=Rtinycc)
[![Rtinycc status badge](https://sounkou-bioinfo.r-universe.dev/Rtinycc/badges/version)](https://sounkou-bioinfo.r-universe.dev/Rtinycc)

## Abstract

Rtinycc is an R interface to [TinyCC](https://github.com/TinyCC/tinycc), providing both CLI access and a libtcc-backed in-memory compiler. It includes an FFI inspired by [Bun's FFI](https://bun.com/docs/runtime/ffi) for binding C symbols with predictable type conversions and pointer utilities. The package works on unix-alikes and Windows and focuses on embedding TinyCC and enabling JIT-compiled bindings directly from R. Combined with [treesitter.c](https://github.com/sounkou-bioinfo/treesitter.c), which provides C header parsers, it can be used to rapidly generate declarative bindings.

## How it works

When you call `tcc_compile()`, Rtinycc generates C wrapper functions whose
signature follows the `.Call` convention (`SEXP` in, `SEXP` out). These wrappers
convert R types to C, call the target function, and convert the result back.
TCC compiles them in-memory -- no shared library is written to disk and no
`R_init_*` registration is needed.

After `tcc_relocate()`, wrapper pointers are retrieved via `tcc_get_symbol()`,
which internally calls `RC_libtcc_get_symbol()`. That function converts TCC's
raw `void*` into a `DL_FUNC` wrapped with `R_MakeExternalPtrFn` (tagged
`"native symbol"`). On the R side, [`make_callable()`](R/ffi.R) creates a
closure that passes this external pointer to `.Call` (aliased as `.RtinyccCall`
to keep `R CMD check` happy).

The design follows [CFFI's](https://cffi.readthedocs.io/) API-mode pattern:
instead of computing struct layouts and calling conventions in R (ABI-mode,
like Python's ctypes), the generated C code lets TCC handle `sizeof`,
`offsetof`, and argument passing. Rtinycc never replicates platform-specific
layout rules. The wrappers can also link against external shared libraries
whose symbols TCC resolves at relocation time. For background on how this
compares to a libffi approach, see the
[`RSimpleFFI` README](https://github.com/sounkou-bioinfo/RSimpleFFI#readme).

On macOS the configure script strips `-flat_namespace` from TCC's build to
avoid [BUS ERROR issues](https://genomic.social/@bioinfhotep/115765645745231377). Without it, TCC cannot resolve host symbols (e.g.
`RC_free_finalizer`) through the dynamic linker. Rtinycc works around this
with `RC_libtcc_add_host_symbols()`, which registers package-internal C
functions via `tcc_add_symbol()` before relocation. Any new C function
referenced by generated TCC code must be added there.

On Windows, the `configure.win` script generates a UCRT-backed `msvcrt.def` so
TinyCC resolves CRT symbols against `ucrtbase.dll` (R 4.2+ uses UCRT).

Ownership semantics are explicit. Pointers from `tcc_malloc()` are tagged `rtinycc_owned` and can be released with `tcc_free()` (or by their R finalizer). Generated struct constructors use a struct-specific tag
(`struct_`) with an `RC_free_finalizer`; free them with `struct__free()`, not `tcc_free()`. Pointers from `tcc_data_ptr()` are
tagged `rtinycc_borrowed` and are never freed by Rtinycc. Array returns are copied into a fresh R vector; set `free = TRUE` only when the C function returns a `malloc`-owned buffer.

## Installation

``` r
install.packages(
'Rtinycc',
repos = c('https://sounkou-bioinfo.r-universe.dev',
'https://cloud.r-project.org')
)
```

## Usage

### CLI

The CLI interface compiles C source files to standalone executables using the bundled TinyCC toolchain.

```{r cli}
library(Rtinycc)

src <- system.file("c_examples", "forty_two.c", package = "Rtinycc")
exe <- tempfile()
tcc_run_cli(c(
"-B", tcc_prefix(),
paste0("-I", tcc_include_paths()),
paste0("-L", tcc_lib_paths()),
src, "-o", exe
))
Sys.chmod(exe, mode = "0755")
system2(exe, stdout = TRUE)
```

For in-memory workflows, prefer libtcc instead.

### In-memory compilation with libtcc

We can compile and call C functions entirely in memory. This is the simplest path for quick JIT compilation.

```{r in-memory}
state <- tcc_state(output = "memory")
tcc_compile_string(state, "int forty_two(){ return 42; }")
tcc_relocate(state)
tcc_call_symbol(state, "forty_two", return = "int")
```

The lower-level API gives full control over include paths, libraries, and the R C API. Using `#define _Complex` as a workaround for TCC's lack of [complex type support](https://lists.gnu.org/archive/html/tinycc-devel/2022-04/msg00020.html), we can link against R's headers and call into `libR`.

```{r call-R-C-API}
state <- tcc_state(output = "memory")
tcc_add_include_path(state, R.home("include"))
tcc_add_library_path(state, R.home("lib"))

code <- '
#define _Complex
#include
#include

double call_r_sqrt(void) {
SEXP fn = PROTECT(Rf_findFun(Rf_install("sqrt"), R_BaseEnv));
SEXP val = PROTECT(Rf_ScalarReal(16.0));
SEXP call = PROTECT(Rf_lang2(fn, val));
SEXP out = PROTECT(Rf_eval(call, R_GlobalEnv));
double res = REAL(out)[0];
UNPROTECT(4);
return res;
}
'
tcc_compile_string(state, code)
tcc_relocate(state)
tcc_call_symbol(state, "call_r_sqrt", return = "double")
```

### Pointer utilities

Rtinycc ships a set of typed memory access functions similar to what the [ctypesio](https://cran.r-project.org/package=ctypesio) package offers, but designed around our FFI pointer model. Every scalar C type has a corresponding `tcc_read_*` / `tcc_write_*` pair that operates at a byte offset into any external pointer, so you can walk structs, arrays, and output parameters without writing C helpers.

```{r ffi-utils}
ptr <- tcc_cstring("hello")
tcc_read_cstring(ptr)
tcc_read_bytes(ptr, 5)
tcc_ptr_addr(ptr, hex = TRUE)
tcc_ptr_is_null(ptr)
tcc_free(ptr)
```

Typed reads and writes cover the full scalar range (`i8`/`u8`, `i16`/`u16`, `i32`/`u32`, `i64`/`u64`, `f32`/`f64`) plus pointer dereferencing via `tcc_read_ptr` / `tcc_write_ptr`. All operations use a byte offset and `memcpy` internally for alignment safety.

```{r ffi-typed-rw}
buf <- tcc_malloc(32)
tcc_write_i32(buf, 0L, 42L)
tcc_write_f64(buf, 8L, pi)
tcc_read_i32(buf, offset = 0L)
tcc_read_f64(buf, offset = 8L)
tcc_free(buf)
```

Pointer-to-pointer workflows are supported for C APIs that return values through output parameters.

```{r ptr-to-ptr}
ptr_ref <- tcc_malloc(.Machine$sizeof.pointer %||% 8L)
target <- tcc_malloc(8)
tcc_ptr_set(ptr_ref, target)
tcc_data_ptr(ptr_ref)
tcc_ptr_set(ptr_ref, tcc_null_ptr())
tcc_free(target)
tcc_free(ptr_ref)
```

## Declarative FFI

A declarative interface inspired by [Bun's FFI](https://bun.com/docs/runtime/ffi) sits on top of the lower-level API. We define types explicitly and Rtinycc generates the binding code, compiling it in memory with TCC.

### Type system

The FFI exposes a small set of type mappings between R and C. Conversions are explicit and predictable so callers know when data is shared versus copied.

The scalar type names are C-facing, but the R-side carriers are not all
one-to-one with those C widths:

- `i8`, `i16`, `i32`, `u8`, and `u16` are mediated through R integer scalars
- `u32`, `i64`, `u64`, `f32`, and `f64` are mediated through R numeric
(`double`) coercion and boxing
- `bool` uses R logical
- `cstring` uses an R character scalar

This means `u32` is routed through `double` to preserve the full unsigned
32-bit range, and `i64` / `u64` are only exact up to `2^53` on the R side.

Array arguments pass R vectors to C with zero copy: `raw` maps to `uint8_t*`, `integer_array` to `int32_t*`, `numeric_array` to `double*`.

Pointer types include `ptr` (opaque external pointer), `sexp` (pass a `SEXP` directly), and callback signatures like `callback:double(double)`.

Variadic functions are supported in two forms: typed prefix tails (`varargs`) and
bounded dynamic tails (`varargs_types` + `varargs_min`/`varargs_max`). Prefix
mode is the cheaper runtime path because dispatch is by tail arity only;
bounded dynamic mode adds per-call scalar type inference to select a compatible
wrapper. For hot loops, prefer fixed arity first, then prefix variadics with a
tight maximum tail size.

Array returns use `returns = list(type = "integer_array", length_arg = 2, free = TRUE)` to copy the result into a new R vector. The `length_arg` is the 1-based index of the C argument that carries the array length. Set `free = TRUE` when the C function returns a `malloc`-owned buffer.

### Simple functions

```{r ffi-simple}
ffi <- tcc_ffi() |>
tcc_source("
int add(int a, int b) { return a + b; }
") |>
tcc_bind(add = list(args = list("i32", "i32"), returns = "i32")) |>
tcc_compile()

ffi$add(5L, 3L)
```

### Variadic calls (e.g. `Rprintf` style)

Rtinycc supports two ways to bind variadic tails. The legacy approach uses
`varargs` as a typed prefix tail, while the bounded dynamic approach uses
`varargs_types` together with `varargs_min` and `varargs_max`. In the bounded
mode, wrappers are generated across the allowed arity and type combinations,
and runtime dispatch selects the matching wrapper from the scalar tail values
provided at call time.

```{r ffi-variadic-rprintf}
ffi_var <- tcc_ffi() |>
tcc_header("#include ") |>
tcc_source('
#include

int sum_fmt(int n, ...) {
va_list ap;
va_start(ap, n);
int s = 0;
for (int i = 0; i < n; i++) s += va_arg(ap, int);
va_end(ap);
Rprintf("sum_fmt(%d) = %d\\n", n, s);
return s;
}
') |>
tcc_bind(
Rprintf = list(
args = list("cstring"),
variadic = TRUE,
varargs_types = list("i32"),
varargs_min = 0L,
varargs_max = 4L,
returns = "void"
),
sum_fmt = list(
args = list("i32"),
variadic = TRUE,
varargs_types = list("i32"),
varargs_min = 0L,
varargs_max = 4L,
returns = "i32"
)
) |>
tcc_compile()

ffi_var$Rprintf("Rprintf via bind: %d + %d = %d\n", 2L, 3L, 5L)
ffi_var$sum_fmt(0L)
ffi_var$sum_fmt(2L, 10L, 20L)
ffi_var$sum_fmt(4L, 1L, 2L, 3L, 4L)
```

### Linking external libraries

We can bind directly to symbols in shared libraries. Here we link against `libm`.

```{r ffi-link}
math <- tcc_ffi() |>
tcc_library("m") |>
tcc_bind(
sqrt = list(args = list("f64"), returns = "f64"),
sin = list(args = list("f64"), returns = "f64"),
floor = list(args = list("f64"), returns = "f64")
) |>
tcc_compile()

math$sqrt(16.0)
math$sin(pi / 2)
math$floor(3.7)

```
### Compiler options

Use `tcc_options()` to pass raw TinyCC options in the high-level FFI pipeline.
For low-level states, use `tcc_set_options()` directly.

```{r ffi-options}
ffi_opt_off <- tcc_ffi() |>
tcc_options("-O0") |>
tcc_source('
int opt_macro() {
#ifdef __OPTIMIZE__
return 1;
#else
return 0;
#endif
}
') |>
tcc_bind(opt_macro = list(args = list(), returns = "i32")) |>
tcc_compile()

ffi_opt_on <- tcc_ffi() |>
tcc_options(c("-Wall", "-O2")) |>
tcc_source('
int opt_macro() {
#ifdef __OPTIMIZE__
return 1;
#else
return 0;
#endif
}
') |>
tcc_bind(opt_macro = list(args = list(), returns = "i32")) |>
tcc_compile()

ffi_opt_off$opt_macro()
ffi_opt_on$opt_macro()
```
### Working with arrays

R vectors are passed to C with zero copy. Mutations in C are visible in R.

```{r ffi-arrays}
ffi <- tcc_ffi() |>
tcc_source("
#include
#include

int64_t sum_array(int32_t* arr, int32_t n) {
int64_t s = 0;
for (int i = 0; i < n; i++) s += arr[i];
return s;
}

void bump_first(int32_t* arr) { arr[0] += 10; }

int32_t* dup_array(int32_t* arr, int32_t n) {
int32_t* out = malloc(sizeof(int32_t) * n);
memcpy(out, arr, sizeof(int32_t) * n);
return out;
}
") |>
tcc_bind(
sum_array = list(args = list("integer_array", "i32"), returns = "i64"),
bump_first = list(args = list("integer_array"), returns = "void"),
dup_array = list(
args = list("integer_array", "i32"),
returns = list(type = "integer_array", length_arg = 2, free = TRUE)
)
) |>
tcc_compile()

x <- as.integer(1:100) # to avoid ALTREP
.Internal(inspect(x))
ffi$sum_array(x, length(x))

# Zero-copy: C mutation reflects in R
ffi$bump_first(x)
x[1]

# Array return: copied into a new R vector, C buffer freed
y <- ffi$dup_array(x, length(x))
y[1]

.Internal(inspect(x))
```
## Advanced FFI features

### Structs and unions

Complex C types are supported declaratively. Use `tcc_struct()` to generate allocation and accessor helpers. Free instances when done.

```{r struct-example}
ffi <- tcc_ffi() |>
tcc_source('
#include
struct point { double x; double y; };
double distance(struct point* a, struct point* b) {
double dx = a->x - b->x, dy = a->y - b->y;
return sqrt(dx * dx + dy * dy);
}
') |>
tcc_library("m") |>
tcc_struct("point", accessors = c(x = "f64", y = "f64")) |>
tcc_bind(distance = list(args = list("ptr", "ptr"), returns = "f64")) |>
tcc_compile()

p1 <- ffi$struct_point_new()
ffi$struct_point_set_x(p1, 0.0)
ffi$struct_point_set_y(p1, 0.0)

p2 <- ffi$struct_point_new()
ffi$struct_point_set_x(p2, 3.0)
ffi$struct_point_set_y(p2, 4.0)

ffi$distance(p1, p2)

ffi$struct_point_free(p1)
ffi$struct_point_free(p2)
```

### Enums

Enums are exposed as helper functions that return integer constants.

```{r enum-example}
ffi <- tcc_ffi() |>
tcc_source("enum color { RED = 0, GREEN = 1, BLUE = 2 };") |>
tcc_enum("color", constants = c("RED", "GREEN", "BLUE")) |>
tcc_compile()

ffi$enum_color_RED()
ffi$enum_color_BLUE()
```

### Bitfields

Bitfields are handled by TCC. Accessors read and write them like normal fields.

```{r bitfield-example}
ffi <- tcc_ffi() |>
tcc_source("
struct flags {
unsigned int active : 1;
unsigned int level : 4;
};
") |>
tcc_struct("flags", accessors = c(active = "u8", level = "u8")) |>
tcc_compile()

s <- ffi$struct_flags_new()
ffi$struct_flags_set_active(s, 1L)
ffi$struct_flags_set_level(s, 9L)
ffi$struct_flags_get_active(s)
ffi$struct_flags_get_level(s)
ffi$struct_flags_free(s)
```

### Global getters and setters

C globals can be exposed with explicit getter/setter helpers.

```{r globals-example}
ffi <- tcc_ffi() |>
tcc_source("
int counter = 7;
double pi_approx = 3.14159;
") |>
tcc_global("counter", "i32") |>
tcc_global("pi_approx", "f64") |>
tcc_compile()

ffi$global_counter_get()
ffi$global_pi_approx_get()
ffi$global_counter_set(42L)
ffi$global_counter_get()
```

### Callbacks

R functions can be registered as C function pointers via `tcc_callback()` and passed to compiled code. Specify a `callback:` argument in `tcc_bind()` so the trampoline is generated automatically. Call `tcc_callback_close()` when you want deterministic invalidation and earlier release of the preserved R function.

```{r callback-example}
cb <- tcc_callback(function(x) x * x, signature = "double (*)(double)")

code <- '
double apply_fn(double (*fn)(void* ctx, double), void* ctx, double x) {
return fn(ctx, x);
}
'

ffi <- tcc_ffi() |>
tcc_source(code) |>
tcc_bind(
apply_fn = list(
args = list("callback:double(double)", "ptr", "f64"),
returns = "f64"
)
) |>
tcc_compile()

ffi$apply_fn(cb, tcc_callback_ptr(cb), 7.0)
tcc_callback_close(cb)
```

### Callback errors

If a callback throws an R error, the trampoline catches it, emits a warning, and returns a type-appropriate sentinel instead of unwinding through C. In practice this means NA-like numeric or integer values, `NA` logical, `NULL` for `cstring`, or a null external pointer depending on the declared return type.

```{r callback-error}
cb_err <- tcc_callback(
function(x) stop("boom"),
signature = "double (*)(double)"
)

ffi_err <- tcc_ffi() |>
tcc_source('
double call_cb_err(double (*cb)(void* ctx, double), void* ctx, double x) {
return cb(ctx, x);
}
') |>
tcc_bind(
call_cb_err = list(
args = list("callback:double(double)", "ptr", "f64"),
returns = "f64"
)
) |>
tcc_compile()

warned <- FALSE
res <- withCallingHandlers(
ffi_err$call_cb_err(cb_err, tcc_callback_ptr(cb_err), 1.0),
warning = function(w) {
warned <<- TRUE
invokeRestart("muffleWarning")
}
)
list(warned = warned, result = res)
tcc_callback_close(cb_err)
```

### Async callbacks

For thread-safe scheduling from worker threads, use
`callback_async:` in `tcc_bind()`. The async callback queue
is initialized automatically at package load.

When a bound function has any `callback_async:` argument, the generated
wrapper automatically runs your C function on a new thread while draining
callbacks on the main R thread. Your C code doesn't need to know about
draining at all — just call the callback as normal and the wrapper handles
the rest.

**Void return (fire-and-forget):** the callback is enqueued from any
thread and executed on the main R thread automatically — on Windows via
R's message pump, on Linux/macOS via R's event loop `addInputHandler`.

**Non-void return (synchronous):** the worker thread blocks until the
main R thread executes the callback and returns the real result.
Supported return types: integer variants (`int`, `int32_t`, `i8`, `i16`,
`u8`, `u16`), floating-point (`double`, `float`), `bool`/`logical`, and
pointer (`void*`, `T*`).

```{r callback-async}
# Fire-and-forget: void callback accumulated from 100 worker threads
hits <- 0L
cb_async <- tcc_callback(
function(x) { hits <<- hits + x; NULL },
signature = "void (*)(int)"
)

code_async <- '
struct task { void (*cb)(void* ctx, int); void* ctx; int value; };

#ifdef _WIN32
#include

static DWORD WINAPI worker(LPVOID data) {
struct task* t = (struct task*) data;
t->cb(t->ctx, t->value);
return 0;
}

int spawn_async(void (*cb)(void* ctx, int), void* ctx, int value) {
if (!cb || !ctx) return -1;
struct task t;
t.cb = cb;
t.ctx = ctx;
t.value = value;
HANDLE th = CreateThread(NULL, 0, worker, &t, 0, NULL);
if (!th) return -2;
WaitForSingleObject(th, INFINITE);
CloseHandle(th);
return 0;
}
#else
#include

static void* worker(void* data) {
struct task* t = (struct task*) data;
t->cb(t->ctx, t->value);
return NULL;
}

int spawn_async(void (*cb)(void* ctx, int), void* ctx, int value) {
if (!cb || !ctx) return -1;
const int n = 100;
struct task tasks[100];
pthread_t th[100];
for (int i = 0; i < n; i++) {
tasks[i].cb = cb;
tasks[i].ctx = ctx;
tasks[i].value = value;
if (pthread_create(&th[i], NULL, worker, &tasks[i]) != 0) {
for (int j = 0; j < i; j++) pthread_join(th[j], NULL);
return -2;
}
}
for (int i = 0; i < n; i++) pthread_join(th[i], NULL);
return 0;
}
#endif
'

ffi_async <- tcc_ffi() |>
tcc_source(code_async)
if (.Platform$OS.type != "windows") {
ffi_async <- tcc_library(ffi_async, "pthread")
}
ffi_async <- ffi_async |>
tcc_bind(
spawn_async = list(
args = list("callback_async:void(int)", "ptr", "i32"),
returns = "i32"
)
) |>
tcc_compile()

rc <- ffi_async$spawn_async(cb_async, tcc_callback_ptr(cb_async), 2L)
hits
tcc_callback_close(cb_async)
```

Non-void return works the same way — the generated wrapper handles the
drain loop transparently:

```{r callback-async-nonvoid}
cb_triple <- tcc_callback(
function(x) x * 3L,
signature = "int (*)(int)"
)

# Pure C: the worker calls the sync callback and returns its result.
# No drain logic needed — the generated wrapper handles it.
code_sync <- '
#ifdef _WIN32
#include

struct itask { int (*cb)(void*,int); void* ctx; int in; int out; };

static DWORD WINAPI iworker(LPVOID p) {
struct itask* t = (struct itask*)p;
t->out = t->cb(t->ctx, t->in);
return 0;
}
int run_worker(int (*cb)(void*,int), void* ctx, int x) {
struct itask t;
t.cb = cb; t.ctx = ctx; t.in = x; t.out = -1;
HANDLE th = CreateThread(NULL, 0, iworker, &t, 0, NULL);
if (!th) return -1;
WaitForSingleObject(th, INFINITE);
CloseHandle(th);
return t.out;
}
#else
#include

struct itask { int (*cb)(void*,int); void* ctx; int in; int out; };

static void* iworker(void* p) {
struct itask* t = (struct itask*)p;
t->out = t->cb(t->ctx, t->in);
return NULL;
}
int run_worker(int (*cb)(void*,int), void* ctx, int x) {
struct itask t;
t.cb = cb; t.ctx = ctx; t.in = x; t.out = -1;
pthread_t th;
if (pthread_create(&th, NULL, iworker, &t) != 0) return -1;
pthread_join(th, NULL);
return t.out;
}
#endif
'

ffi_sync <- tcc_ffi() |>
tcc_source(code_sync)
if (.Platform$OS.type != "windows") {
ffi_sync <- tcc_library(ffi_sync, "pthread")
}
ffi_sync <- ffi_sync |>
tcc_bind(
run_worker = list(args = list("callback_async:int(int)", "ptr", "i32"), returns = "i32")
) |>
tcc_compile()

ffi_sync$run_worker(cb_triple, tcc_callback_ptr(cb_triple), 7L) # 21
tcc_callback_close(cb_triple)
```

### SQLite: a complete example

This example ties together external library linking, callbacks, and pointer dereferencing. We open an in-memory SQLite database, execute queries, and collect rows through an R callback that reads `char**` arrays using `tcc_read_ptr` and `tcc_read_cstring`.

```{r ffi-sqlite, eval=TRUE}
ptr_size <- .Machine$sizeof.pointer

read_string_array <- function(ptr, n) {
vapply(seq_len(n), function(i) {
tcc_read_cstring(tcc_read_ptr(ptr, (i - 1L) * ptr_size))
}, "")
}

cb <- tcc_callback(
function(argc, argv, cols) {
values <- read_string_array(argv, argc)
names <- read_string_array(cols, argc)
cat(paste(names, values, sep = " = ", collapse = ", "), "\n")
0L
},
signature = "int (*)(int, char **, char **)"
)

sqlite <- tcc_ffi() |>
tcc_header("#include ") |>
tcc_library("sqlite3") |>
tcc_source('
void* open_db() {
sqlite3* db = NULL;
sqlite3_open(":memory:", &db);
return db;
}
int close_db(void* db) {
return sqlite3_close((sqlite3*)db);
}
') |>
tcc_bind(
open_db = list(args = list(), returns = "ptr"),
close_db = list(args = list("ptr"), returns = "i32"),
sqlite3_libversion = list(args = list(), returns = "cstring"),
sqlite3_exec = list(
args = list("ptr", "cstring", "callback:int(int, char **, char **)", "ptr", "ptr"),
returns = "i32"
)
) |>
tcc_compile()

sqlite$sqlite3_libversion()

db <- sqlite$open_db()
sqlite$sqlite3_exec(db, "CREATE TABLE t (id INTEGER, name TEXT);", cb, tcc_callback_ptr(cb), tcc_null_ptr())
sqlite$sqlite3_exec(db, "INSERT INTO t VALUES (1, 'hello'), (2, 'world');", cb, tcc_callback_ptr(cb), tcc_null_ptr())
sqlite$sqlite3_exec(db, "SELECT * FROM t;", cb, tcc_callback_ptr(cb), tcc_null_ptr())
sqlite$close_db(db)
tcc_callback_close(cb)
```

## Header parsing with treesitter.c

For header-driven bindings, we use `treesitter.c` to parse function signatures and generate binding specifications automatically. For struct, enum, and global helpers, `tcc_generate_bindings()` handles the code generation.

The default mapper is conservative for pointers: `char*` is treated as `ptr` because C does not guarantee NUL-terminated strings. If you know a parameter is a C string, provide a custom mapper that returns `cstring` for that type.

```{r ffi-treesitter}
header <- '
double sqrt(double x);
double sin(double x);
struct point { double x; double y; };
enum status { OK = 0, ERROR = 1 };
int global_counter;
'

tcc_treesitter_functions(header)
tcc_treesitter_structs(header)
tcc_treesitter_enums(header)
tcc_treesitter_globals(header)

# Bind parsed functions to libm
symbols <- tcc_treesitter_bindings(header)
math <- tcc_link("m", symbols = symbols)
math$sqrt(16.0)

# Generate struct/enum/global helpers
ffi <- tcc_ffi() |>
tcc_source(header) |>
tcc_generate_bindings(
header,
functions = FALSE, structs = TRUE,
enums = TRUE, globals = TRUE
) |>
tcc_compile()

ffi$struct_point_new()
ffi$enum_status_OK()
ffi$global_global_counter_get()
```

## io_uring Demo

`CSV` parser using [`io_uring`](https://en.wikipedia.org/wiki/Io_uring) on linux

```{r io_uring-demo, eval=TRUE}
if (Sys.info()[["sysname"]] == "Linux") {
c_file <- system.file("c_examples", "io_uring_csv.c", package = "Rtinycc")

n_rows <- 20000L
n_cols <- 8L
block_size <- 1024L * 1024L

set.seed(42)
tmp_csv <- tempfile("rtinycc_io_uring_readme_", fileext = ".csv")
on.exit(unlink(tmp_csv), add = TRUE)

mat <- matrix(runif(n_rows * n_cols), ncol = n_cols)
df <- as.data.frame(mat)
names(df) <- paste0("V", seq_len(n_cols))
utils::write.table(df, file = tmp_csv, sep = ",", row.names = FALSE, col.names = TRUE, quote = FALSE)
csv_size_mb <- as.double(file.info(tmp_csv)$size) / 1024^2
message(sprintf("CSV size: %.2f MB", csv_size_mb))

io_uring_src <- paste(readLines(c_file, warn = FALSE), collapse = "\n")

ffi <- tcc_ffi() |>
tcc_source(io_uring_src) |>
tcc_bind(
csv_table_read = list(
args = list("cstring", "i32", "i32"),
returns = "sexp"
),
csv_table_io_uring = list(
args = list("cstring", "i32", "i32"),
returns = "sexp"
)
) |>
tcc_compile()

baseline <- utils::read.table(tmp_csv, sep = ",", header = TRUE)
c_tbl <- ffi$csv_table_read(tmp_csv, block_size, n_cols)
uring_tbl <- ffi$csv_table_io_uring(tmp_csv, block_size, n_cols)
vroom_tbl <- vroom::vroom(
tmp_csv,
delim = ",",
altrep = FALSE,
col_types = vroom::cols(.default = "d"),
progress = FALSE,
show_col_types = FALSE
)

stopifnot(
identical(dim(c_tbl), dim(baseline)),
identical(dim(uring_tbl), dim(baseline)),
identical(dim(vroom_tbl), dim(baseline)),
isTRUE(all.equal(c_tbl, baseline, tolerance = 1e-8, check.attributes = FALSE)),
isTRUE(all.equal(uring_tbl, baseline, tolerance = 1e-8, check.attributes = FALSE)),
isTRUE(all.equal(vroom_tbl, baseline, tolerance = 1e-8, check.attributes = FALSE))
)

timings <- bench::mark(
read_table_df = {
x <- utils::read.table(tmp_csv, sep = ",", header = TRUE)
nrow(x)
},
vroom_df_altrep_false = {
x <- vroom::vroom(
tmp_csv,
delim = ",",
altrep = FALSE,
col_types = vroom::cols(.default = "d"),
progress = FALSE,
show_col_types = FALSE
)
nrow(x)
},
vroom_df_altrep_false_mat = {
x <- vroom::vroom(
tmp_csv,
delim = ",",
altrep = FALSE,
col_types = vroom::cols(.default = "d"),
progress = FALSE,
show_col_types = FALSE
)
x <- as.matrix(x)
nrow(x)
},
c_read_df = {
x <- ffi$csv_table_read(tmp_csv, block_size, n_cols)
nrow(x)
},
io_uring_df = {
x <- ffi$csv_table_io_uring(tmp_csv, block_size, n_cols)
nrow(x)
},
iterations = 2,
memory = TRUE
)


print(timings)

plot(timings, type = "boxplot") + bench::scale_x_bench_time(base = NULL)
}
```

## Known limitations

### `_Complex` types

TCC does not support C99 `_Complex` types. Generated code works around this with `#define _Complex`, which suppresses the keyword. Apply the same workaround in your own `tcc_source()` code when headers pull in complex types.

### 64-bit integer precision

R represents `i64` and `u64` values as `double`, which loses precision beyond $2^{53}$. Values that differ only past that threshold become indistinguishable.

```{r limits-int64}
sprintf("2^53: %.0f", 2^53)
sprintf("2^53 + 1: %.0f", 2^53 + 1)
identical(2^53, 2^53 + 1)
```

For exact 64-bit arithmetic, keep values in C-allocated storage and manipulate them through pointers.

### Nested structs

Named nested struct fields can now be declared explicitly with `struct:`.
Getters return borrowed nested views and setters copy bytes from another struct
object of the declared nested type.

```{r limits-nested}
ffi <- tcc_ffi() |>
tcc_source('
struct inner { int a; };
struct outer { struct inner in; };
') |>
tcc_struct("inner", accessors = c(a = "i32")) |>
tcc_struct("outer", accessors = list(`in` = "struct:inner")) |>
tcc_compile()

outer <- ffi$struct_outer_new()
inner <- ffi$struct_inner_new()
inner <- ffi$struct_inner_set_a(inner, 42L)
outer <- ffi$struct_outer_set_in(outer, inner)
inner_view <- ffi$struct_outer_get_in(outer)
ffi$struct_inner_get_a(inner_view)
ffi$struct_inner_free(inner)
ffi$struct_outer_free(outer)
```

For treesitter-generated bindings, nested struct fields inside structs still
fall back to ptr-like accessors. If you want a borrowed nested view plus a
copy-in setter, declare the nested field explicitly with `struct:`.

Bitfields are separate from ordinary addressable fields. They use scalar helper
accessors, but `field_addr()` and `container_of()` reject bitfield members.

```{r bitfield-nested-note, error=TRUE}
ffi <- tcc_ffi() |>
tcc_source('struct flags { unsigned int flag : 1; };') |>
tcc_struct(
"flags",
accessors = list(flag = list(type = "u8", bitfield = TRUE, width = 1))
)

tcc_field_addr(ffi, "flags", "flag")
tcc_container_of(ffi, "flags", "flag")
```

### Array fields in structs

Array fields require the `list(type = ..., size = N, array = TRUE)` syntax in `tcc_struct()`, which generates element-wise accessors.

```{r limits-arrays}
ffi <- tcc_ffi() |>
tcc_source('struct buf { unsigned char data[16]; };') |>
tcc_struct("buf", accessors = list(
data = list(type = "u8", size = 16, array = TRUE)
)) |>
tcc_compile()

b <- ffi$struct_buf_new()
ffi$struct_buf_set_data_elt(b, 0L, 0xCAL)
ffi$struct_buf_set_data_elt(b, 1L, 0xFEL)
ffi$struct_buf_get_data_elt(b, 0L)
ffi$struct_buf_get_data_elt(b, 1L)
ffi$struct_buf_free(b)
```

## Serialization and fork safety

Compiled FFI objects are fork-safe: `parallel::mclapply()` and other `fork()`-based parallelism work out of the box because TCC's compiled code lives in memory mappings that survive `fork()` via copy-on-write.

Serialization is also supported. Each `tcc_compiled` object stores its FFI recipe internally, so after `saveRDS()` / `readRDS()` (or `serialize()` / `unserialize()`), the first `$` access detects the dead TCC state pointer and recompiles transparently.

```{r serialize-example}
ffi <- tcc_ffi() |>
tcc_source("int square(int x) { return x * x; }") |>
tcc_bind(square = list(args = list("i32"), returns = "i32")) |>
tcc_compile()

ffi$square(7L)

tmp <- tempfile(fileext = ".rds")
saveRDS(ffi, tmp)
ffi2 <- readRDS(tmp)
unlink(tmp)

# Auto-recompiles on first access
ffi2$square(7L)
```

For explicit control, use `tcc_recompile()`. Note that raw `tcc_state` objects and bare pointers from `tcc_malloc()` do not carry a recipe and remain dead after deserialization.

## Performance and benchmarking

`Rtinycc` is optimized for fast in-process compilation and convenient FFI
workflows, not for winning every microbenchmark against a conventional
precompiled `.Call()` shared library.

In practice, the usual pattern is:

- `Rtinycc` compiles tiny modules very quickly
- a regular `.Call()` module can have lower minimal per-call overhead
- array-oriented zero-copy inputs are a much better fit than many tiny scalar
crossings
- return paths that copy native buffers back into fresh R vectors make that
copy cost visible

If you want the benchmark details rather than the high-level summary, see the
`Compilation and Call Overhead` vignette.

## License

GPL-3

## References

- [TinyCC](https://github.com/TinyCC/tinycc)
- [Bun's FFI](https://bun.com/docs/runtime/ffi)
- [CFFI](https://cffi.readthedocs.io/)
- [RSimpleFFI](https://github.com/sounkou-bioinfo/RSimpleFFI#readme)
- [CSlug](https://cslug.readthedocs.io/en/latest/)