https://github.com/paleolimbot/datafusion
Experimental R Bindings to Datafusion
- Host: GitHub
- URL: https://github.com/paleolimbot/datafusion
- Owner: paleolimbot
- License: other
- Created: 2022-11-25T15:18:46.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-31T17:49:03.000Z (about 2 years ago)
- Last Synced: 2024-08-13T07:15:45.785Z (8 months ago)
- Language: Rust
- Size: 36.1 KB
- Stars: 6
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
- jimsghstars - paleolimbot/datafusion - Experimental R Bindings to Datafusion (Rust)
README
---
output: github_document
---

```{r, include = FALSE}
library(dplyr)
library(dbplyr)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# datafusion
[R-CMD-check](https://github.com/paleolimbot/datafusion/actions/workflows/R-CMD-check.yaml)
The goal of datafusion is to figure out if an R wrapper around [DataFusion](https://arrow.apache.org/datafusion/index.html) could ever be a thing.
## Installation
You can install the development version of datafusion from [GitHub](https://github.com/) with:
``` r
# install.packages("remotes")
remotes::install_github("paleolimbot/datafusion")
```

Installation requires a Rust toolchain: the package build uses `cargo` to compile the DataFusion Rust library. This won't work on Windows (not because there's anything wrong with Rust, but because something about the combination of Rust and msys2 produces more symbols than the linker can handle).
## Example
Step one: implement Postgres-flavoured SQL generation so that we can send it to DataFusion:
```{r}
library(datafusion)
library(dplyr)
library(dbplyr)

lazy_frame(a = double(), b = double(), con = simulate_datafusion(), .name = "some_table") |>
  filter(b > 5) |>
  summarise(x = sd(a, na.rm = TRUE)) |>
  sql_render()
```

Step two: build the DataFusion crate and figure out how to pass it SQL. So far I only have the mechanics to call a simple test function that returns an integer. Ideally this would be SQL in and ArrowArrayStream out!
```{r example}
library(datafusion)

# Just tests a call into rust
datafusion:::testerino()
```
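
For a rough idea of what the Rust side of such a test call could look like, here is a minimal sketch. It is not the package's actual source: the name `testerino_sketch` and the return value are made up for illustration, and the glue that R would need to reach the function (a C shim registered for `.Call()`) is left out entirely. It only shows a function declared with the C calling convention that hands back an integer.

```rust
// Hypothetical sketch only: `testerino_sketch` and its return value are
// illustrative assumptions, not taken from the datafusion R package.

/// A function using the C ABI that returns a plain 32-bit integer.
/// A real build would also need to export it under a stable (unmangled)
/// symbol name so that C code could link against it; that is omitted here.
pub extern "C" fn testerino_sketch() -> i32 {
    4 // arbitrary placeholder value
}

fn main() {
    // A C caller would see this function as `int32_t (*)(void)`.
    // Calling it through an `extern "C"` function pointer mimics that view.
    let f: extern "C" fn() -> i32 = testerino_sketch;
    println!("testerino_sketch() returned {}", f());
}
```

Getting from here to "SQL in, ArrowArrayStream out" would mean replacing the integer with a pointer to a stream structure and accepting the query as a C string, which is the part that is still to be figured out.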