{"id":14069282,"url":"https://github.com/paleolimbot/datafusion","last_synced_at":"2025-05-10T07:30:41.788Z","repository":{"id":282025458,"uuid":"943408336","full_name":"paleolimbot/datafusion","owner":"paleolimbot","description":"Apache DataFusion SQL Query Engine","archived":false,"fork":true,"pushed_at":"2025-04-09T14:13:26.000Z","size":128723,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-09T15:27:21.589Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://datafusion.apache.org/","language":"Rust","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"apache/datafusion","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paleolimbot.png","metadata":{},"created_at":"2025-03-05T16:56:10.000Z","updated_at":"2025-04-09T14:14:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/paleolimbot/datafusion","commit_stats":null,"previous_names":["paleolimbot/datafusion"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paleolimbot%2Fdatafusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paleolimbot%2Fdatafusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paleolimbot%2Fdatafusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paleolimbot%2Fdatafusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paleolimbot","download_url":"https://codeload.github.com/paleolimbot/datafusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253379432,"owners_count":21899253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022
-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-13T07:06:48.220Z","updated_at":"2025-05-10T07:30:41.781Z","avatar_url":"https://github.com/paleolimbot.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nlibrary(dplyr)\nlibrary(dbplyr)\n\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# datafusion\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/paleolimbot/datafusion/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/paleolimbot/datafusion/actions/workflows/R-CMD-check.yaml)\n\u003c!-- badges: end --\u003e\n\nThe goal of datafusion is to figure out whether an R wrapper around [DataFusion](https://arrow.apache.org/datafusion/index.html) could ever be a thing.\n\n## Installation\n\nYou can install the development version of datafusion from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"remotes\")\nremotes::install_github(\"paleolimbot/datafusion\")\n```\n\nThis requires a Rust toolchain: `cargo` is used to build the DataFusion Rust library. 
This won't work on Windows. The problem is not Rust itself: building Rust code under msys2 produces too many symbols for the linker to handle.\n\n## Example\n\nStep one: implement Postgres-flavoured SQL generation so that we can send it to DataFusion:\n\n```{r}\nlibrary(datafusion)\nlibrary(dplyr)\nlibrary(dbplyr)\n\nlazy_frame(a = double(), b = double(), con = simulate_datafusion(), .name = \"some_table\") |\u003e\n  filter(b \u003e 5) |\u003e\n  summarise(x = sd(a, na.rm = TRUE)) |\u003e\n  sql_render()\n```\n\nStep two: build the DataFusion crate and figure out how to pass it SQL. So far I only have the mechanics to call a simple test function that returns an integer. Ideally, the interface would be SQL in and an ArrowArrayStream out!\n\n```{r example}\nlibrary(datafusion)\n\n# Just tests a call into Rust\ndatafusion:::testerino()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaleolimbot%2Fdatafusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaleolimbot%2Fdatafusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaleolimbot%2Fdatafusion/lists"}