{"id":13857622,"url":"https://github.com/ianmcook/tidyquery","last_synced_at":"2025-04-07T15:06:05.021Z","repository":{"id":45648543,"uuid":"205288721","full_name":"ianmcook/tidyquery","owner":"ianmcook","description":"Query R data frames with SQL","archived":false,"fork":false,"pushed_at":"2023-01-14T16:49:33.000Z","size":491,"stargazers_count":168,"open_issues_count":4,"forks_count":12,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-31T14:11:47.087Z","etag":null,"topics":["dplyr","query","r","sql","tidyverse"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ianmcook.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-30T02:24:38.000Z","updated_at":"2025-02-24T13:02:28.000Z","dependencies_parsed_at":"2023-02-09T19:55:12.622Z","dependency_job_id":null,"html_url":"https://github.com/ianmcook/tidyquery","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ianmcook%2Ftidyquery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ianmcook%2Ftidyquery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ianmcook%2Ftidyquery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ianmcook%2Ftidyquery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ianmcook","download_url":"https://codeload.github.com/ianmcook/tidyquery/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247675596,"owners_count":20977376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dplyr","query","r","sql","tidyverse"],"created_at":"2024-08-05T03:01:42.182Z","updated_at":"2025-04-07T15:06:04.981Z","avatar_url":"https://github.com/ianmcook.png","language":"R","readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n# tidyquery \u003cimg src=\"man/figures/logo.png\" align=\"right\" width=\"120\" /\u003e\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/tidyquery)](https://cran.r-project.org/package=tidyquery)\n[![GitHub Actions build status](https://github.com/ianmcook/tidyquery/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/ianmcook/tidyquery/actions/workflows/check-standard.yaml)\n[![Codecov test coverage](https://codecov.io/gh/ianmcook/tidyquery/branch/master/graph/badge.svg)](https://codecov.io/gh/ianmcook/tidyquery?branch=master)\n\u003c!-- badges: end --\u003e\n\n**tidyquery** runs SQL queries on R data frames.\n\nIt uses [queryparser](https://github.com/ianmcook/queryparser) to translate SQL queries into R expressions, and then it uses [dplyr](https://dplyr.tidyverse.org) to evaluate these expressions and return results. 
**tidyquery** does not load data frames into a database; it queries them in place.\n\nFor an introduction to **tidyquery** and **queryparser**, watch the recording of the talk [\"Bridging the Gap between SQL and R\"](https://www.youtube.com/watch?v=JwP5KdWSgqE) from rstudio::conf(2020).\n\n## Installation\n\nInstall the released version of **tidyquery** from [CRAN](https://CRAN.R-project.org/package=tidyquery) with:\n\n``` r\ninstall.packages(\"tidyquery\")\n```\n\nOr install the development version from [GitHub](https://github.com/ianmcook/tidyquery) with:\n\n``` r\n# install.packages(\"remotes\")\nremotes::install_github(\"ianmcook/tidyquery\")\n```\n\n## Usage\n\n**tidyquery** exports two functions: `query()` and `show_dplyr()`.\n\n### Using `query()`\n\nTo run a SQL query on an R data frame, call the function `query()`, passing a `SELECT` statement enclosed in quotes as the first argument. The table names in the `FROM` clause should match the names of data frames in your current R session:\n\n```{r}\nlibrary(tidyquery)\nlibrary(nycflights13)\n\nquery(\n\" SELECT origin, dest,\n    COUNT(flight) AS num_flts,\n    round(SUM(seats)) AS num_seats,\n    round(AVG(arr_delay)) AS avg_delay\n  FROM flights f LEFT OUTER JOIN planes p\n    ON f.tailnum = p.tailnum\n  WHERE distance BETWEEN 200 AND 300\n    AND air_time IS NOT NULL\n  GROUP BY origin, dest\n  HAVING num_flts \u003e 3000\n  ORDER BY num_seats DESC, avg_delay ASC\n  LIMIT 2;\"\n)\n```\n\nAlternatively, for single-table queries, you can pass a data frame as the first argument and a `SELECT` statement as the second argument, omitting the `FROM` clause. 
This allows `query()` to function like a dplyr verb:\n\n```{r, message=FALSE}\nlibrary(dplyr)\n\nairports %\u003e%\n  query(\"SELECT name, lat, lon ORDER BY lat DESC LIMIT 5\")\n```\n\nYou can chain dplyr verbs before and after `query()`:\n\n```{r}\nplanes %\u003e%\n  filter(engine == \"Turbo-fan\") %\u003e%\n  query(\"SELECT manufacturer AS maker, COUNT(*) AS num_planes GROUP BY maker\") %\u003e%\n  arrange(desc(num_planes)) %\u003e%\n  head(5)\n```\n\nIn the `SELECT` statement, the names of data frames and columns are case-sensitive (like in R), but keywords and function names are case-insensitive (like in SQL).\n\nIn addition to R data frames and tibbles (`tbl_df` objects), `query()` can be used to query other data frame-like objects, including:\n\n- `dtplyr_step` objects created with [dtplyr](https://dtplyr.tidyverse.org), a [data.table](https://r-datatable.com/) backend for dplyr\n- Apache Arrow `Table`, `RecordBatch`, `Dataset`, and `arrow_dplyr_query` objects created with [arrow](https://arrow.apache.org/docs/r/)\n- `tbl_sql` objects created with [dbplyr](https://dbplyr.tidyverse.org) or a dbplyr backend package, enabling you to write SQL that is translated to dplyr, then translated back to SQL, and run in a database (a fun party trick!)\n\n### Using `show_dplyr()`\n\n**tidyquery** works by generating dplyr code. To print the dplyr code instead of running it, use `show_dplyr()`:\n\n```{r}\nshow_dplyr(\n\" SELECT manufacturer,\n    COUNT(*) AS num_planes\n  FROM planes\n  WHERE engine = 'Turbo-fan'\n  GROUP BY manufacturer\n  ORDER BY num_planes DESC;\"\n)\n```\n\n## Current Limitations\n\n**tidyquery** is subject to the current limitations of the queryparser package. 
Please see the **Current Limitations** section of the queryparser README on [CRAN](https://cran.r-project.org/package=queryparser/readme/README.html#current-limitations) or [GitHub](https://github.com/ianmcook/queryparser#current-limitations).\n\n**tidyquery** also has the following additional limitations:\n\n- Joins involving three or more tables are not supported.\n- Because joins in dplyr currently work in a fundamentally different way from joins in SQL, some other types of join queries are not supported. Examples of unsupported join queries include non-equijoin queries and outer join queries with qualified references to the join column(s). Planned changes in dplyr will enable future versions of tidyquery to support more types of joins.\n- In the code printed by `show_dplyr()`, calls to functions with more than five arguments might be truncated, with arguments after the fifth replaced with `...`.\n\n## Related Work\n\nThe **sqldf** package ([CRAN](https://cran.r-project.org/package=sqldf), [GitHub](https://github.com/ggrothendieck/sqldf)) runs SQL queries on R data frames by transparently setting up a database, loading data from R data frames into the database, running SQL queries in the database, and returning results as R data frames.\n\nThe **duckdb** package ([CRAN](https://cran.r-project.org/package=duckdb), [GitHub](https://github.com/duckdb/duckdb/tree/master/tools/rpkg)) includes the function `duckdb_register()`, which registers an R data frame as a virtual table in a [DuckDB](https://duckdb.org) database, enabling you to run SQL queries on the data frame with `DBI::dbGetQuery()`.\n\nThe **[dbplyr](https://dbplyr.tidyverse.org)** package ([CRAN](https://cran.r-project.org/package=dbplyr), [GitHub](https://github.com/tidyverse/dbplyr)) is like tidyquery in reverse: it converts dplyr code into SQL, allowing you to use dplyr to work with data in a database.\n\nIn **Python**, the [**dataframe_sql**](https://github.com/zbrookle/dataframe_sql) package (targeting [**pandas**](https://pandas.pydata.org)) and the [**sql_to_ibis**](https://github.com/zbrookle/sql_to_ibis) package (targeting [**Ibis**](https://ibis-project.org)) are analogous to tidyquery.\n","funding_links":[],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fianmcook%2Ftidyquery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fianmcook%2Ftidyquery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fianmcook%2Ftidyquery/lists"}