https://github.com/sciviews/mongoplyr
Use {dplyr} and {dbplyr} with MongoDB
https://github.com/sciviews/mongoplyr
dplyr-sql-backends mongodb r r-package
Last synced: about 2 months ago
JSON representation
Use {dplyr} and {dbplyr} with MongoDB
- Host: GitHub
- URL: https://github.com/sciviews/mongoplyr
- Owner: SciViews
- License: other
- Created: 2023-07-05T09:32:17.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-06T17:21:54.000Z (almost 2 years ago)
- Last Synced: 2025-01-24T20:37:33.938Z (3 months ago)
- Topics: dplyr-sql-backends, mongodb, r, r-package
- Language: R
- Homepage: https://www.sciviews.org/mongoplyr
- Size: 4.09 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(tibble.print_min = 5, tibble.print_max = 5)
```# mongoplyr - Use {dplyr} verbs with a MongoDB database or construct MongoDB JSON queries from {dplyr} verbs
[](https://github.com/SciViews/mongoplyr/actions/workflows/R-CMD-check.yaml) [](https://app.codecov.io/gh/SciViews/mongoplyr?branch=main) [](https://CRAN.R-project.org/package=mongoplyr) [](https://lifecycle.r-lib.org/articles/stages.html#experimental)
## Overview
Use {dplyr} verbs to query a MongoDB database. This uses {dbplyr} to create SQL queries, and then converts them into MongoDB JSON aggregating queries with "mongotranslate" from the MongoDB BI Connector (to be installed). One can also recover the JSON query to use it directly into {mongolite}. This way, {mongoplyr} serves as a translator from {dplyr} code to MongoDB JSON queries only during the prototyping phase.
**Note: this is highly experimental. Do not expect to obtain running queries in MongoDB JSON for any and all {dplyr} pipelines!** However, the code obtained could be a base to be edited later on. This is an additional argument to use {mongoplyr} only for prototyping your MongoDB JSON queries.
## Installation
You can install the development version of {mongoplyr} from [GitHub](https://github.com/SciViews/mongoplyr) with:
```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("SciViews/mongoplyr")
```You also need "mongotranslate" and "mongodrdl" external binaries that are provided by MongoDB BI Connector (note: "mongotranslate" is **not** provided under Windows, see here under).
- For Linux and MacOS, download the MongoDB BI Connector and follow instructions from here: . Also note that you must register and you need a premise MongoDB subscription to be allowed to use these binaries, ... or you can use them for testing or prototyping purpose only (which is what we are going to do). In R, you can check accessibility to these programs by using `system("mongotranslate")`.
- For Windows, install WSL2 (see ) and a Linux distribution for WSL (Ubuntu 22.04 for instance). Then, install MongoDB BI Connector in your WSL Linux as explained here above. Then, you should be able to access "mongotranslate"/"mongodrdl" through a command like `wsl mongotranslate`. Test it under R by issuing `system("wsl mongotranslate")`.
If you made "mongotranslate" and "mongodrdl" accessible from the path (that is, these programs run in a terminal without specifying their complete path), you are done. If you prefer to isolate these programs in a directory, say in `/my/directory/to/mongodb_bi`, then {mongoplyr} will be able to access these programs if you indicate the path like this:
```{r, eval=FALSE}
options(mongotranslate.path = "/my/directory/to/mongodb_bi")
# Test it with:
#system(file.path(getOption("mongotranslate.path"), "mongostranslate"))
```## Example
Here is a basic example using the five main {dplyr} verbs `select()`, `filter()`, `mutate()`, `group_by()` and `summarise()` (+ `arrange()`). We do not set up our own MongoDB server, but just reuse the example server provided by Jeroen Ooms, the author of the {mongolite} package that {mongoplyr} uses to connect to MongoDB.
```{r}
library(mongoplyr)
library(dplyr)
database <- "test"
collection <- "mtcars"
mongodb_url <- "mongodb+srv://readwrite:[email protected]"# Connect and make sure the collection contains the mtcars dataset
mcon <- mongolite::mongo(collection, database, mongodb_url)
mcon$drop()
mcon$insert(mtcars)
```Now, we create a **tbl_mongo** object that lazily matches a **data.frame** to this connection and allows to use {dplyr} verbs to query it.
```{r}
tbl <- tbl_mongo(mongo = mcon)
query <- tbl |>
filter(mpg < 30 & wt >= 2) |>
select(cyl, wt, mpg, hp) |>
mutate(log_hp = log(hp), wt2 = wt^2) |>
group_by(cyl) |>
summarise(
max_mpg = max(mpg, na.rm = TRUE),
min_wt2 = min(wt2, na.rm = TRUE),
mean_log_hp = mean(log_hp, na.rm = TRUE)) |>
arrange(cyl)
# Here is the equivalent MongoDB JSON query
collapse(query)
```Use `collect()` to get the resulting **data.frame**:
```{r}
collect(query)
```The query is just a character string (obtained with `collapse()`) and it can be used as such directly in an `$aggregate()` method of a **mongo** object (the string is entered using the raw character syntax by embedding it in `r"{...}"`). Since it is pretty unreadable by anyone not used to the MongoDB aggregating language (and even by those who understand it, probably!) we copy and paste also the corresponding {dplyr} pipeline in comments above to better document it. In case you would need to refine this query later on, you know you can start from that {dplyr} pipeline in the comments. This way, you do not need {mongoplyr} any more, nor the external programs "mongotranslate" and "mongodrdl" to run that query on your database. So, you have just used {mongoplyr} for prototyping your query.
```{r}
# The next query is equivalent to (created with {mongoplyr}):
#tbl |>
# filter(mpg < 30 & wt >= 2) |>
# select(cyl, wt, mpg, hp) |>
# mutate(log_hp = log(hp), wt2 = wt^2) |>
# group_by(cyl) |>
# summarise(
# max_mpg = max(mpg, na.rm = TRUE),
# min_wt2 = min(wt2, na.rm = TRUE),
# mean_log_hp = mean(log_hp, na.rm = TRUE)) |>
# arrange(cyl)
mcon$aggregate(r"{[
{"$match": {"$and": [{"wt": {"$gte": {"$numberDecimal": "2"}}},{"mpg": {"$lt": {"$numberDecimal": "30"}}}]}},
{"$project": {"test__mtcars__cyl": "$cyl","test__mtcars__mpg": "$mpg","ln(test__mtcars__hp)": {"$cond": {"if": {"$gt": ["$hp",{"$literal": {"$numberInt": "0"}}]},"then": {"$ln": ["$hp"]},"else": {"$literal": null}}},"power(test__mtcars__wt,2)": {"$pow": ["$wt",{"$literal": {"$numberDecimal": "2"}}]}}},
{"$group": {"_id": "$test__mtcars__cyl","max(test__q01__mpg)": {"$max": "$test__mtcars__mpg"},"min(test__q01__wt2)": {"$min": "$power(test__mtcars__wt,2)"},"avg(test__q01__log_hp)": {"$avg": "$ln(test__mtcars__hp)"}}},
{"$addFields": {"_id": {"group_key_0": "$_id"}}},
{"$sort": {"_id.group_key_0": {"$numberInt": "1"}}},
{"$project": {"cyl": "$_id.group_key_0","max_mpg": "$max(test__q01__mpg)","min_wt2": "$min(test__q01__wt2)","mean_log_hp": "$avg(test__q01__log_hp)","_id": {"$numberInt": "0"}}}
]}")
```Compare this with a direct manipulation of `mtcars` as a **data.frame** using regular {dplyr} code:
```{r}
mtcars |>
filter(mpg < 30 & wt >= 2) |>
select(cyl, wt, mpg, hp) |>
mutate(log_hp = log(hp), wt2 = wt^2) |>
group_by(cyl) |>
summarise(
max_mpg = max(mpg, na.rm = TRUE),
min_wt2 = min(wt2, na.rm = TRUE),
mean_log_hp = mean(log_hp, na.rm = TRUE)) |>
arrange(cyl)
```Do not forget to close the connection with the server.
```{r}
mcon$disconnect()
```## Getting help
Help is accessible as usual by one of these instructions:
```{r, eval=FALSE}
library(help = "mongoplyr")
help("mongoplyr-package")
vignette("mongoplyr") # Note: no vignette is installed with install_github()
```For further instructions, please, refer to the Web site at .
------------------------------------------------------------------------
Please note that the {mongoplyr} package is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.