Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ropensci/nodbi
Document DBI connector for R
https://github.com/ropensci/nodbi
couchdb database elasticsearch mongodb nosql postgresql r r-package rstats sqlite
Last synced: about 1 month ago
JSON representation
Document DBI connector for R
- Host: GitHub
- URL: https://github.com/ropensci/nodbi
- Owner: ropensci
- License: other
- Created: 2015-03-26T21:48:45.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2024-11-10T17:23:53.000Z (2 months ago)
- Last Synced: 2024-11-15T12:26:32.294Z (about 2 months ago)
- Topics: couchdb, database, elasticsearch, mongodb, nosql, postgresql, r, r-package, rstats, sqlite
- Language: R
- Homepage: https://docs.ropensci.org/nodbi
- Size: 2.95 MB
- Stars: 78
- Watchers: 9
- Forks: 6
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Codemeta: codemeta.json
Awesome Lists containing this project
- jimsghstars - ropensci/nodbi - Document DBI connector for R (R)
README
---
output: github_document
editor_options:
chunk_output_type: console
---# nodbi
```{r echo=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
eval = FALSE,
comment = "#",
out.width = "100%"
)```
[![R-CMD-check](https://github.com/ropensci/nodbi/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/nodbi/actions?query=workflow%3AR-CMD-check)
[![codecov](https://codecov.io/gh/rfhb/nodbi/branch/master/graph/badge.svg)](https://app.codecov.io/gh/rfhb/nodbi)
[![CRAN status](https://www.r-pkg.org/badges/version/nodbi)](https://CRAN.R-project.org/package=nodbi)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
`nodbi` is an R package that provides a single interface for several NoSQL databases and databases with JSON functionality, with the same function parameters and return values across all database backends. Last updated 2024-11-10.
| Currently, `nodbi` supports
as database backends | for an `R` object of any
of these data types | for these operations |
| :--------------------- | :--------------------- | :--------------------- |
| MongoDB | data.frame | List, Exists |
| SQLite | list | Create |
| PostgreSQL | JSON string | Get |
| DuckDB | file name of NDJSON records | Query |
| Elasticsearch | URL of NDJSON records | Update |
| CouchDB | | Delete |For speed comparisons of database backends, see [benchmark](#benchmark) and [testing](#testing) below.
## API overview
Parameters for `docdb_*()` functions are the same across all database backends. See [walk-through](#walk-through) below and the canonical testing in [core-nodbi.R](./tests/testthat/core-nodbi.R). "Container" is used as term to indicate where conceptually the backend holds the data, see [Database connections](#database-connections) below. The `key` parameter holds the name of a container.
| Purpose | Function call |
| :------------------------------ | :------------------------------------ |
| Create database connection (see below) | `src <- nodbi::src_{duckdb, postgres, mongo, sqlite, couchdb, elastic}()` |
| Load `my_data` (a data frame, list, JSON string, or file name or URL pointing to NDJSON records) into database, container `my_container` | `nodbi::docdb_create(src = src, key = "my_container", value = my_data)` |
| Get all documents back into a data frame | `nodbi::docdb_get(src = src, key = "my_container")` |
| Get documents selected with query (as MongoDB-compatible JSON) into a data frame | `nodbi::docdb_query(src = src, key = "my_container", query = '{"age": 20}')` |
| Get selected fields (in MongoDB compatible JSON) from documents selected by query into a data frame | `nodbi::docdb_query(src = src, key = "my_container", query = '{"age": {"$gt": 20}}', fields = '{"friends.name": 1, "_id": 0, "age": 1}', limit = 2L)` |
| Update (patch) documents selected by query with new data `my_data` (in a data frame, list, JSON string, or file name or URL pointing to NDJSON records) | `nodbi::docdb_update(src = src, key = "my_container", value = my_data, query = '{"age": 20}')` |
| Check if container exists | `nodbi::docdb_exists(src = src, key = "my_container")` |
| List all containers in database | `nodbi::docdb_list(src = src)` |
| Delete document(s) in container | `nodbi::docdb_delete(src = src, key = "my_container", query = '{"age": 20}')` |
| Delete container | `nodbi::docdb_delete(src = src, key = "my_container")` |
| Close and remove database connection manually (when restarting R, connections are automatically closed and removed by `nodbi`) | `rm(src)` |## Install
CRAN version
```{r eval=FALSE}
install.packages("nodbi")
```Development version
```{r eval=FALSE}
remotes::install_github("ropensci/nodbi")
```Load package from library
```{r}
library("nodbi")
```## Database connections {#database-connections}
Overview on parameters and aspects that are specific to the database backend. These are only needed once, for for `src_*()` to create a connection object. Any such connection object is subsequently used similarly across the `docdb_*` functions.
"Container" refers to how conceptually the backend holds the data. Data types are mapped from JSON to R objects by [jsonlite](https://CRAN.R-project.org/package=jsonlite). Any root-level `_id` is extracted from the document(s) and used for an index column `_id`, otherwise a UUID is created as `_id`.
### DuckDB
See also . "Container" refers to a DuckDB table, with columns `_id` and `json` created and used by package `nodbi`, applying SQL functions and functions as per to the `json` column. Each row in the table represents a `JSON` document.
```{r}
src <- nodbi::src_duckdb(dbdir = ":memory:", ...)
```### MongoDB
"Container" refers to a MongoDB collection, in which `nodbi` creates JSON documents. See also . MongoDB but none of the other databases require to specify the container name already in the `src_*()` function; use the `collection` name for parameter `key` in `docdb_*` functions.
```{r}
src <- nodbi::src_mongo(
collection = "my_container", db = "my_database",
url = "mongodb://localhost", ...)
```### SQLite
"Container" refers to an SQLite table, with columns `_id` and `json` created and used by package `nodbi`, applying SQL functions and functions as per to the `json` column. Each row in the table represents a `JSON` document. The table is indexed on `_id`. See also .
```{r}
src <- nodbi::src_sqlite(dbname = ":memory:", ...)
```### CouchDB
"Container" refers to a CouchDB database, in which `nodbi` creates JSON documents. See also . With CouchDB, function `docdb_update()` uses [jqr](https://cran.r-project.org/package=jqr) to implement patching JSON, in analogy to functions available for the other databases.
```{r}
src <- nodbi::src_couchdb(
host = "127.0.0.1", port = 5984L, path = NULL,
transport = "http", user = NULL, pwd = NULL, headers = NULL)
```### Elasticsearch
"Container" refers to an Elasticsearch index, in which `nodbi` creates JSON documents. Opensearch can equally be used. See also . Only lowercase is accepted for container names (in parameter `key` of `docdb_*` functions).
```{r}
src <- nodbi::src_elastic(
host = "127.0.0.1", port = 9200L, path = NULL,
transport_schema = "http", user = NULL, pwd = NULL, ...)
```### PostgreSQL
"Container" refers to a PostgreSQL table, with columns `_id` and `json` created and used by package `nodbi`, applying SQL functions and functions as per to the `json` column. With PostgreSQL, a custom `plpgsql` function [jsonb_merge_patch()](https://github.com/ropensci/nodbi/blob/master/R/src_postgres.R#L60) is used for `docdb_update()`. The order of variables in data frames returned by `docdb_get()` and `docdb_query()` can differ from their order the input to `docdb_create()`.
```{r}
src <- nodbi::src_postgres(
dbname = "my_database", host = "127.0.0.1", port = 5432L, ...)
```## Walk-through {#walk-through}
This example is to show how functional `nodbi` is at this time: With any of the six database backends, the functions work in the same way and return the same values.
```{r}
# load nodbi
library(nodbi)# name of container
key <- "my_container"# connect any of these database backends
src <- src_duckdb()
src <- src_mongo(collection = key)
src <- src_sqlite()
src <- src_postgres()
src <- src_elastic()
src <- src_couchdb(
user = Sys.getenv("COUCHDB_TEST_USER"),
pwd = Sys.getenv("COUCHDB_TEST_PWD"))# check if container already exists
docdb_exists(src, key)
# [1] FALSE# load data (here data frame, alternatively a list, JSON or file with NSJSON)
# into the container "my_container" specified in "key" parameter
docdb_create(src, key, value = mtcars)
# [1] 32# load additionally 98 NDJSON records
docdb_create(src, key, "https://httpbin.org/stream/98")
# Note: container 'my_container' already exists
# [1] 98# load additionally contacts JSON data, from package nodbi
docdb_create(src, key, contacts)
# Note: container 'my_container' already exists
# [1] 5# get all documents, irrespective of schema
dplyr::tibble(docdb_get(src, key))
# # A tibble: 135 × 27
# `_id` isActive balance age eyeColor name email about registered tags friends
#
# 1 5cd6… TRUE $2,412… 20 blue Kris… kris… "Sin… 2017-07-1…
# 2 5cd6… FALSE $3,400… 20 brown Rae … raec… "Nis… 2018-12-1…
# 3 5cd6… TRUE $1,161… 22 brown Pace… pace… "Eiu… 2018-08-1…
# 4 5cd6… FALSE $2,579… 30 brown Will… will… "Nul… 2018-02-1…
# 5 5cd6… FALSE $3,808… 23 green Lacy… lacy… "Sun… 2014-08-0…
# 6 69bc… NA NA NA NA NA NA NA NA
# 7 69bc… NA NA NA NA NA NA NA NA
# 8 69bc… NA NA NA NA NA NA NA NA
# 9 69bc… NA NA NA NA NA NA NA NA
# 10 69bc… NA NA NA NA NA NA NA NA
# # ℹ 125 more rows
# # ℹ 16 more variables: url , args , headers , origin ,
# # id , mpg , cyl , disp , hp , drat , wt ,
# # qsec , vs , am , gear , carb
# # ℹ Use `print(n = ...)` to see more rows# query some documents
docdb_query(src, key, query = '{"mpg": {"$gte": 30}}')
# _id mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Fiat 128 32 4 79 66 4.1 2.2 19 1 1 4 1
# 2 Honda Civic 30 4 76 52 4.9 1.6 19 1 1 4 2
# 3 Toyota Corolla 34 4 71 65 4.2 1.8 20 1 1 4 1
# 4 Lotus Europa 30 4 95 113 3.8 1.5 17 1 1 5 2# query some fields from some documents; 'query' is a mandatory
# parameter and is used here in its position in the signature
docdb_query(src, key, '{"mpg": {"$gte": 30}}', fields = '{"wt": 1, "mpg": 1}')
# _id wt mpg
# 1 Fiat 128 2.2 32
# 2 Honda Civic 1.6 30
# 3 Lotus Europa 1.5 30
# 4 Toyota Corolla 1.8 34# query some subitem fields from some documents
str(docdb_query(
src, key,
query = '{"$or": [{"age": {"$gt": 21}},
{"friends.name": {"$regex": "^B[a-z]{3,9}.*"}}]}',
fields = '{"age": 1, "friends.name": 1}'))
# 'data.frame': 3 obs. of 3 variables:
# $ _id : chr "5cd6785325ce3a94dfc54096" "5cd6785335b63cb19dfa8347" "5cd67853f841025e65ce0ce2"
# $ age : int 22 30 23
# $ friends.name:List of 3
# ..$ : chr "Baird Keller" "Francesca Reese" "Dona Bartlett"
# ..$ : chr "Coleen Dunn" "Doris Phillips" "Concetta Turner"
# ..$ : chr "Wooten Goodwin" "Brandie Woodward" "Angelique Britt"# such queries can also be used for updating (patching) selected documents
# with a new 'value'(s) from a JSON string, a data frame a list or a file with NSJSON)
docdb_update(src, key, value = '{"vs": 9, "xy": [1, 2]}', query = '{"carb": 3}')
# [1] 3
docdb_query(src, key, '{"carb": {"$in": [1,3]}}', fields = '{"vs": 1, "_id": 0}')[[1]]
# [1] 1 1 1 9 9 9 1 1 1 1
docdb_get(src, key)[c(3, 109, 130, 101), c("_id", "xy", "url", "email")]
# _id xy url email
# 3 5cd6785325ce3a94dfc54096 NULL [email protected]
# 109 Dodge Challenger NULL
# 130 Pontiac Firebird NULL
# 101 69bcd195-a59c-11ee-bfb9-acbc328130bb NULL https://httpbin.org/stream/98# use with dplyr
# *note* that dplyr includes a (deprecated) function src_sqlite
# which would mask nodbi's src_sqlite, so it is excluded here
library("dplyr", exclude = c("src_sqlite", "src_postgres"))
#
docdb_get(src, key) %>%
group_by(gear) %>%
summarise(mean_mpg = mean(mpg))
# # A tibble: 4 × 2
# gear mean_mpg
#
# 1 3 16.1
# 2 4 24.5
# 3 5 21.4
# 4 NA NA# delete documents; query is optional parameter and has to be
# specified for deleting documents instead of deleting the container
dim(docdb_query(src, key, query = '{"$or": [{"age": {"$lte": 20}}, {"age": {"$gte": 25}}]}'))
# [1] 3 11
docdb_delete(src, key, query = '{"$or": [{"age": {"$lte": 20}}, {"age": {"$gte": 25}}]}')
# TRUE
nrow(docdb_get(src, key))
# [1] 132# delete container from database
docdb_delete(src, key)
# [1] TRUE
#
# shutdown
DBI::dbDisconnect(src$con, shutdown = TRUE); rm(src)
```## Benchmark {#benchmark}
```{r}
library("nodbi")srcMongo <- src_mongo()
srcSqlite <- src_sqlite()
srcPostgres <- src_postgres()
srcDuckdb <- src_duckdb()
srcElastic <- src_elastic()
srcCouchdb <- src_couchdb(
user = Sys.getenv("COUCHDB_TEST_USER"),
pwd = Sys.getenv("COUCHDB_TEST_PWD"))key <- "test"
query <- '{"clarity": {"$in": ["NOTME", "VS1"]}}'
fields <- '{"cut": 1, "_id": 1, "clarity": 1}'
value <- '{"clarity": "XYZ", "new": ["ABC", "DEF"]}'
data <- diamonds[1:1000, ]
ndjs <- tempfile()
jsonlite::stream_out(diamonds[1:10000, ], con = file(ndjs), verbose = FALSE)testFunction <- function(src, key, value, query, fields) {
try(docdb_delete(src, key), silent = TRUE)
on.exit(docdb_delete(src, key))
suppressMessages(docdb_create(src, key, data))
suppressMessages(docdb_create(src, key, ndjs))
head(docdb_get(src, key))
docdb_query(src, key, query = query, fields = fields)
docdb_query(src, key, query = query, listfields = TRUE)
docdb_update(src, key, value = value, query = query)
}result <- rbenchmark::benchmark(
MongoDB = testFunction(src = srcMongo, key, value, query, fields),
SQLite = testFunction(src = srcSqlite, key, value, query, fields),
Elastic = testFunction(src = srcElastic, key, value, query, fields),
CouchDB = testFunction(src = srcCouchdb, key, value, query, fields),
PostgreSQL = testFunction(src = srcPostgres, key, value, query, fields),
DuckDB = testFunction(src = srcDuckdb, key, value, query, fields),
replications = 3L
)# 2024-11-10 with M3 hardware, databases via homebrew
result[rev(order(result$elapsed)), c('test', 'replications', 'elapsed')]
# test replications elapsed
# 4 CouchDB 3 52.81
# 3 Elastic 3 27.76
# 5 PostgreSQL 3 1.59
# 1 MongoDB 3 1.41
# 6 DuckDB 3 1.10
# 2 SQLite 3 0.71message(R.version$version.string)
# R version 4.4.2 (2024-10-31)pkgs <- c("RSQLite", "duckdb", "RPostgres", "mongolite", "elastic", "sofa")
for (pkg in pkgs) message(pkg, ": ", packageVersion(pkg))
# RSQLite: 2.3.7.9017
# duckdb: 1.1.2
# RPostgres: 1.4.7
# mongolite: 2.8.1
# elastic: 1.2.0
# sofa: 0.4.0
```## Testing {#testing}
Every database backend is subjected to identical tests, see [core-nodbi.R](https://github.com/ropensci/nodbi/blob/master/tests/testthat/core-nodbi.R).
```{r testing_and_coverage}
# 2024-11-10
suppressMessages(testthat::test_local())
# ✔ | F W S OK | Context
# ✔ | 2 175 | couchdb [47.9s]
# ✔ | 1 174 | duckdb [3.5s]
# ✔ | 2 173 | elastic [38.3s]
# ✔ | 2 173 | mongodb [3.6s]
# ✔ | 176 | postgres [4.3s]
# ✔ | 177 | sqlite [3.5s]
#
# ══ Results ══════════════════════════
# Duration: 101.2 s
#
# ── Skipped tests (7) ────────────────
# • Testing for auto disconnect and shutdown not relevant (3):
# test-couchdb.R:26:3, test-elastic.R:21:3, test-mongodb.R:24:3
# • Testing for parallel writes not possible or implemented (4):
# test-couchdb.R:26:3, test-duckdb.R:22:3,
# test-elastic.R:21:3, test-mongodb.R:24:3
#
# [ FAIL 0 | WARN 0 | SKIP 7 | PASS 1048 ]# 2024-11-10
covr::package_coverage(path = ".", type = "tests")
# nodbi Coverage: 94.07%
# R/src_postgres.R: 79.31%
# R/src_duckdb.R: 79.49%
# R/zzz.R: 87.79%
# R/src_mongo.R: 92.59%
# R/update.R: 94.40%
# R/query.R: 94.57%
# R/get.R: 95.24%
# R/create.R: 96.25%
# R/delete.R: 98.96%
# R/exists.R: 100.00%
# R/list.R: 100.00%
# R/src_couchdb.R: 100.00%
# R/src_elasticsearch.R: 100.00%
# R/src_sqlite.R: 100.00%
```## Notes
- Please [report any issues or bugs](https://github.com/ropensci/nodbi/issues).
- License: MIT
- Get citation information for `nodbi` in R doing `citation(package = 'nodbi')`
- Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/). By contributing to this project, you agree to abide by its terms.
- Support for redis has been removed since version 0.5.