# pgpromise
Promisified async PostgreSQL queries for R.

## Motivation
Typically, querying a database in R means blocking the current thread of execution. In most cases this is perfectly acceptable; after all, we're waiting on a result and have no other work to do in the interim. However, for use cases like multi-user dashboards, it's often desirable to service N clients simultaneously, e.g. many users loading various pages on a Shiny dashboard, each of which requires DB queries. The blocking approach then hurts the user experience, as a single long-running query from one user can delay page loads for everyone else. Ideally, whilst those long-running queries are being handled on the DB server, the R process should be free to run all the remaining CPU-bound work for our dashboard.

Fortunately, thanks to the excellent work of the RStudio team, the concept of [promises](https://github.com/rstudio/promises) now exists in R, and Shiny has already been updated to take advantage of it. This package implements promise-based queries for PostgreSQL, allowing you to issue queries to PostgreSQL (or any wire-protocol-compatible database, e.g. Amazon Redshift) and have the results delivered asynchronously.
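
If promises are new to you, here is a minimal sketch of how the promise pipe behaves, using the `future` package purely for illustration; nothing in this snippet is specific to pgpromise.

```R
library(future)
library(promises)
plan(multisession)

# run a slow computation on a background R worker; the promises package
# knows how to treat the resulting future as a promise
slow_result <- future({
  Sys.sleep(2)
  42
})

slow_result %...>% {
  print(paste("resolved with:", .))  # runs once the background value is ready
}

print("this line runs immediately, without waiting for the worker")
```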

At present, this package is only designed to work with queries generated by `dbplyr`.

## Example
```R
library(dplyr)
library(promises)
library(pgpromise)

conn <- create_postgres_pool(
  # these arguments are mostly just straightforward connection settings
  host = "your-postgres-server",
  port = 5432,
  db = "superimportantolapdb",
  user = "dashboarduser",
  password = "dashboardpw",
  # workers dictates the number of connections used:
  # if you issue more simultaneous queries than this,
  # then they'll just be queued until a worker becomes available
  workers = 8
)

item_sales <- tbl(conn, "item_sales")
item_sales %>%
  filter(timestamp >= '2018-01-01') %>%
  group_by(sku) %>%
  summarise(total_revenue = sum(price)) %>%
  collect_async() %...>% # note the promise pipe operator!
  {
    print("2: this will, in fact, be printed second, when we have the results")
    print(.)
  }

print("1: this will be printed first, before the query completes")
```
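
For the multi-user dashboard scenario described in the motivation, the asynchronous query can be returned straight from a Shiny render expression (Shiny supports promises in render expressions as of version 1.1). The following is only a rough sketch reusing the `conn` pool and `item_sales` table from the example above; the UI definition is omitted and the output id is a placeholder.

```R
library(shiny)

server <- function(input, output, session) {
  output$revenue_table <- renderTable({
    # returning a promise lets Shiny keep serving other sessions
    # while this query runs on the database server
    item_sales %>%
      filter(timestamp >= '2018-01-01') %>%
      group_by(sku) %>%
      summarise(total_revenue = sum(price)) %>%
      collect_async() %...>%
      arrange(desc(total_revenue))
  })
}
```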