https://github.com/returnstring/pgpromise
Promisified async PostgreSQL queries for R
data-engineering data-science postgresql r
- Host: GitHub
- URL: https://github.com/returnstring/pgpromise
- Owner: returnString
- License: mit
- Created: 2018-06-02T16:37:59.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-06-07T22:54:08.000Z (over 7 years ago)
- Last Synced: 2025-01-02T08:40:12.768Z (about 1 year ago)
- Topics: data-engineering, data-science, postgresql, r
- Language: C++
- Homepage:
- Size: 11.7 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# pgpromise
Promisified async PostgreSQL queries for R.
## Motivation
Typically, querying a database in R blocks the current thread of execution. In most cases this is perfectly acceptable: we're waiting on a result and have no other work to do in the interim. However, for use cases like multi-user dashboards, it's often desirable to service many clients simultaneously, e.g. many users loading various pages on a Shiny dashboard, each of which requires DB queries. There, the classic blocking strategy hurts the user experience, because a single long-running query from one user can delay page loads for everyone else. Ideally, whilst those long-running queries are being handled on the DB server, the R process should get on with the remaining CPU-bound work for the dashboard.
Fortunately, thanks to the excellent work of the RStudio team, [promises](https://github.com/rstudio/promises) are now available in R, and Shiny has already been updated to take advantage of them. This package implements promise-based queries for PostgreSQL, allowing you to issue queries to PostgreSQL itself (or any wire-protocol compatible DB, e.g. Amazon Redshift) and have the results delivered asynchronously.
At present, this package is only designed to work with queries generated by `dbplyr`.
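To make the promise model concrete without needing a database, here is a minimal sketch that uses only the `promises` and `later` packages. `fake_query()` and `record()` are hypothetical helpers invented for this illustration and are not part of pgpromise; a timer stands in for the time a query would spend on the DB server.

```R
library(promises)
library(later)

# Hypothetical stand-in for a database query: resolves with `value`
# after `seconds`, without blocking the R session in the meantime.
fake_query <- function(value, seconds) {
  promise(function(resolve, reject) {
    later::later(function() resolve(value), delay = seconds)
  })
}

# Keep a log of the order in which things happen.
events <- character()
record <- function(label) {
  events <<- c(events, label)
  invisible(label)
}

# Issue the slow "query" first, then the fast one.
p_slow <- fake_query("slow result", 0.3) %...>% record()
p_fast <- fake_query("fast result", 0.1) %...>% record()

# Neither call above blocked, so this runs immediately.
record("main code continues")

# Shiny drives the event loop for you; in a plain script we run it
# by hand until every scheduled callback has fired.
while (!later::loop_empty()) later::run_now(timeoutSecs = 0.5)

print(events)
# "main code continues" "fast result" "slow result"
```

The fast result arrives before the slow one even though it was requested second, which is the behaviour the motivation section asks for: one long-running request does not hold up the others.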
## Example
```R
library(dplyr)
library(promises)
library(pgpromise)

conn <- create_postgres_pool(
  # these arguments are mostly just straightforward connection settings
  host = "your-postgres-server",
  port = 5432,
  db = "superimportantolapdb",
  user = "dashboarduser",
  password = "dashboardpw",
  # workers dictates the number of connections used:
  # if you issue more simultaneous queries than this,
  # then they'll just be queued until a worker becomes available
  workers = 8
)

item_sales <- tbl(conn, "item_sales")

item_sales %>%
  filter(timestamp >= '2018-01-01') %>%
  group_by(sku) %>%
  summarise(total_revenue = sum(price)) %>%
  collect_async() %...>% # note the promise pipe operator!
  {
    print("2: this will, in fact, be printed second, when we have the results")
    print(.)
  }

print("1: this will be printed first, before the query completes")
```