https://github.com/INWTlab/dbtools

R package for connecting and querying databases

Host: GitHub
URL: https://github.com/INWTlab/dbtools
Owner: INWTlab
License: other
Created: 2015-07-29T08:40:05.000Z (over 10 years ago)
Default Branch: main
Last Pushed: 2024-10-09T08:50:16.000Z (about 1 year ago)
Last Synced: 2024-12-04T07:36:23.471Z (11 months ago)
Language: R
Size: 204 KB
Stars: 6
Watchers: 7
Forks: 4
Open Issues: 7
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS
- License: LICENSE

Awesome Lists containing this project

jimsghstars - INWTlab/dbtools - R package for connecting and querying databases (R)

README

---
output:
  md_document:
    variant: markdown_github
---

[![Travis-CI Build Status](https://app.travis-ci.com/INWTlab/dbtools.svg?branch=master)](https://app.travis-ci.com/INWTlab/dbtools)

```{r echo=FALSE}

options(datatable.print.nrows = 10)

```

This package abstracts typical patterns used when connecting to and
communicating with databases in R. It aims to provide very few, simple and
reliable functions for sending queries and data to databases.

## Installation

```{r eval=FALSE}

devtools::install_github("INWTlab/dbtools")

```

We will start with a simple example showing how to use `dbtools` to send queries
and data to a database. The example will introduce the main parts of the
package, namely the *Credentials* class as well as both `sendQuery` and
`sendData`.

## Introductory example

In this example we show how to connect to a database, how to communicate with
the database, how to set up a database table, and finally how to send and
retrieve data.

To begin with, we have to define an object of class *Credentials* which will
store all necessary information to connect to a database. The driver is
mandatory; all other arguments depend on the specific back-end.

```{r}

library("dbtools")

cred <- Credentials(drv = RSQLite::SQLite, dbname = "example.db")

```

In contrast to the functions available from DBI, functions from `dbtools` take a
*Credentials* instance as an argument; connecting to the database, communicating
with it and closing the connection are then handled for you.
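
To make the difference concrete, here is a minimal sketch of the connect, query,
disconnect pattern written directly against DBI. The helper `withConnection` is
hypothetical and not part of `dbtools`; it only illustrates the kind of
boilerplate that is taken off your hands, and it assumes the DBI and RSQLite
packages are installed.

```{r}
# Hypothetical helper, not part of dbtools: open a connection, run one query,
# and always close the connection again, even if the query fails.
withConnection <- function(drv, query, ...) {
  con <- DBI::dbConnect(drv, ...)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, query)
}

# An in-memory SQLite database keeps the sketch free of side effects.
res <- withConnection(RSQLite::SQLite(), "SELECT 1 AS x;", dbname = ":memory:")
res
```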

Now, let's check whether we can actually access the database `example.db`.

```{r}

testConnection(cred)

cred

```

For the remainder of this example, we make use of the `USArrests` dataset.
Unfortunately, the dataset stores the state names in its row names. Since
`dbtools` does not support row names, we have to convert them to a variable.

```{r results = 'hide'}

data(USArrests, envir = environment())

USArrests$State <- row.names(USArrests)

USArrests <- USArrests[c("State", "Murder", "Assault", "UrbanPop", "Rape")]

row.names(USArrests) <- NULL

USArrests

```

Next, we will create the database table by sending a `CREATE TABLE` query to
the database.

```{r results = 'hide'}

sendQuery(
  cred,
  "CREATE TABLE `USArrests` (
  State TEXT PRIMARY KEY,
  Murder REAL,
  Assault INTEGER,
  UrbanPop INTEGER,
  Rape REAL);"
)

```

Now, we can write the USArrests data to the USArrests database table by using
the `sendData` function.

```{r results = 'hide'}

sendData(cred, USArrests)

```

If we want to read data from the database, we can do this by using `sendQuery`,
the same function we used for creating the database table.

```{r results = 'hide'}

dat <- sendQuery(cred, "SELECT * FROM `USArrests`;")

dat

```

Now we will dig a bit deeper into the functionality of `dbtools`.

## sendQuery

Basically, you already know how to use `sendQuery`. In our introductory example
we used it to create a database table and to query some data.

In your normal workflow you will sometimes want to split up a complex query
into more manageable chunks. The approach we take here is to allow a vector of
queries as an argument. The results of these queries have to be *row-bindable*.
For example, let's say we want to query each state separately:

```{r}

queryFun <- function(state) {

  paste0("SELECT * FROM USArrests WHERE State = '", state, "';")

}

sendQuery(cred, queryFun(dat$State))

```

In such a case `sendQuery` will perform all queries on one connection. A
different approach is to fetch the results of the original query in chunks,
which we do not support yet.
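
What *row-bindable* means can be sketched in base R: every result needs the same
columns so that the results can be stacked on top of each other. The snippet
uses `rbind` as a stand-in and makes no claim about how `sendQuery` combines
results internally; the two small data frames are illustrative values, not
query output.

```{r}
# Two results with identical columns, as separate per-state queries would return.
res1 <- data.frame(State = "Alabama", Murder = 13.2)
res2 <- data.frame(State = "Alaska", Murder = 10.0)
combined <- do.call(rbind, list(res1, res2))
combined

# A result with different columns cannot be stacked on top of the others.
mismatch <- data.frame(x = 1)
bindable <- tryCatch(
  {
    do.call(rbind, list(res1, mismatch))
    TRUE
  },
  error = function(e) FALSE
)
bindable
```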

## sendData

As with `sendQuery`, you basically already know how to use `sendData`. In the
introductory example we used it to send the USArrests data to the USArrests
database table.

When using `sendData` you might be interested in how to handle possible
primary key violations. By default, `sendData` will keep old rows and ignore any
duplicates. But you may use the `mode` argument and set it to `"replace"`
(replace old rows) or `"truncate"` (delete old rows from the database table
before writing data). Note that this feature is not yet available for non-MySQL
connections.
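
The effect of the three behaviours can be illustrated on plain data frames. The
helper `writeRows` below is hypothetical and not part of `dbtools`, and the mode
name `"ignore"` for the default behaviour is made up for this sketch; only
`"replace"` and `"truncate"` are named in the text above.

```{r}
# Hypothetical helper, not part of dbtools: emulate the three write modes on
# data frames. `old` is the table content, `new` the rows to write, and `key`
# the name of the primary key column.
writeRows <- function(old, new, key, mode = c("ignore", "replace", "truncate")) {
  mode <- match.arg(mode)
  switch(
    mode,
    ignore = rbind(old, new[!new[[key]] %in% old[[key]], , drop = FALSE]),
    replace = rbind(old[!old[[key]] %in% new[[key]], , drop = FALSE], new),
    truncate = new
  )
}

old <- data.frame(id = 1:2, value = c("a", "b"))
new <- data.frame(id = 2:3, value = c("B", "c"))

ignored <- writeRows(old, new, key = "id")                      # old row for id 2 is kept
replaced <- writeRows(old, new, key = "id", mode = "replace")   # id 2 now has value "B"
truncated <- writeRows(old, new, key = "id", mode = "truncate") # only ids 2 and 3 remain
```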

## Unstable connections

One of the problems we face on a regular basis is unstable connections to
external servers. To address this, `sendQuery` evaluates everything in a
'try-catch' handler abstracted in `dbtools::reTry`. With this you can state how
many tries a query has, how many seconds to wait between tries, and how error
messages should be logged:

```{r error=TRUE}

dat <- sendQuery(

  cred,

  "SELECT * FROM USArrest;", # wrong name for illustration

  tries = 2,

  intSleep = 1

)

```
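
The idea behind such a retry handler can be sketched in a few lines of base R.
`retrySketch` below is a hypothetical stand-in, not the implementation of
`dbtools::reTry`; only the argument names `tries` and `intSleep` are borrowed
from the call above, and logging is reduced to a plain `message`.

```{r}
# Hypothetical helper, not part of dbtools: call `fun` up to `tries` times,
# waiting `intSleep` seconds between attempts, and re-raise the last error.
retrySketch <- function(fun, tries = 3, intSleep = 0) {
  stopifnot(tries >= 1)
  for (i in seq_len(tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Try ", i, " of ", tries, " failed: ", conditionMessage(result))
    if (i < tries) Sys.sleep(intSleep)
  }
  stop(result)
}

# A function that fails on its first call and succeeds on the second.
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 2) stop("connection lost")
  "ok"
}

outcome <- retrySketch(flaky, tries = 3, intSleep = 0)
outcome
```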

## Multiple Databases

Sometimes your data is distributed across different servers and you want to send
the same query to each of them. To do this, give `sendQuery` a
*CredentialsList*.

```{r results='hide'}

file.copy("example.db", "example1.db")

```

Now we want to load the data from `example1.db` and `example.db`, which can be
implemented as follows:

```{r}

cred <- Credentials(

  RSQLite::SQLite,

  dbname = c("example.db", "example1.db")

)

sendQuery(cred, "SELECT * FROM USArrests;")

```

It might also be of interest to query your databases in parallel. For that, you
can supply an apply/map function, which in turn can be a parallel `lapply` such
as `mclapply`:

```{r}

sendQuery(

  cred,

  "SELECT * FROM USArrests;",

  mc.cores = 2,

  applyFun = parallel::mclapply

)

```
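
Why a pluggable apply function is enough to switch between sequential and
parallel execution can be seen in a small sketch. `queryEach` and `fakeQuery`
are hypothetical names used for illustration only; they do not touch a database
and are not how `sendQuery` is implemented.

```{r}
# Hypothetical helper, not part of dbtools: run `queryOne` once per database
# name, using whichever apply function the caller supplies. Extra arguments
# in `...` are handed to the apply function, e.g. `mc.cores` for mclapply.
queryEach <- function(dbnames, queryOne, applyFun = lapply, ...) {
  applyFun(dbnames, queryOne, ...)
}

# Stand-in for a real query: just report which database was asked.
fakeQuery <- function(dbname) data.frame(db = dbname, x = 1)

sequential <- queryEach(c("example.db", "example1.db"), fakeQuery)

# On Unix-alikes the same call can be run in parallel (not evaluated here):
# queryEach(c("example.db", "example1.db"), fakeQuery,
#           applyFun = parallel::mclapply, mc.cores = 2)
do.call(rbind, sequential)
```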

You can also send multiple queries to multiple databases. By default,
`sendQuery` tries to simplify the results:

```{r}

sendQuery(cred, c("SELECT * FROM USArrests;", "SELECT 1 AS x;"))

sendQuery(cred, c("SELECT * FROM USArrests;", "SELECT 1 AS x;"), simplify = FALSE)

```

Both functionalities are available for `sendData`, too.

## Parameterized Queries

In many applications it is easier and clearer to separate SQL and R code.
Furthermore, we often paste queries together to get something like
parameterized statements. There are various solutions for this type of problem,
but not many for the R language. Hence `dbtools` provides its own interface to
what may be understood as *template queries*. These templates solve two issues
for us:

1. Put SQL code where it belongs: a `.sql` file.

2. Provide a simple way to pass objects to these queries, using parameters.

Using these features is straightforward. A template is defined as a character
string, and regions in which parameters are substituted are denoted by two curly
braces. Users of [Liquid templates](http://shopify.github.io/liquid/) may be
familiar with this idea. Everything inside these regions is interpreted as an
R expression and can contain arbitrary operations. The result of the evaluation
should be a character vector of length one.

```{r}

templateQuery <- "SELECT {{ sqlName(fieldName) }} FROM `someTable`;"

Query(templateQuery, fieldName = "someField")

```

When such a query lives inside a file, we can pass a connection object, such as
one created by `file()`, to `Query`.

```{r}

otherTemplateQuery <-

  "SELECT `someField` FROM `someTable` WHERE `primaryKey` IN {{ sqlInNums(ids) }};"

writeLines(otherTemplateQuery, tmpFile <- tempfile())

Query(file(tmpFile), ids = 1:10)

```
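
The substitution mechanism itself can be approximated in base R. The function
`fillTemplate` below is a hypothetical sketch, not the implementation of
`Query`: it finds each region in double curly braces, evaluates its content as
an R expression with the supplied parameters, and pastes the result back into
the string. It performs no quoting or escaping of its own.

```{r}
# Hypothetical helper, not part of dbtools: evaluate every {{ ... }} region of
# `template` as an R expression, with the named arguments in `...` available
# as variables, and substitute the result into the string.
fillTemplate <- function(template, ...) {
  env <- list2env(list(...), parent = parent.frame())
  matches <- gregexpr("\\{\\{.*?\\}\\}", template, perl = TRUE)
  regions <- regmatches(template, matches)[[1]]
  values <- vapply(regions, function(region) {
    # Strip the two opening and the two closing braces.
    expr <- substr(region, 3, nchar(region) - 2)
    as.character(eval(parse(text = expr), envir = env))
  }, character(1), USE.NAMES = FALSE)
  regmatches(template, matches) <- list(values)
  template
}

filled <- fillTemplate("SELECT * FROM tbl WHERE id = {{ id + 1 }};", id = 41)
filled
# → "SELECT * FROM tbl WHERE id = 42;"
```

Because `vapply` is told to expect `character(1)`, a region that evaluates to
anything other than a single value raises an error, which mirrors the
requirement that each region yields a character of length one.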

```{r}

unlink(c("example.db", "example1.db"))

```
