Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/machow/dbcooper-py

Quickly access and tab-complete the tables in your database.
https://github.com/machow/dbcooper-py

Last synced: about 1 month ago
JSON representation

Quickly access and tab-complete the tables in your database.

Awesome Lists containing this project

README

        

```{python tags=c("hide-cell")}
# TODO: is there a way to get it so dbc.list() does not show 1 item per line?

# this keeps the pandas dataframe repr from spitting out scoped style tags
# which don't render on github
import pandas as pd
pd.set_option("display.notebook_repr_html", False)
```

# dbcooper-py

[![CI](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml/badge.svg)](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/dbcooper-py/HEAD)

The dbcooper package turns a database connection into a collection of functions,
handling logic for keeping track of connections and letting you take advantage of
autocompletion when exploring a database.

It's especially helpful to use when authoring database-specific Python packages,
for instance in an internal company package or one wrapping a public data source.

For the R version see [dgrtwo/dbcooper](https://github.com/dgrtwo/dbcooper).

## Installation

```
pip install dbcooper
```

## Example

### Initializing the functions

The dbcooper package asks you to create the connection first.
As an example, we'll use the Lahman baseball database package (`lahman`).

```{python}
from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite

# connect to sqlite
engine = create_engine("sqlite://")

# load the lahman data into the "lahman" schema
lahman_sqlite(engine)
```

Next we'll set up dbcooper

```{python}
from dbcooper import DbCooper

dbc = DbCooper(engine)
```

The `DbCooper` object contains two important things:

* Accessors to fetch specific tables.
* Functions for interacting with the underlying database.

### Using table accessors

In the example below, we'll use the `"Lahman"."Salaries"` table as an example.
By default, dbcooper makes this accessible as `.lahman_salaries`.

**Plain** `.lahman_salaries` prints out table and column info, including types and descriptions.

```{python}
# show table and column descriptions
dbc.lahman_salaries
```

Note that sqlite doesn't support table and columnn descriptions, so these sections
are empty.

**Calling** `.lahman_salaries()` fetches a lazy version of the data.

```{python}
dbc.lahman_salaries()
```

Note that this data is a siuba `LazyTbl` object, which you can use to analyze the data.

```{python}
from siuba import _, count

dbc.lahman_salaries() >> count(over_100k = _.salary > 100_000)
```

### Using database functions

* `.list()`: Get a list of tables
* `.tbl()`: Access a table that can be worked with using `siuba`.
* `.query()`: Perform a SQL query and work with the result.
* `._engine`: Get the underlying sqlalchemy engine.

For instance, we could start by finding the names of the tables in the Lahman database.

```{python}
dbc.list()
```

We can access one of these tables with `dbc.tbl()`, then put it through any kind
of siuba operation.

```{python}
dbc.tbl("Salaries")
```

```{python}
from siuba import _, count
dbc.tbl("Salaries") >> count(_.yearID, sort=True)
```

If you'd rather start from a SQL query, use the `.query()` method.

```{python}
dbc.query("""
SELECT
playerID,
sum(AB) as AB
FROM Batting
GROUP BY playerID
""")
```

For anything else you might want to do, the sqlalchemy Engine object is available.
For example, the code below shows how you can set its `.echo` attribute, which
tells sqlalchemy to provide useful logs.

```{python}
dbc._engine.echo = True
table_names = dbc.list()
```

Note that the log messages above show that the `.list()` method executed two queries:
One to list tables in the "main" schema (which is empty), and one to list tables
in the "lahman" schema.

## Advanced Configuration

> ⚠️: These behaviors are well tested, but dbcooper's internals and API may change.

dbcooper can be configured in three ways, each corresponding to a class interface:

* **TableFinder**: Which tables will be used by `dbcooper`.
* **AccessorBuilder**: How table names are turned into accessors.
* **DbcDocumentedTable**: The class that defines what an accessor will return.

```{python}
from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite
from dbcooper import DbCooper, AccessorBuilder

engine = create_engine("sqlite://")
lahman_sqlite(engine)
```

### Excluding a schema

```{python}
from dbcooper import TableFinder

finder = TableFinder(exclude_schemas=["lahman"])
dbc_no_lahman = DbCooper(engine, table_finder=finder)
dbc_no_lahman.list()
```

### Formatting table names

```{python}
from dbcooper import AccessorBuilder

# omits schema, and keeps only table name
# e.g. `salaries`, rather than `lahman_salaries`
builder = AccessorBuilder(format_from_part="table")

tbl_flat = DbCooper(engine, accessor_builder=builder)
tbl_flat.salaries()
```

### Grouping tables by schema

```{python}
from dbcooper import AccessorHierarchyBuilder

tbl_nested = DbCooper(engine, accessor_builder=AccessorHierarchyBuilder())

# note the form: .
tbl_nested.lahman.salaries()
```

### Don't show table documentation

```{python}
from dbcooper import DbcSimpleTable

dbc_no_doc = DbCooper(engine, table_factory=DbcSimpleTable)
dbc_no_doc.lahman_salaries
```

Note that sqlalchemy dialects like `snowflake-sqlalchemy` cannot look up things
like table and column descriptions as well as other dialects, so `DbcSimpleTable`
may be needed to connect to snowflake (see [this issue](https://github.com/snowflakedb/snowflake-sqlalchemy/issues/276)).

## Developing

```shell
# install with development dependencies
pip install -e .[dev]

# or install from requirements file
pip install -r requirements/dev.txt
```

### Test

```shell
# run all tests, see pytest section of pyproject.toml
pytest

# run specific backends
pytest -m 'not snowflake and not bigquery'

# stop on first failure, drop into debugger
pytest -x --pdb
```

### Release

```shell
# set version number
git tag v0.0.1

# (optional) push to github
git push origin --tags

# check version
python -m setuptools_scm
```