https://github.com/machow/dbcooper-py

Quickly access and tab-complete the tables in your database.
https://github.com/machow/dbcooper-py

Last synced: 7 months ago
JSON representation

Quickly access and tab-complete the tables in your database.

Host: GitHub
URL: https://github.com/machow/dbcooper-py
Owner: machow
License: mit
Created: 2022-03-05T22:08:35.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-03-27T19:53:08.000Z (over 2 years ago)
Last Synced: 2025-04-10T14:35:08.778Z (7 months ago)
Language: Python
Homepage:
Size: 61.5 KB
Stars: 11
Watchers: 2
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

Awesome Lists containing this project

README

          ```{python tags=c("hide-cell")}

# TODO: is there a way to get it so dbc.list() does not show 1 item per line?

# this keeps the pandas dataframe repr from spitting out scoped style tags

# which don't render on github

import pandas as pd

pd.set_option("display.notebook_repr_html", False)

```

# dbcooper-py

[![CI](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml/badge.svg)](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml)

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/dbcooper-py/HEAD)

The dbcooper package turns a database connection into a collection of functions,

handling logic for keeping track of connections and letting you take advantage of

autocompletion when exploring a database.

It's especially helpful to use when authoring database-specific Python packages,

for instance in an internal company package or one wrapping a public data source.

For the R version see [dgrtwo/dbcooper](https://github.com/dgrtwo/dbcooper).

## Installation

```

pip install dbcooper

```

## Example

### Initializing the functions

The dbcooper package asks you to create the connection first.

As an example, we'll use the Lahman baseball database package (`lahman`).

```{python}

from sqlalchemy import create_engine

from dbcooper.data import lahman_sqlite

# connect to sqlite

engine = create_engine("sqlite://")

# load the lahman data into the "lahman" schema

lahman_sqlite(engine)

```

Next we'll set up dbcooper

```{python}

from dbcooper import DbCooper

dbc = DbCooper(engine)

```

The `DbCooper` object contains two important things:

* Accessors to fetch specific tables.

* Functions for interacting with the underlying database.

### Using table accessors

In the example below, we'll use the `"Lahman"."Salaries"` table as an example.

By default, dbcooper makes this accessible as `.lahman_salaries`.

**Plain** `.lahman_salaries` prints out table and column info, including types and descriptions.

```{python}

# show table and column descriptions

dbc.lahman_salaries

```

Note that sqlite doesn't support table and columnn descriptions, so these sections

are empty.

**Calling** `.lahman_salaries()` fetches a lazy version of the data.

```{python}

dbc.lahman_salaries()

```

Note that this data is a siuba `LazyTbl` object, which you can use to analyze the data.

```{python}

from siuba import _, count

dbc.lahman_salaries() >> count(over_100k = _.salary > 100_000)

```

### Using database functions

* `.list()`: Get a list of tables

* `.tbl()`: Access a table that can be worked with using `siuba`.

* `.query()`: Perform a SQL query and work with the result.

* `._engine`: Get the underlying sqlalchemy engine.

For instance, we could start by finding the names of the tables in the Lahman database.

```{python}

dbc.list()

```

We can access one of these tables with `dbc.tbl()`, then put it through any kind

of siuba operation.

```{python}

dbc.tbl("Salaries")

```

```{python}

from siuba import _, count

dbc.tbl("Salaries") >> count(_.yearID, sort=True)

```

If you'd rather start from a SQL query, use the `.query()` method.

```{python}

dbc.query("""

    SELECT

        playerID,

        sum(AB) as AB

    FROM Batting

    GROUP BY playerID

""")

```

For anything else you might want to do, the sqlalchemy Engine object is available.

For example, the code below shows how you can set its `.echo` attribute, which

tells sqlalchemy to provide useful logs.

```{python}

dbc._engine.echo = True

table_names = dbc.list()

```

Note that the log messages above show that the `.list()` method executed two queries:

One to list tables in the "main" schema (which is empty), and one to list tables

in the "lahman" schema.

## Advanced Configuration

> ⚠️: These behaviors are well tested, but dbcooper's internals and API may change.

dbcooper can be configured in three ways, each corresponding to a class interface:

* **TableFinder**: Which tables will be used by `dbcooper`.

* **AccessorBuilder**: How table names are turned into accessors.

* **DbcDocumentedTable**: The class that defines what an accessor will return.

```{python}

from sqlalchemy import create_engine

from dbcooper.data import lahman_sqlite

from dbcooper import DbCooper, AccessorBuilder

engine = create_engine("sqlite://")

lahman_sqlite(engine)

```

### Excluding a schema

```{python}

from dbcooper import TableFinder

finder = TableFinder(exclude_schemas=["lahman"])

dbc_no_lahman = DbCooper(engine, table_finder=finder)

dbc_no_lahman.list()

```

### Formatting table names

```{python}

from dbcooper import AccessorBuilder

# omits schema, and keeps only table name

# e.g. `salaries`, rather than `lahman_salaries`

builder = AccessorBuilder(format_from_part="table")

tbl_flat = DbCooper(engine, accessor_builder=builder)

tbl_flat.salaries()

```

### Grouping tables by schema

```{python}

from dbcooper import AccessorHierarchyBuilder

tbl_nested = DbCooper(engine, accessor_builder=AccessorHierarchyBuilder())

# note the form: .

tbl_nested.lahman.salaries()

```

### Don't show table documentation

```{python}

from dbcooper import DbcSimpleTable

dbc_no_doc = DbCooper(engine, table_factory=DbcSimpleTable)

dbc_no_doc.lahman_salaries

```

Note that sqlalchemy dialects like `snowflake-sqlalchemy` cannot look up things

like table and column descriptions as well as other dialects, so `DbcSimpleTable`

may be needed to connect to snowflake (see [this issue](https://github.com/snowflakedb/snowflake-sqlalchemy/issues/276)).

## Developing

```shell

# install with development dependencies

pip install -e .[dev]

# or install from requirements file

pip install -r requirements/dev.txt

```

### Test

```shell

# run all tests, see pytest section of pyproject.toml

pytest

# run specific backends

pytest -m 'not snowflake and not bigquery'

# stop on first failure, drop into debugger

pytest -x --pdb

```

### Release

```shell

# set version number

git tag v0.0.1

# (optional) push to github

git push origin --tags

# check version

python -m setuptools_scm

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/machow/dbcooper-py

Awesome Lists containing this project

README