https://github.com/machow/dbcooper-py
Quickly access and tab-complete the tables in your database.
- Host: GitHub
- URL: https://github.com/machow/dbcooper-py
- Owner: machow
- License: mit
- Created: 2022-03-05T22:08:35.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-27T19:53:08.000Z (almost 2 years ago)
- Last Synced: 2023-08-11T10:31:07.348Z (over 1 year ago)
- Language: Python
- Size: 61.5 KB
- Stars: 11
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
```{python tags=c("hide-cell")}
# TODO: is there a way to get it so dbc.list() does not show 1 item per line?

# this keeps the pandas dataframe repr from spitting out scoped style tags
# which don't render on github
import pandas as pd
pd.set_option("display.notebook_repr_html", False)
```

# dbcooper-py
[![CI](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml/badge.svg)](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/dbcooper-py/HEAD)

The dbcooper package turns a database connection into a collection of functions,
handling the logic for keeping track of connections and letting you take advantage of
autocompletion when exploring a database.

It's especially helpful when authoring database-specific Python packages,
for instance an internal company package or one wrapping a public data source.

For the R version, see [dgrtwo/dbcooper](https://github.com/dgrtwo/dbcooper).
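The core idea can be illustrated with the standard library alone. The sketch below is not dbcooper's implementation: `TableAccessors` is a hypothetical class that exposes each table of a `sqlite3` connection as an attribute, and adds those names to `__dir__`, which interactive completers such as IPython's generally rely on when offering tab completions.

```python
import sqlite3


class TableAccessors:
    """Hypothetical stand-in for the dbcooper idea; not its real implementation."""

    def __init__(self, con: sqlite3.Connection):
        self._con = con

    def _table_names(self) -> list:
        # sqlite_master lists the tables in SQLite's "main" schema
        rows = self._con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in rows]

    def __dir__(self):
        # tab completion consults dir(), so expose lowercased table names there
        return sorted(set(super().__dir__()) | {n.lower() for n in self._table_names()})

    def __getattr__(self, attr: str):
        # __getattr__ only runs when normal lookup fails; refuse private names
        # so a half-initialised object cannot recurse forever
        if attr.startswith("_"):
            raise AttributeError(attr)
        for name in self._table_names():
            if name.lower() == attr:
                # return a function, so the query only runs when it is called
                return lambda: self._con.execute(f'SELECT * FROM "{name}"').fetchall()
        raise AttributeError(attr)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Salaries (playerID TEXT, salary INTEGER)")
con.execute("INSERT INTO Salaries VALUES ('player01', 50000)")

tables = TableAccessors(con)
print("salaries" in dir(tables))  # True
print(tables.salaries())          # [('player01', 50000)]
```

Returning a function rather than rows loosely mirrors the difference between the plain accessor and the called accessor described in the table accessor examples.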
## Installation
```
pip install dbcooper
```

## Example
### Initializing the functions
The dbcooper package asks you to create the connection first.
As an example, we'll use the Lahman baseball database, loaded into a `lahman` schema.

```{python}
from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite

# connect to sqlite
engine = create_engine("sqlite://")

# load the lahman data into the "lahman" schema
lahman_sqlite(engine)
```

Next, we'll set up dbcooper:
```{python}
from dbcooper import DbCooper

dbc = DbCooper(engine)
```

The `DbCooper` object contains two important things:
* Accessors to fetch specific tables.
* Functions for interacting with the underlying database.

### Using table accessors
In the example below, we'll use the `"Lahman"."Salaries"` table.
By default, dbcooper makes this table accessible as `.lahman_salaries`.

**Plain** `.lahman_salaries` prints out table and column info, including types and descriptions.
```{python}
# show table and column descriptions
dbc.lahman_salaries
```

Note that SQLite doesn't support table and column descriptions, so these sections
are empty.

**Calling** `.lahman_salaries()` fetches a lazy version of the data.
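"Lazy" means you get back a description of a query rather than rows pulled into memory. As a rough analogy only (siuba's `LazyTbl` is far more capable), the hypothetical `LazyQuery` class below accumulates filter conditions and sends SQL to the database only when `collect()` is called. It uses the standard-library `sqlite3` module and a made-up two-row table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical lazy table: records operations, runs SQL only on collect()."""

    con: sqlite3.Connection
    table: str
    conditions: tuple = ()

    def filter(self, condition: str) -> "LazyQuery":
        # build a new query object; nothing is sent to the database yet
        # (conditions are raw SQL strings, so never pass untrusted input)
        return LazyQuery(self.con, self.table, self.conditions + (condition,))

    def to_sql(self) -> str:
        sql = f'SELECT * FROM "{self.table}"'
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.conditions)
        return sql

    def collect(self) -> list:
        # the query is only executed here
        return self.con.execute(self.to_sql()).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Salaries (playerID TEXT, salary INTEGER)")
con.executemany("INSERT INTO Salaries VALUES (?, ?)", [("p1", 50_000), ("p2", 150_000)])

query = LazyQuery(con, "Salaries").filter("salary > 100000")
print(query.to_sql())   # SELECT * FROM "Salaries" WHERE (salary > 100000)
print(query.collect())  # [('p2', 150000)]
```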
```{python}
dbc.lahman_salaries()
```

Note that this data is a siuba `LazyTbl` object, which you can use to analyze the data.
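The siuba pipeline in the next chunk counts how many salaries are above 100,000. A rough equivalent in plain SQL groups rows by the boolean expression and counts each group. The sketch below runs such a query with the standard-library `sqlite3` module against three made-up rows, not the real Lahman data; siuba generates its own SQL, which may differ from this.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Salaries (playerID TEXT, salary INTEGER)")
con.executemany(
    "INSERT INTO Salaries VALUES (?, ?)",
    [("p1", 60_000), ("p2", 250_000), ("p3", 90_000)],  # made-up rows for illustration
)

# SQLite evaluates "salary > 100000" to 0 or 1, so this groups rows into two buckets
rows = con.execute(
    """
    SELECT salary > 100000 AS over_100k, COUNT(*) AS n
    FROM Salaries
    GROUP BY over_100k
    ORDER BY over_100k
    """
).fetchall()
print(rows)  # [(0, 2), (1, 1)]
```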
```{python}
from siuba import _, count

dbc.lahman_salaries() >> count(over_100k = _.salary > 100_000)
```

### Using database functions
* `.list()`: Get a list of tables
* `.tbl()`: Access a table that can be worked with using `siuba`.
* `.query()`: Perform a SQL query and work with the result.
* `._engine`: Get the underlying sqlalchemy engine.

For instance, we could start by finding the names of the tables in the Lahman database.
```{python}
dbc.list()
```

We can access one of these tables with `dbc.tbl()`, then put it through any kind
of siuba operation.

```{python}
dbc.tbl("Salaries")
```

```{python}
from siuba import _, count
dbc.tbl("Salaries") >> count(_.yearID, sort=True)
```

If you'd rather start from a SQL query, use the `.query()` method.
```{python}
dbc.query("""
SELECT
playerID,
sum(AB) as AB
FROM Batting
GROUP BY playerID
""")
```

For anything else you might want to do, the sqlalchemy Engine object is available.
For example, the code below shows how you can set its `.echo` attribute, which
tells sqlalchemy to log the SQL statements it executes.

```{python}
dbc._engine.echo = True
table_names = dbc.list()
```

Note that the log messages above show that the `.list()` method executed two queries:
one to list tables in the "main" schema (which is empty), and one to list tables
in the "lahman" schema.

## Advanced Configuration
> ⚠️: These behaviors are well tested, but dbcooper's internals and API may change.
dbcooper can be configured in three ways, each corresponding to a class interface:
* **TableFinder**: Which tables will be used by `dbcooper`.
* **AccessorBuilder**: How table names are turned into accessors.
* **DbcDocumentedTable**: The class that defines what an accessor will return.

```{python}
from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite
from dbcooper import DbCooper, AccessorBuilder

engine = create_engine("sqlite://")
lahman_sqlite(engine)
```

### Excluding a schema
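In SQLite, a schema other than `main` (and `temp`) is an attached database, so finding tables means querying each schema's `sqlite_master` catalog, one query per schema. That is consistent with the two queries logged by `.list()` above. The hypothetical `list_tables` helper below sketches that loop with the standard library and skips any schema named in `exclude_schemas`; it only illustrates the idea and is not how dbcooper's `TableFinder` works internally.

```python
import sqlite3


def list_tables(con: sqlite3.Connection, exclude_schemas=()) -> list:
    """Hypothetical helper: return (schema, table) pairs across attached schemas."""
    found = []
    # each row of PRAGMA database_list is (seq, schema_name, file)
    for _, schema, _ in con.execute("PRAGMA database_list").fetchall():
        # skip SQLite's temp schema, plus anything the caller excludes
        if schema == "temp" or schema in exclude_schemas:
            continue
        # one catalog query per schema
        rows = con.execute(
            f"""SELECT name FROM "{schema}".sqlite_master
                WHERE type = 'table' ORDER BY name"""
        ).fetchall()
        found.extend((schema, name) for (name,) in rows)
    return found


con = sqlite3.connect(":memory:")
# create a "lahman" schema by attaching a second in-memory database
con.execute("ATTACH DATABASE ':memory:' AS lahman")
con.execute("CREATE TABLE lahman.Salaries (playerID TEXT, salary INTEGER)")

print(list_tables(con))                              # [('lahman', 'Salaries')]
print(list_tables(con, exclude_schemas=["lahman"]))  # []
```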
```{python}
from dbcooper import TableFinder

finder = TableFinder(exclude_schemas=["lahman"])
dbc_no_lahman = DbCooper(engine, table_finder=finder)
dbc_no_lahman.list()
```

### Formatting table names
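Judging from the examples in this README, the default accessor name joins the schema and table names in lowercase (`"Lahman"."Salaries"` becomes `lahman_salaries`), while `format_from_part="table"` keeps only the table part. The hypothetical `accessor_name` function below sketches that rule; its `include_schema` flag and its handling of characters that are not valid in Python identifiers are assumptions for illustration, not dbcooper options.

```python
import re


def accessor_name(schema: str, table: str, include_schema: bool = True) -> str:
    """Hypothetical helper: build an attribute name from a schema and table name."""
    name = f"{schema}_{table}" if include_schema else table
    # assumption: replace characters that can't appear in a Python identifier
    # with "_", then lowercase; dbcooper's exact rules may differ
    return re.sub(r"\W", "_", name).lower()


print(accessor_name("Lahman", "Salaries"))                        # lahman_salaries
print(accessor_name("Lahman", "Salaries", include_schema=False))  # salaries
print(accessor_name("main", "Batting Stats"))                     # main_batting_stats
```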
```{python}
from dbcooper import AccessorBuilder

# omits schema, and keeps only table name
# e.g. `salaries`, rather than `lahman_salaries`
builder = AccessorBuilder(format_from_part="table")

tbl_flat = DbCooper(engine, accessor_builder=builder)
tbl_flat.salaries()
```

### Grouping tables by schema
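With `AccessorHierarchyBuilder`, accessors are grouped one level per schema, as in `tbl_nested.lahman.salaries()`. The hypothetical `nest_by_schema` function below sketches that shape with `types.SimpleNamespace`, so ordinary attribute access (and tab completion) goes schema first, then table. Each stand-in accessor just returns the SQL it would run; this is an illustration, not dbcooper's implementation.

```python
from collections import defaultdict
from types import SimpleNamespace


def make_accessor(schema: str, table: str):
    # hypothetical stand-in accessor: returns the SQL it would run instead of querying
    return lambda: f'SELECT * FROM "{schema}"."{table}"'


def nest_by_schema(tables) -> SimpleNamespace:
    """Hypothetical: turn (schema, table) pairs into obj.<schema>.<table>() accessors."""
    grouped = defaultdict(dict)
    for schema, table in tables:
        grouped[schema.lower()][table.lower()] = make_accessor(schema, table)
    # one namespace per schema, nested inside a top-level namespace
    return SimpleNamespace(**{s: SimpleNamespace(**t) for s, t in grouped.items()})


tbls = nest_by_schema([("lahman", "Salaries"), ("lahman", "Batting"), ("main", "notes")])
print(tbls.lahman.salaries())  # SELECT * FROM "lahman"."Salaries"
print(sorted(vars(tbls)))      # ['lahman', 'main']
```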
```{python}
from dbcooper import AccessorHierarchyBuilder

tbl_nested = DbCooper(engine, accessor_builder=AccessorHierarchyBuilder())

# note the form: <schema>.<table>
tbl_nested.lahman.salaries()
```

### Don't show table documentation

```{python}
from dbcooper import DbcSimpleTable

dbc_no_doc = DbCooper(engine, table_factory=DbcSimpleTable)
dbc_no_doc.lahman_salaries
```

Note that some sqlalchemy dialects, like `snowflake-sqlalchemy`, are less reliable than others
at looking up table and column descriptions, so `DbcSimpleTable`
may be needed to connect to Snowflake (see [this issue](https://github.com/snowflakedb/snowflake-sqlalchemy/issues/276)).

## Developing
```shell
# install with development dependencies
pip install -e ".[dev]"

# or install from requirements file
pip install -r requirements/dev.txt
```

### Test
```shell
# run all tests, see pytest section of pyproject.toml
pytest

# skip the snowflake and bigquery backends
pytest -m 'not snowflake and not bigquery'

# stop on first failure, drop into debugger
pytest -x --pdb
```

### Release
```shell
# set version number
git tag v0.0.1

# (optional) push to github
git push origin --tags

# check version
python -m setuptools_scm
```