Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mara/mara-db
Lightweight configuration and access to multiple databases in a single project
backend flask mara sqlalchemy
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/mara/mara-db
- Owner: mara
- License: mit
- Created: 2017-03-08T20:20:43.000Z (almost 8 years ago)
- Default Branch: main
- Last Pushed: 2023-12-15T14:26:00.000Z (about 1 year ago)
- Last Synced: 2024-11-04T07:36:05.543Z (2 months ago)
- Topics: backend, flask, mara, sqlalchemy
- Language: Python
- Homepage:
- Size: 759 KB
- Stars: 38
- Watchers: 6
- Forks: 17
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-starred - mara/mara-db - Lightweight configuration and access to multiple databases in a single project (backend)
README
# Mara DB
[![Build Status](https://github.com/mara/mara-db/actions/workflows/build.yml/badge.svg)](https://github.com/mara/mara-db/actions/workflows/build.yml)
[![PyPI - License](https://img.shields.io/pypi/l/mara-db.svg)](https://github.com/mara/mara-db/blob/main/LICENSE)
[![PyPI version](https://badge.fury.io/py/mara-db.svg)](https://badge.fury.io/py/mara-db)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://communityinviter.com/apps/mara-users/public-invite)

Mini package for configuring and accessing multiple databases in a single project. Decouples the use of databases and their configuration by using "aliases" for databases.
The file [mara_db/dbs.py](https://github.com/mara/mara-db/blob/main/mara_db/dbs.py) contains abstract database configurations for PostgreSQL, MySQL, SQL Server, Oracle, SQLite and BigQuery. The database connections of a project are configured by overwriting the `databases` function in [mara_db/config.py](https://github.com/mara/mara-db/blob/main/mara_db/config.py):
```python
import mara_db.config
import mara_db.dbs

# configure database connections for different aliases
mara_db.config.databases = lambda: {
    'mara': mara_db.dbs.PostgreSQLDB(host='localhost', user='root', database='mara'),
    'dwh': mara_db.dbs.PostgreSQLDB(database='dwh'),
    'source-1': mara_db.dbs.MysqlDB(host='some-localhost', database='my_app', user='dwh'),
    'source-2': mara_db.dbs.SQLServerDB(user='dwh_read', password='123abc', database='db1', host='some-sql-server')
}

# access individual database configurations with `dbs.db`:
print(mara_db.dbs.db('mara'))
# ->
```
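The alias idea can be illustrated without the package itself. The following is a minimal sketch, not the implementation in `mara_db`: `DatabaseConfig`, the module-level `databases` hook and the `db` lookup function are hypothetical stand-ins that mimic the pattern of overwriting a configuration function and then resolving aliases at call time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class DatabaseConfig:
    """Hypothetical stand-in for a database configuration object."""
    database: str
    host: Optional[str] = None
    user: Optional[str] = None


# Hook that a project overwrites (analogous in spirit to `mara_db.config.databases`)
databases: Callable[[], Dict[str, DatabaseConfig]] = lambda: {}


def db(alias: str) -> DatabaseConfig:
    """Look up the configuration registered under `alias` at call time."""
    configured = databases()
    if alias not in configured:
        raise KeyError(f'database alias "{alias}" is not configured')
    return configured[alias]


# Project-specific configuration, done once at startup
databases = lambda: {
    'dwh': DatabaseConfig(database='dwh'),
    'source-1': DatabaseConfig(host='some-localhost', database='my_app', user='dwh'),
}

print(db('source-1').host)
# -> some-localhost
```

Because `db` evaluates `databases()` on every call, code that uses an alias never needs to know where or how the connection details were defined.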
## Visualization of (PostgreSQL, MySQL, SQL Server) database schemas
[mara_db/views.py](https://github.com/mara/mara-db/blob/main/mara_db/views.py) contains a schema visualization for all configured databases using graphviz (currently PostgreSQL, MySQL and SQL Server only). It shows the tables of selected schemas together with the foreign key relations between them.
![Schema visualization](https://github.com/mara/mara-db/blob/main/docs/_static/schema-visualization.png)
For finding missing foreign key constraints, columns that follow a specific naming pattern (configurable via `config.schema_ui_foreign_key_column_regex`, default `*_fk`) and that are not part of foreign key constraints are drawn in pink.
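The check behind the pink columns can be sketched in a few lines. This is an illustration only, not the code in `mara_db/views.py`: the function name, the input structures and the pattern `.*_fk$` (a Python regular expression for "ends in `_fk`") are assumptions made for the example.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumed pattern: column names ending in `_fk`, written as a Python regular expression
FOREIGN_KEY_COLUMN_REGEX = re.compile(r'.*_fk$')


def unconstrained_fk_columns(
        columns: Dict[str, List[str]],
        constrained: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (table, column) pairs that look like foreign keys but have no constraint."""
    return [(table, column)
            for table, table_columns in columns.items()
            for column in table_columns
            if FOREIGN_KEY_COLUMN_REGEX.match(column)
            and (table, column) not in constrained]


columns = {
    'order': ['order_id', 'customer_fk', 'product_fk'],
    'customer': ['customer_id', 'name'],
}
constrained = {('order', 'customer_fk')}

print(unconstrained_fk_columns(columns, constrained))
# -> [('order', 'product_fk')]
```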
## Fast batch processing: Accessing databases with shell commands
The file [mara_db/shell.py](https://github.com/mara/mara-db/blob/main/mara_db/shell.py) contains functions that create commands for accessing databases via their command line clients.
For example, the `query_command` function creates a shell command that can receive an SQL query from stdin and execute it:
```python
import mara_db.shell

print(mara_db.shell.query_command('source-1'))
# -> mysql --default-character-set=utf8mb4 --user=dwh --host=some-localhost my_app

print(mara_db.shell.query_command('dwh', timezone='Europe/Lisbon', echo_queries=False))
# -> PGTZ=Europe/Lisbon PGOPTIONS=--client-min-messages=warning psql --no-psqlrc --set ON_ERROR_STOP=on dwh
```

The function `copy_to_stdout_command` creates a shell command that receives a query on stdin and writes the result to stdout in tabular form:
```python
print(mara_db.shell.copy_to_stdout_command('source-1'))
# -> mysql --default-character-set=utf8mb4 --user=dwh --host=some-localhost my_app --skip-column-names
```

Similarly, `copy_from_stdin_command` creates a client command that receives tabular data from stdin and writes it to a target table:
```python
print(mara_db.shell.copy_from_stdin_command('dwh', target_table='some_table', delimiter_char=';'))
# -> PGTZ=Europe/Berlin PGOPTIONS=--client-min-messages=warning psql --echo-all --no-psqlrc --set ON_ERROR_STOP=on dwh \
# --command="COPY some_table FROM STDIN WITH DELIMITER AS ';'"
```

Finally, `copy_command` creates a shell command that receives an SQL query from stdin, executes the query in `source_db` and then writes the result to `target_table` in `target_db`:
```python
print(mara_db.shell.copy_command('source-2', 'dwh', target_table='some_table'))
# -> sed 's/\\\\$/\$/g;s/\$/\\\\$/g' \
# | sqsh -U dwh_read -P 123abc -S some-sql-server -D db1 -m csv \
# | PGTZ=Europe/Berlin PGOPTIONS=--client-min-messages=warning psql --echo-all --no-psqlrc --set ON_ERROR_STOP=on dwh \
# --command = "COPY some_table FROM STDIN WITH CSV HEADER"
```
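Since the functions above only return command strings, something still has to execute them and feed the query on stdin. One possible way, sketched here with Python's `subprocess` module, is shown below. The helper `run_with_stdin` is hypothetical, and `cat` and `tr` are stand-ins so that the example runs without any database client installed.

```python
import subprocess


def run_with_stdin(command: str, stdin_text: str) -> str:
    """Run `command` in a shell, feed `stdin_text` to it, and return its stdout."""
    result = subprocess.run(command, shell=True, input=stdin_text,
                            capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in commands. In a real project these strings would come from functions
# such as `mara_db.shell.copy_to_stdout_command` and `mara_db.shell.copy_from_stdin_command`.
source_command = 'cat'
target_command = "tr 'a-z' 'A-Z'"

# Pipe the output of the source command into the target command
output = run_with_stdin(f'{source_command} | {target_command}', 'select 1;\n')
print(output)
# -> SELECT 1;
```

With `check=True`, a non-zero exit code of the pipeline raises `subprocess.CalledProcessError`. Note that generated commands can contain passwords (as in the `sqsh` example above), so they should not be written to logs carelessly.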
The following **command line clients** are used to access the various databases:
| Database | Client binary | Comments |
| --- | --- | --- |
| PostgreSQL / Redshift | `psql` | Included in standard distributions. |
| MariaDB / MySQL | `mysql` | Included in standard distributions. |
| SQL Server | `sqsh`<br/>- or -<br/>`sqlcmd` | **sqsh**: From [https://sourceforge.net/projects/sqsh/](https://sourceforge.net/projects/sqsh/), usually messy to get working. On Ubuntu, use the [http://ppa.launchpad.net/jasc/sqsh/ubuntu/](http://ppa.launchpad.net/jasc/sqsh/ubuntu/) backport. On Mac, try the Homebrew version or install from source.<br/>**sqlcmd**: Official Microsoft utility for SQL Server. See [sqlcmd Utility](https://docs.microsoft.com/en-us/sql/tools/sqlcmd-utility) |
| Oracle | `sqlplus64` | See the [Oracle Instant Client](https://www.oracle.com/technetwork/database/database-technologies/instant-client/overview/index.html) homepage for details. On Mac, follow [these instructions](https://vanwollingen.nl/install-oracle-instant-client-and-sqlplus-using-homebrew-a233ce224bf). Then ` sudo ln -s /usr/local/bin/sqlplus /usr/local/bin/sqlplus64` to make the binary accessible as `sqlplus64`. |
| SQLite | `sqlite3` | Available in standard distributions. Version >3.20.x required (not the case on Ubuntu 14.04). |
| BigQuery | `bq` | See the [Google Cloud SDK](https://cloud.google.com/sdk/docs/quickstarts) page for details. |
| Snowflake | `snowsql` | See [SnowSQL (CLI Client)](https://docs.snowflake.com/en/user-guide/snowsql.html) |
| Databricks | `dbsqlcli` | Included when using package extra `databricks` via package [databricks-sql-cli](https://pypi.org/project/databricks-sql-cli/). See [Databricks SQL CLI](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#) |
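
Before running shell-based commands, it can help to check which of these clients are actually installed. The following sketch uses `shutil.which` from the standard library. The binary names are taken from the table above, while the `CLIENT_BINARIES` mapping and the `missing_clients` function are hypothetical helpers, not part of `mara_db`.

```python
import shutil
from typing import Dict, List

# Client binaries as listed in the table above
CLIENT_BINARIES: Dict[str, List[str]] = {
    'PostgreSQL / Redshift': ['psql'],
    'MariaDB / MySQL': ['mysql'],
    'SQL Server': ['sqsh', 'sqlcmd'],
    'Oracle': ['sqlplus64'],
    'SQLite': ['sqlite3'],
    'BigQuery': ['bq'],
    'Snowflake': ['snowsql'],
    'Databricks': ['dbsqlcli'],
}


def missing_clients(binaries: Dict[str, List[str]] = CLIENT_BINARIES) -> Dict[str, List[str]]:
    """Return the databases for which none of the listed binaries is on the PATH."""
    return {database: names
            for database, names in binaries.items()
            if not any(shutil.which(name) for name in names)}


for database, names in missing_clients().items():
    print(f'{database}: none of {names} found on PATH')
```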
## Make it so! Auto-migration of SQLAlchemy models
[Alembic has a feature](http://alembic.zzzcomputing.com/en/latest/autogenerate.html) that can create a diff between the state of a database and the ORM models of an application. This feature is used in [mara_db/auto_migrate.py](https://github.com/mara/mara-db/blob/main/mara_db/auto_migrate.py) to automatically perform all necessary database transformations, without intermediate migration files:
```python
import sqlalchemy
import sqlalchemy.ext.declarative

import mara_db.auto_migration
import mara_db.dbs

# define a model / table
class MyTable(sqlalchemy.ext.declarative.declarative_base()):
    __tablename__ = 'my_table'

    my_table_id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    column_1 = sqlalchemy.Column(sqlalchemy.TEXT, nullable=False, index=True)

db = mara_db.dbs.SQLiteDB(file_name='/tmp/test.sqlite')

# create database and table
mara_db.auto_migration.auto_migrate(engine=mara_db.auto_migration.engine(db), models=[MyTable])
# ->
# Created database "sqlite:////tmp/test.sqlite"
#
# CREATE TABLE my_table (
# my_table_id SERIAL NOT NULL,
# column_1 TEXT NOT NULL,
# PRIMARY KEY (my_table_id)
# );
#
# CREATE INDEX ix_my_table_column_1 ON my_table (column_1);
```

When the model is changed later, `auto_migrate` creates a diff against the existing database and applies it:
```python
# remove index and add another column
class MyTable(sqlalchemy.ext.declarative.declarative_base()):
    __tablename__ = 'my_table'

    my_table_id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    column_1 = sqlalchemy.Column(sqlalchemy.TEXT, nullable=False)
    column_2 = sqlalchemy.Column(sqlalchemy.Integer)

mara_db.auto_migration.auto_migrate(engine=mara_db.auto_migration.engine(db), models=[MyTable])
# ->
# ALTER TABLE my_table ADD COLUMN column_2 INTEGER;
#
# DROP INDEX ix_my_table_column_1;
```

**Use with care**! There are a lot of changes [that Alembic autogenerate cannot detect](http://alembic.zzzcomputing.com/en/latest/autogenerate.html#what-does-autogenerate-detect-and-what-does-it-not-detect). We recommend testing each auto-migration on a staging system before deploying to production. Sometimes manual migration scripts will be necessary.
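
To make the compare-then-apply idea concrete, here is a deliberately simplified sketch that uses only `sqlite3` from the standard library. It is not how Alembic or `mara_db` work internally, and it only detects missing columns. The function `missing_column_statements` and the `desired` mapping are hypothetical. Printing the generated statements before executing them is a cheap form of the dry run recommended above.

```python
import sqlite3
from typing import Dict, List


def missing_column_statements(connection: sqlite3.Connection, table: str,
                              desired_columns: Dict[str, str]) -> List[str]:
    """Return ALTER TABLE statements for desired columns that the table lacks."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name
    existing = {row[1] for row in connection.execute(f'PRAGMA table_info("{table}")')}
    return [f'ALTER TABLE "{table}" ADD COLUMN {name} {sql_type}'
            for name, sql_type in desired_columns.items()
            if name not in existing]


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE my_table (my_table_id INTEGER PRIMARY KEY, column_1 TEXT NOT NULL)')

desired = {'my_table_id': 'INTEGER', 'column_1': 'TEXT', 'column_2': 'INTEGER'}

# Dry run: inspect the statements first, then apply them
statements = missing_column_statements(connection, 'my_table', desired)
print(statements)
# -> ['ALTER TABLE "my_table" ADD COLUMN column_2 INTEGER']

for statement in statements:
    connection.execute(statement)
```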
## Installation
```bash
pip install mara-db
```

or

```bash
pip install git+https://github.com/mara/mara-db.git
```

### Optional: Installation of requirements for SQL Server
For usage with SQL Server, the Python module pyodbc and an ODBC driver (e.g. Microsoft ODBC Driver 17 for SQL Server) are required. They are not included in the general requirements.
To install pyodbc, see [this install guide](https://github.com/mkleehammer/pyodbc/wiki/Install).
To install ODBC Driver 17, see [Installing the Microsoft ODBC Driver for SQL Server on Linux and macOS](https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver15).

On Linux, you will most likely have to deal with an SSL issue, see [this issue](https://github.com/microsoft/msphpsql/issues/1023). A quick and dirty option in a test/development environment could be to [disable the requirement for TLS 1.2](https://github.com/microsoft/msphpsql/issues/1023#issuecomment-523214695).
### Optional: Installation of requirements for BigQuery
For usage with BigQuery, the official `bq` and `gcloud` clients are required.
See the [Google Cloud SDK](https://cloud.google.com/sdk/docs/quickstarts) page for installation details.

An enabled BigQuery API and service account JSON credentials are also required, as listed in the official documentation [here](https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries#before-you-begin).

One-time authentication of the service account used:
```cmd
gcloud auth activate-service-account --key-file='path-to/service-account.json'
```

Optionally, for loading data from files into BigQuery, the `gcloud_gcs_bucket_name` can be specified in the database initialization.
The specified Google Cloud Storage bucket is then used as a cache for loading data, which helps to overcome potential limitations.
For more information, see [loading-data](https://cloud.google.com/bigquery/docs/bq-command-line-tool#loading_data).
By default, files are loaded directly from the local machine as described in [loading-local-data](https://cloud.google.com/bigquery/docs/loading-data-local#loading_data_from_a_local_data_source).

A BigQuery context with a Python cursor is also available on demand for easy access to BigQuery databases.
To use it, install the official Google Python client library: [google-cloud-bigquery](https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python).

## Links
* Documentation: https://mara-db.readthedocs.io/
* Changes: https://mara-db.readthedocs.io/en/latest/changes.html
* PyPI Releases: https://pypi.org/project/mara-db/
* Source Code: https://github.com/mara/mara-db
* Issue Tracker: https://github.com/mara/mara-db/issues