https://github.com/bertrandchenal/tanker
Tanker is a Python database library targeting analytic operations
https://github.com/bertrandchenal/tanker
database postgresql python sqlite
Last synced: 6 months ago
JSON representation
Tanker is a Python database library targeting analytic operations
- Host: GitHub
- URL: https://github.com/bertrandchenal/tanker
- Owner: bertrandchenal
- License: isc
- Created: 2019-10-22T09:06:17.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-11-10T10:54:54.000Z (over 4 years ago)
- Last Synced: 2025-11-01T10:19:19.962Z (8 months ago)
- Topics: database, postgresql, python, sqlite
- Language: Python
- Homepage:
- Size: 4.33 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Tanker
Tanker is a Python database library targeting analytic operations but
it also fits most transactional processing.
As its core it's mainly a query builder that simplify greatly join
operations. It comes with a way to automatically create the database
tables based on your schema definition and it can also introspect
existing db and infer the needed metadata.
Currently Postgresql and Sqlite are supported and the API is made to
seamlessly integrate pandas DataFrames.
## Licence
Tanker is available under the ISC Licence, see LICENCE file at the
root of the repository.
## Main features
### Schema definition and database connection
The file `schema.yaml` defines the database structure: table, columns
(and their types) and key.
``` yaml
- table: team
columns:
name: varchar
country: m2o country.id
key:
- name
- country
- table: country
columns:
name: varchar
key:
- name
```
The code here-under create the config dictionary and use it to connect
to the database and creates the tables.
``` python
from tanker import connect, create_tables, View, yaml_load
cfg = {
'db_uri': 'sqlite:///test.db',
'schema': yaml_load(open('schema.yaml')),
}
with connect(cfg):
create_tables()
```
Tanker automatically add an `id` column on each table, to allow to
define foreign keys. For example, in the yaml definition, `country:
m2o country.id` means that a many-to-one relation will be created
between the tables team and country. When the team table will be
created this will generate the following column definition:
"country" INTEGER REFERENCES "country" (id) ON DELETE CASCADE
If not specified, `sqlite:///:memory:` will be used as `db_uri`. To
use Postgresql, the uri should looks like
`postgresql://login:passwd@hostname/dbname` (and you can choose the
postgres schema to use by appending `#shema_name`to the uri)
Note that every database interaction must happen inside the `with
connect(cfg)` block.
### Read & write
Tanker usage is centered around the `View` object, it is used to
define a mapping between the relational world and Python. For example,
to write and read countries, we define a view based on the country
table:
``` python
country_view = View(
'country', # The base table
['name'] # The fields we want to map
)
```
So now we can write to the database:
``` python
countries = [['Belgium'], ['France']]
country_view.write(countries)
```
And read it back.
``` python
countries_copy = country_view.read().all()
```
And `countries_copy` should be identical to `countries`. As `.read()`
returns the database cursor, the `.all()` allows to fetch all the
records. Instead of `.all()` one can use `.df()` to receive a pandas
DataFrame.
### Key role
As you can see in the database definition, each table comes with a `key`
attribute. This attribute contains the list of columns that form a
[natural key](https://en.wikipedia.org/wiki/Natural_key).
This key is required by design in Tanker, its main role is to
allow Tanker to know what to do with each record when `View.write` is
called. Thanks to the key, we know if the record is already in the
database (and in this case will generate an `UPDATE` statement) or if
the record is new (and use an `INSERT` query).
It's especially handy when dealing for example with data coming from a
website scraper or from an spreadsheet, where a technical id (like an
integer or a uuid) is not always available.
To avoid to launch one query per record and suffer from network
latencies, what Tanker do to speed up writes is to create a temporary
table, insert all the record as one batch and then join this temporary
table with the actual one to know which record to insert and which to
update.
### Foreign key resolution
To populate the `team` table we have to provide a team name and a
country. We can do it like this:
``` python
team_view = View('team, ['name', 'country'])
team_view.write([['Red', 1]])
```
But it's more convenient to use the country name instead of it's id:
``` python
teams = [
['Blue', 'Belgium'],
['Red', 'Belgium'],
['Blue', 'France'],
]
team_view = View('team, ['name', 'country.name'])
team_view.write(teams)
```
You can see that we changed `country` into `country.name` in the view,
which means that the use the `name` column to identify the country
(which is conveniently defined as the key in the table
definition).
We can go further and use more than one dot and let Tanker resolve
foreign key for us. Let's say we want to add a member table to our
database, we append the following piece of yaml to our schema file
``` yaml
- table: member
columns:
name: varchar
registration_code: varchar
team: m2o team.id
key:
- registration_code
```
And re-run the `create_tables()` as above. Now we can do:
rows = View('member', ['name', team.country.name]).read()
Here, two join queries will be automatically generated, one between
`member` and `team` and one between `team` and `country`.
To add a member we have to link it to a team, whose key is composed
of both the name and the country column (so we allow two teams with the
same name in different countries):
``` python
members = [
['Bob', 'Belgium', 'Blue', '001'],
['Alice', 'Belgium', 'Red', '002'],
['Trudy', 'France', 'Blue', '003'],
]
member_view = View('member', ['name', 'team.country.name', 'team.name',
'registration_code'],
])
member_view.write(members)
```
Tanker will be able to identify for each member the correct team based
on both country name and team name.
### Filters
The read method accept a `filters` argument it can be a string or a
list of strings. Filter strings use
[s-expression](https://en.wikipedia.org/wiki/S-expression)
notation. So for example to filter a country by name you can do:
``` python
filters = '(= name "Belgium")'
country_view.read(filters)
```
or to get `registration_code` above a given value:
``` python
member_view.read('(> registration_code "002")')
```
You can also combine those filters and use the dot notation:
filters = '(or ((> registration_code "002") (= team.country.name "Belgium")))'
member_view.read(filters).read()
The `filters` argument can also be a list, in this case all items are
regrouped in a conjunction, equivalent to `(and item1 item2 ...)`.
### Query arguments
To facilitate the building of queries and more importantly to prevent
sql injections you can use arguments. They use the syntax of Python
own [string format method](https://docs.python.org/2/library/stdtypes.html#str.format),
and will make use of the DB-API's parameter substitution (see for
example [the sqlite documentation](https://docs.python.org/2/library/sqlite3.html)):
``` python
cond = '(= name {name})'
rows = team_view.read(cond).args(name='Blue')
```
You can also pass list values, they will be automatically
expanded. And you can use the dot notation to reach a given parameter
in the object passed as argument:
``` python
cond = '(or (in name {names}) (= registration_code {data.code}))'
rows = member_view.read(cond).args(names=['Alice', 'Bob'], data=my_object)
```
The dot notation also supports dictionnaries, so the above example
whould work with `data={'code': '001'}`. The query arguments can also
refer to values from the configuration (which can be reach from the
`ctx` object), like:
``` python
ctx.cfg['default_team'] = 'Red'
cond = '(in name {default_team})'
rows = view.read(cond)
```
Finally, arguments can be a list instead of a dict and can be passed to the `read` method, so:
``` python
cond = '(in name {names})'
rows = team_view.read(cond).args(names=['Blue', 'Red'])
```
is equivalent to
``` python
cond = '(in name {} {})'
rows = team_view.read(cond).args('Blue', 'Red')
```
and is equivalent to
``` python
cond = '(in name {} {})'
rows = team_view.read(cond, args=['Blue', 'Red'])
```
### Pandas Dataframes
Instead of passing a list of list we can use a dataframe, and use a
dictionary to map dataframe columns to database columns.
``` python
df = DataFrame({
'Team': ['Blue', 'Red'],
'Country': ['France', 'Belgium']
})
view = View('team', {
'Team': 'name',
'Country': 'country.name',
})
view.write(data)
df_copy = view.read().df()
```
## ACL
The ACL system of Tanker allows to systemically filter access to the
rows of any table. We can define filters for reading data or writing
it. To enable it, you can add it to the `cfg` parameter of `connect`,
but you can also change it later:
``` python
cfg['acl-read'] = {'team': "(= team.country.name 'Belgium')"}
with connect(cfg):
teams = View('team').read().all() # No belgian team will be returned
cfg['acl-read'] = {} # we reset the acl
teams = View('team').read().all() # Returns everything again
```
As you can see it's a simple dict whose keys are table names and
values are Tanker filters. Similarly, you can add an `acl-write`
dictionary to `cfg`:
``` python
cfg['acl-read'] = {'team': "(= team.country.name 'Belgium')"}
View('team').write(teams)
```
## Documentation TODO
- Deletion (by data, by filter)
- Aliases
## Roadmap
Some ideas, in no particular order:
- Add a view.insert method that bypass tmp table and write directly
to the actual table
- Support for version column (probably a write timestamp)
- Add support for other 'ON CONFLICT' action (like incrementing a
version column, or appening to an array)
- Support for table constraints
- Allow to execute complete queries with s-expressions (select,
update, insert and delete).