https://github.com/meeb/gateaux
Data structures and enforced types for FoundationDB
https://github.com/meeb/gateaux
Last synced: 10 months ago
JSON representation
Data structures and enforced types for FoundationDB
- Host: GitHub
- URL: https://github.com/meeb/gateaux
- Owner: meeb
- License: mit
- Created: 2020-07-08T10:00:22.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-12-22T05:19:38.000Z (about 4 years ago)
- Last Synced: 2025-03-20T12:51:37.102Z (10 months ago)
- Language: Python
- Size: 57.6 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
# gateaux
Data structures and typing for FoundationDB.
"FoundationDB is a distributed database designed to handle large volumes of structured
data across clusters of commodity servers. It organizes data as an ordered key-value
store and employs ACID transactions for all operations." - taken from
https://github.com/apple/foundationdb/
FoundationDB has, by design, a bare minimum of features. It presents an interface to
applications which reads and writes binary data with a few basic helper layers.
FoundationDB has no native support for rich data types (for example `datetime` objects)
nor provides any extended data validation. FoundationDB is designed to have layers of
abstraction built on top of it to provide additional features.
The premise of `gateaux` is that where you currently have, in `fdb` library terms,
`tr[(some, data)] = (other, arbitrary, data)` of unstructured data is to enforce strict
standardisation of data in these tuples while allowing more complex types (datetime,
ipaddress etc.). In addition, the concept of structures allows for easier developer
comprehension of what data is being stored in what FoundationDB keyspace. `gateaux` has
a lot of code for not much of an interface, but it is designed to enforce structure and
types and therefore is over-engineered and over-tested by design so you don't have to
worry about your data structures in your upstream applications which use `gateaux`.
If you use `gateaux` your Python FoundationDB client code should look mostly the same,
you just can't mistakenly pack the wrong type or make mistakes before writing data. Such
additional validation is useful for larger codebases where you may be storing hundreds
of `key=value` formats into FoundationDB and keeping track of them can be challenging.
`gateaux` is a pure Python 3 (>=3.6) library which provides rich data type handling and
validation on top of the usual `pack()` and `unpack()` methods and extends the
`fdb.tuple` built-in layer. It is loosely modelled from the interfaces to relational
database object-relational mapper (RDBMS ORM) systems. Each `gateaux` structure
(comparable to a "model" in the ORM world) effectively formats one single key/value pair
at a time with more rigid validation than the `fdb` library provides out of the box.
`gateaux` does not handle FoundationDB connections for you, just the data parsing
part of your application. Effectively `gateaux` is just a data schema enforcing fancy
wrapper that sits on top of tuple packing and unpacking with some nice syntactic sugar.
`gateaux` does not abstract away any of the useful existing `fdb` keyspace interface.
While there is overhead in checking data and converting it between types, `gateaux` is
relatively performant as all it does is shuffle native Python data types about.
## Installation
`gateaux` itself only depends on `foundationdb` and `pytz`. You first need to install
the FoundationDB client libraries from:
https://www.foundationdb.org/download
and FoundationDB Python library from PyPI:
```bash
$ pip install foundationdb
```
Next, install `gateaux` from PyPI:
```bash
$ pip install gateaux
```
Python >= 3.6 is required due to the use of typing.
## Examples
* [General example usage](/examples/general.py)
* [FoundationDB class scheduling example rebuilt with gateaux](/examples/class_scheduling.py)
* [FoundationDB temperature readings example rebuilt with gateaux](/examples/temperature_readings.py)
## Enforced data format
`gateaux` enforces certain requirements. These are not suitable for every project so
check carefully and verify the library is appropriate for your application before you
use it:
1. All structures are in their own FoundationDB subspace, but it is up to you what
keyspace or subspace to use
2. Key tuple members are variable, a key of 3 elements can contain 1, 2 or 3 values,
this is to support prefixes and ranges for keys
3. Value tuple members are fixed, a value of 3 elements must always contain 3 values
3. Validation is strict, if you define a field as a StringField you cannot store bytes
in it etc.
4. While possible to support multiple data types, such as cast int(1) to str('1') if an
integer is provided to a StringField, by design typing is enforced and this will
raise an exception
5. You should not use direct binary data with FoundationDB while using `gateaux`, always
use tuples of other types, `tr[b'key'] = b'value` is incompatible with `gateaux`, but
`tr[(b'key',)] = (b'value',)` is compatible.
`gateaux` conforms to the same idea of `fdb.tuple` such that packing then unpacking a
tuple should always result in the same original tuple of data.
## Structures
There is only one base structure which is inherited to create your own structures.
Synopsis:
```python
import fdb
import gateaux
fdb.api_version(620)
db = fdb.open()
class SomeUserStructure(gateaux.Structure):
key = (gateaux.BinaryField(),)
value = (gateaux.BinaryField(),)
some_keyspace = fdb.Subspace(('some', 'subspace'))
some_structure_instance = SomeUserStructure(some_keyspace)
```
Structures have one required argument, a FoundationDB subspace. Structure instances
have the following interface for tuples:
* `structure.pack_key((...))` validates a tuple of data against the defined key
fields and returns bytes. The bytes are a FoundationDB packed tuple in the defined
directory.
* `structure.unpack_key(b'...')` unpacks FoundationDB bytes into a tuple and then
validates the data against the defined key fields returning the appropriate data type
for the field.
* `structure.pack_value((...))` validates a tuple of data against the defined value
fields and returns bytes. The bytes are a FoundationDB packed tuple in the defined
directory.
* `structure.unpack_value(b'...')` unpacks FoundationDB bytes into a tuple and then
validates the data against the defined value fields returning the appropriate data
type for the field.
And the following interface for dicts:
* `structure.pack_key_dict({...})` validates a dict of data against the defined key
fields and returns bytes. The bytes are a FoundationDB packed tuple in the defined
directory. To use key dicts you must have given all of your key fields a name.
* `structure.unpack_key_dict(b'...')` unpacks FoundationDB bytes into a dict and then
validates the data against the defined key fields returning the appropriate data type
for the field. To use key dicts you must have given all of your key fields a name.
* `structure.pack_value_dict((...))` validates a dict of data against the defined value
fields and returns bytes. The bytes are a FoundationDB packed tuple in the defined
directory. To use value dicts you must have given all of your value fields a name.
* `structure.unpack_value_dict(b'...')` unpacks FoundationDB bytes into a dict and then
validates the data against the defined value fields returning the appropriate data
type for the field. To use value dicts you must have given all of your value fields a
name.
And the following properties:
* `structure.description` a property which returns a `dict` describing the model,
including the name of the structure and any doc string as well as lists of
descriptions for each field in the key and value. You can use this to programmatically
inspect a structure in the future and is useful if you have many structures.
## Fields
All fields support the following arguments:
* `name=string` If set it defines a name stored for the field.
* `help_text=string` If set, help defines some optional help text to describe the data
stored in the value.
* `null=boolean` If set to True then the field can have a `None` value. Defaults to
False.
* `default=value` If set, defines a default for a field. The type must match the
required type for the field. A default is only used if `null=True` and `None` is
provided to the field.
Other field types may support more arguments.
### BinaryField
Stores bytes. Optional arguments:
* `max_length=int` If set defines the maximum number of bytes the field will store.
Accepted type: `bytes`
### IntegerField
Stores integers. Optional arguments:
* `min_value=int` If set defines the minimum number the field will accept
* `max_value=int` If set defines the maximum number the field will accept
Accepted type: `int`
### FloatField
Stores floats. Optional arguments:
* `min_value=float` If set defines the minimum number the field will accept
* `max_value=float` If set defines the maximum number the field will accept
Accepted type: `float`
### BooleanField
Stores booleans.
Accepted type: `bool`
### StringField
Stores strings. Optional arguments:
* `max_length=int` If set defines the maximum length of the field will accept
Accepted type: `str`
### DateTimeField
Stores datetime instances. Internally stored as a UNIX timestamp as floats in UTC.
Accepted type: `datetime.datetime`
Input `datetime.datetime` will be normalised to UTC and returned in UTC when read. You
should account for this in your application. Storing a `datetime.datetime` in any other
timezone will convert it to UTC when read. If the provided `datetime.datetime` instance
has no timezone info it will be assumed to be UTC.
### IPv4AddressField
Stores IPv4 addresses. Internally stored as 4 bytes.
Accepted type: `ipaddress.IPv4Address`
### IPv4NetworkField
Stores IPv4 networks. Internally stored as 5 bytes (address + prefix length).
Accepted type: `ipaddress.IPv4Network`
### IPv6AddressField
Stores IPv6 addresses. Internally stored as 16 bytes.
Accepted type: `ipaddress.IPv6Address`
### IPv6NetworkField
Stores IPv4 networks. Internally stored as 17 bytes (address + prefix length).
Accepted type: `ipaddress.IPv6Network`
### EnumField
Stores Enums. Internally stored as an integer that maps to a specified value. Required
arguments:
* `members=tuple` A mandatory tuple of ints to use as Enum members.
Example:
```python
MEMBER_A = 0
MEMBER_B = 1
MEMBERS = (
MEMBER_A,
MEMBER_B
)
EnumField(members=MEMBERS)
```
Accepted type: `int`
### UUIDField
Stores UUID instances. Internally stored as 16 bytes.
Accepted type: `uuid.UUID`
## Tests
There is a pretty comprehensive test suite. As `gateaux` is designed to pack and unpack
important data it has good coverage to make sure it's behaving as expected. You can run
it by cloning this repository then executing:
```bash
$ make test
```
The tests perform type checking and require "mypy" from http://mypy-lang.org/ to be
globally installed. E.g. `apt install mypy`.
## Contributing
All properly formatted and sensible pull requests, issues and comments are welcome.