# `pywise`
[![PyPI version](https://badge.fury.io/py/pywise.svg)](https://badge.fury.io/py/pywise) [![CircleCI](https://circleci.com/gh/malcolmgreaves/pywise/tree/main.svg?style=svg)](https://circleci.com/gh/malcolmgreaves/pywise/tree/main) [![Coverage Status](https://coveralls.io/repos/github/malcolmgreaves/pywise/badge.svg?branch=main)](https://coveralls.io/github/malcolmgreaves/pywise?branch=main)

Contains functions that provide general utility and build upon the Python 3 standard library. It has no external dependencies.
- `serialization`: serialization & deserialization for `NamedTuple`-deriving & `@dataclass` decorated classes
- `archives`: uncompress tar archives
- `common`: utilities
- `schema`: obtain a `dict`-like structure describing the fields & types for any serializable type (helpful to view as JSON)

This project's most notable functionality is the `serialize` and `deserialize` functions of `core_utils.serialization`.
Take a look at the end of this document for example use.
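As a quick preview, here is a minimal sketch of a round trip through JSON (the `Point` type is illustrative only, not part of the library):
```python
from dataclasses import dataclass
import json

from core_utils.serialization import serialize, deserialize

@dataclass(frozen=True)
class Point:
    x: float
    y: float

# dataclass -> JSON-ready structure -> JSON string, and back again
as_json = json.dumps(serialize(Point(x=1.0, y=2.0)))
restored = deserialize(Point, json.loads(as_json))
assert restored == Point(x=1.0, y=2.0)
```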

## Development Setup
This project uses [`poetry`](https://python-poetry.org/) for virtualenv and dependency management. We recommend using [`brew`](https://brew.sh/) to install `poetry` system-wide.

To install the project's dependencies, perform:
```
poetry install
```

Every command must be run within the `poetry`-managed environment.
For instance, to open a Python shell, you would execute:
```
poetry run python
```
Alternatively, you may activate the environment by performing `poetry shell` and directly invoke Python programs.

### Development Practices
Install pre-commit `git` hooks using `pre-commit install`. Hooks are defined in the `.pre-commit-config.yaml` file.

CI enforces linting using all pre-commit hooks.

NOTE: Dependencies in hooks **MUST** be kept in sync with the
`dev-dependencies` section of `pyproject.toml` for `poetry`.

#### Testing
To run tests, execute:
```
poetry run pytest -v
```
To run tests against all supported environments, use [`tox`](https://tox.readthedocs.io/en/latest/):
```
poetry run tox -p
```
NOTE: To run `tox`, you must have all necessary Python interpreters available.
We recommend using [`pyenv`](https://github.com/pyenv/pyenv) to manage your Python versions.

#### Dev Tools
This project uses `ruff` for code formatting and linting. Static type checking is enforced using `mypy`.
Use the following commands to ensure code quality:
```
# code format and lint: applies fixes automatically in-place if possible
ruff format .
ruff check --fix .

# typechecks
mypy --ignore-missing-imports --follow-imports=silent --show-column-numbers --warn-unreachable --install-types --non-interactive --check-untyped-defs .
```

## Documentation via Examples

#### Nested @dataclass and NamedTuple
Let's say you have an address book that you want to write to and read from JSON.
We'll define our data types for our `AddressBook`:

```python
from typing import NamedTuple, Optional, Union, Sequence
from dataclasses import dataclass
from enum import Enum, auto

@dataclass(frozen=True)
class Name:
    first: str
    last: str
    middle: Optional[str] = None

class PhoneNumber(NamedTuple):
    area_code: int
    number: int
    extension: Optional[int] = None

@dataclass(frozen=True)
class EmailAddress:
    name: str
    domain: str

class ContactType(Enum):
    personal, professional = auto(), auto()

class Emergency(NamedTuple):
    full_name: str
    contact: Union[PhoneNumber, EmailAddress]

@dataclass(frozen=True)
class Entry:
    name: Name
    number: PhoneNumber
    email: EmailAddress
    contact_type: ContactType
    emergency_contact: Emergency

@dataclass(frozen=True)
class AddressBook:
    entries: Sequence[Entry]
```

For illustration, let's consider the following instantiated `AddressBook`:
```python
ab = AddressBook([
    Entry(
        Name('Malcolm', 'Greaves', middle='W'),
        PhoneNumber(510, 3452113),
        EmailAddress('malcolm', 'world.com'),
        contact_type=ContactType.professional,
        emergency_contact=Emergency("Superman", PhoneNumber(262, 1249865, extension=1)),
    ),
])
```

We can convert our `AddressBook` data type into a JSON-formatted string using `serialize`:
```python
from core_utils.serialization import serialize
import json

s = serialize(ab)
j = json.dumps(s, indent=2)
print(j)
```

And we can easily convert the JSON string back into a new instantiated `AddressBook` using `deserialize`:
```python
from core_utils.serialization import deserialize

d = json.loads(j)
new_ab = deserialize(AddressBook, d)
print(ab == new_ab)
# NOTE: The @dataclass(frozen=True) is only needed to make this equality work.
# Any @dataclass decorated type is serializable.
```

Note that the `deserialize` function needs the type to deserialize the data into. The deserialization
type-matching is _structural_: it will work so long as the data type's structure (of field names and
associated types) is compatible with the supplied data.
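For instance, in the following sketch the `FullName` type (hypothetical, defined only for illustration) has the same fields as the `Name` dataclass above, so data serialized from a `Name` can be deserialized into a `FullName` even though the two classes are unrelated:
```python
from typing import NamedTuple, Optional

from core_utils.serialization import serialize, deserialize

# FullName is structurally identical to Name: same field names & types,
# even though it is a NamedTuple rather than a @dataclass.
class FullName(NamedTuple):
    first: str
    last: str
    middle: Optional[str] = None

name_data = serialize(Name('Malcolm', 'Greaves', middle='W'))
full_name = deserialize(FullName, name_data)
print(full_name)  # FullName(first='Malcolm', last='Greaves', middle='W')
```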

#### Custom Serialization
In the event that one desires to use `serialize` and `deserialize` with data types from third-party libraries (e.g. `numpy` arrays) or custom-defined `class`es that are not decorated with `@dataclass` or derive from `NamedTuple`, one may supply a `CustomFormat`.

`CustomFormat` is a mapping that associates precise types with custom serialization functions. When supplied to `serialize`, each function in the mapping accepts an instance of its associated type and produces a serializable representation. When supplied to `deserialize`, each function converts such a serialized representation back into a bona fide instance of the type.

To illustrate their use, we'll define `CustomFormat` `dict`s that allow us to serialize `numpy` multi-dimensional arrays:
```python
import numpy as np
from core_utils.serialization import *

custom_serialization: CustomFormat = {
    np.ndarray: lambda arr: arr.tolist()
}

custom_deserialization: CustomFormat = {
    np.ndarray: lambda lst: np.array(lst)
}
```

Now, we may supply `custom_{serialization,deserialization}` to our functions. We'll use them to perform a "round-trip" serialization of a four-dimensional array of floating point numbers to and from a JSON-formatted `str`:
```python
import json

v_original = np.random.random((1,2,3,4))
s = serialize(v_original, custom=custom_serialization)
j = json.dumps(s)

d = json.loads(j)
v_deser = deserialize(np.ndarray, d, custom=custom_deserialization)

print((v_original == v_deser).all())
```

It's important to note that, when supplying a `CustomFormat`, the custom functions take priority over the default serialization behavior (except for `Any`, which is _always_ treated as a pass-through). Moreover, types must match the keys in the mapping **exactly**. Thus, if using a generic type, you must supply a separate key-value entry for each distinct type parameterization.
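For example, here is a minimal sketch of the deserialization side, under the assumption that `FrozenSet[int]` and `FrozenSet[str]` are the two parameterizations you care about (each one needs its own entry, since neither key matches the other):
```python
from typing import FrozenSet

from core_utils.serialization import CustomFormat, deserialize

# Keys are matched exactly: FrozenSet[int] does not cover FrozenSet[str].
custom_deserialization: CustomFormat = {
    FrozenSet[int]: lambda xs: frozenset(int(x) for x in xs),
    FrozenSet[str]: lambda xs: frozenset(str(x) for x in xs),
}

ids = deserialize(FrozenSet[int], [1, 2, 3], custom=custom_deserialization)
tags = deserialize(FrozenSet[str], ["a", "b"], custom=custom_deserialization)
```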