https://github.com/solidsnack/pg-sql-variants

Variants types for PostgreSQL
Host: GitHub
URL: https://github.com/solidsnack/pg-sql-variants
Owner: solidsnack
Created: 2016-03-14T08:37:51.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2020-04-21T23:51:37.000Z (about 6 years ago)
Last Synced: 2025-03-05T17:08:17.115Z (over 1 year ago)
Language: PLpgSQL
Size: 10.7 KB
Stars: 30
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

How To Use This Repo
====================

This repo provides utilities for modeling variant types in Postgres, along
with a demo schema. To try it out, first use Git to obtain the relevant code:

```bash
:;  git clone git@github.com:solidsnack/pg-sql-variants.git
:;  cd pg-sql-variants/
:;  git submodule update --init --recursive
```

Then load the utilities and the sample schema in Postgres:

```sql
:;  psql
Line style is unicode.
Expanded display is used automatically.
Null display is "\N".
Timing is on.
psql (12.1)
Type "help" for help.

--# thelyfsoshort@[local]/~
\i init.psql
BEGIN
...
COMMIT
...
BEGIN
...
COMMIT
...
BEGIN
...
COMMIT
...
```

The sample schema helps us to demonstrate a simple polymorphic datatype: an
`animal` type with concrete `cat`, `dog` and `walrus` subtypes.

```sql
--# thelyfsoshort@[local]/~
SELECT tablename FROM pg_tables WHERE schemaname = 'inetorg';
 tablename
───────────
 cat
 walrus
 dog
 animal
(4 rows)
```

Let's set up the variant relationship between the types in the `inetorg`
namespace with the `variant()` function from the `variants` namespace:

```sql
--# thelyfsoshort@[local]/~
SET search_path TO inetorg, variants, "$user", public;

--# thelyfsoshort@[local]/~
SELECT * FROM variant('animal', 'cat');
SELECT * FROM variant('animal', 'walrus');
SELECT * FROM variant('animal', 'dog');
```

We can see that there are no `animal`s and there are no `cat`s:

```sql
--# thelyfsoshort@[local]/~
SELECT * FROM animal;
 ident 
───────
(0 rows)

--# thelyfsoshort@[local]/~
SELECT * FROM cat;
 license │ responds_to │ doglike
─────────┼─────────────┼─────────
(0 rows)
```

The `variants.variant()` function is basically a SQL macro; it sets up several
triggers every time it is called. Let's add a `cat`:

```sql
--# thelyfsoshort@[local]/~
INSERT INTO cat VALUES ('00000000-0000-0000-0000-000000000001', 'felix', FALSE);
INSERT 0 1
```

The triggers ensure that records are added to the `animal` table, as well.

```sql
--# thelyfsoshort@[local]/~
SELECT * FROM cat;
               license                │ responds_to │ doglike
──────────────────────────────────────┼─────────────┼─────────
 00000000-0000-0000-0000-000000000001 │ felix       │ f
(1 row)

--# thelyfsoshort@[local]/~
SELECT * FROM animal;
                ident                 
──────────────────────────────────────
 00000000-0000-0000-0000-000000000001
(1 row)
```

In addition to the triggers, `variants.variant()` also maintains a join table,
with one column for each variant type. In this case, the join table is named
`animal*`:

```sql
--# thelyfsoshort@[local]/~
SELECT * FROM "animal*";
Time: 0.285 ms
─[ RECORD 1 ]──────────────────────────────────────────
ident  │ 00000000-0000-0000-0000-000000000001
type   │ cat
cat    │ (00000000-0000-0000-0000-000000000001,felix,f)
walrus │ \N
dog    │ \N
(1 row)
```

The join table illustrates a cool Postgres feature: columns with row types.
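
Row-typed columns also work outside the generated join table. The following is a minimal sketch in a scratch schema; the `rowtype_demo` schema and the `pet_snapshot` table are invented for illustration and are not part of this repo.

```sql
-- Scratch schema so the sketch does not collide with the demo tables.
CREATE SCHEMA rowtype_demo;
SET search_path TO rowtype_demo;

CREATE TABLE cat (
  license       uuid PRIMARY KEY,
  responds_to   text NOT NULL,
  doglike       boolean DEFAULT TRUE
);

-- Every table implicitly defines a composite type of the same name,
-- so `cat` can be used as the type of a column. (Hypothetical table.)
CREATE TABLE pet_snapshot (
  ident  uuid PRIMARY KEY,
  cat    cat
);

INSERT INTO cat VALUES ('00000000-0000-0000-0000-000000000001', 'felix', FALSE);

-- Referring to the table alias alone yields the whole row as one value.
INSERT INTO pet_snapshot
SELECT c.license, c FROM cat AS c;

-- Parentheses are needed to pick a field out of a composite column.
SELECT ident, (cat).responds_to FROM pet_snapshot;
```

The parentheses in `(cat).responds_to` tell Postgres that `cat` is a column here rather than a table name.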

What good is a `cat`? Better delete while we're not sure:

```sql
--# thelyfsoshort@[local]/~
DELETE FROM cat;
DELETE 1
```

No more `cat`s, no more `animal*`s:

```sql
--# thelyfsoshort@[local]/~
SELECT * FROM cat;
 license │ responds_to │ doglike
─────────┼─────────────┼─────────
(0 rows)

--# thelyfsoshort@[local]/~
SELECT * FROM "animal*";
 ident │ type │ cat │ walrus │ dog
───────┼──────┼─────┼────────┼─────
(0 rows)
```

Our Approach to Variant Types in Postgres
=========================================

Typed variants, case classes, tagged unions, algebraic data types or
just [enums]: variant types are a feature common to many programming languages
but are an awkward fit for SQL.

[enums]: https://doc.rust-lang.org/book/enums.html

The fundamental difficulty is that foreign keys can reference columns of only
one other table. By combining `VIEW`s, triggers and Postgres's JSON data-type,
we can group related types like `cat` and `walrus` under a tagged union like
`animal`, allowing other tables to create foreign keys that reference
`animal`.

```sql
CREATE TABLE cat (
  license       uuid PRIMARY KEY,
  responds_to   text NOT NULL,
  doglike       boolean DEFAULT TRUE
);

CREATE TABLE walrus (
  registration  uuid PRIMARY KEY,
  nickname      text,
  size          text NOT NULL DEFAULT 'big' CHECK (size IN ('small', 'big')),
  haz_bucket    boolean NOT NULL DEFAULT FALSE
);

CREATE TABLE animal (
  ident         uuid PRIMARY KEY
);
```

The process for forming the foreign key and triggers is completely formulaic,
and we capture it in a stored procedure, `variant` (in `variants.sql`), that
allows one to put `cat` and `walrus` together under `animal`:

```sql
SELECT * FROM variant('animal', 'cat');
SELECT * FROM variant('animal', 'walrus');
```

Changes to keys are propagated bidirectionally between `animal` and its
variants. A `DELETE` against a cat's UUID in `animal` will remove the row from
`cat`; and a delete against `cat` will remove the row from `animal`.
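
Once `animal` exists, a client table can reference it like any other table. The sketch below runs in a scratch schema and inserts the `animal` row by hand, so it does not exercise the triggers; the `fk_demo` schema and the `vet_visit` table are assumptions made up for this example.

```sql
CREATE SCHEMA fk_demo;
SET search_path TO fk_demo;

CREATE TABLE animal (
  ident  uuid PRIMARY KEY
);

-- Hypothetical client table: it neither knows nor cares which variant
-- (cat, walrus, ...) a given animal is.
CREATE TABLE vet_visit (
  patient     uuid NOT NULL REFERENCES animal (ident) ON DELETE CASCADE,
  visited_on  date NOT NULL,
  PRIMARY KEY (patient, visited_on)
);

-- With variant() in place, this row would be created by a trigger when a
-- cat or walrus is inserted; here it is inserted directly.
INSERT INTO animal VALUES ('00000000-0000-0000-0000-000000000001');

INSERT INTO vet_visit
VALUES ('00000000-0000-0000-0000-000000000001', DATE '2020-04-21');

-- This insert would be rejected by ordinary foreign-key checking,
-- because no animal has this key:
-- INSERT INTO vet_visit
-- VALUES ('00000000-0000-0000-0000-0000000000ff', DATE '2020-04-21');
```

The `ON DELETE CASCADE` is one possible choice; a client that should block deletion of animals it still references would leave it off.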

Why data in your database is not like data in your app
------------------------------------------------------

Imagine for a moment the data loaded in your app. There are `Cat`s and
`Walrus`es of class `Animal`; there are `String`s, `Integer`s,
`StructTime`s... But how would you go about searching and sorting these
objects? One could say this is a bad question with a bad answer.

It's a bad question because most of the time we have the objects we need ready
to hand, assigned to variables in the right place in our program -- we don't
need to find them. We don't ever sort "all" integers, just the relevant ones.

The answer is bad because one would search and sort all the objects of a given
type by walking the heap. This might be facilitated by the runtime (Ruby's
`ObjectSpace.each_object()` comes to mind) or it might not; but one is in
for a linear scan either way; and there is potential for conflict with other
threads of execution -- either preventing them from running, or tripping over
inconsistencies they introduce.

In a database, however, we do not rely on having the right context to find an
object. Whereas in a programming context we use the objects to get the fields,
in a storage context we use the fields to find the objects; there is no notion
of identity apart from field values. This is the heart of the
object-relational (or struct-enum-relational or ADT-relational) mismatch.

The two models overlap when we consider global, concurrent data structures
like event buses or concurrent maps. In SQL terms, each concurrent map would
be a relation, and in SQL each relation is a distinct type. A database is what
you would get if each of the types in your language were automatically
associated with a concurrent map.

There are two abstractions relating to types which become strange in this
all-types-backed-with-maps model:

* Inheritance, abstract base classes, and traits
* Generics (in the Java sense) or templates (in the C++ sense)

With regard to inheritance, one wonders what it would mean to insert an
`orange` in the `fruit` table or the `citrus` table. Clearly, inserting it in
any one of them should insert it in all of them. It is an ambiguity that gives
the author pause.

With regard to templated definitions, it stands to reason that these can have
no "live" representation in the database. Tables are there, or they aren't.
Perhaps a database's SQL dialect could support template expansion; but this
feature would have no impact on the nature of queries or relationships between
tables.

How tagged unions can help
--------------------------

Typed variants -- or tagged unions -- are a minimal way to expand SQL's
support for polymorphism that is helpful to object-oriented languages, and
languages like Haskell, Rust and Go which provide products or sums of products
as well as traits.

In our approach, the types which are part of the union are all themselves
physical tables with primary keys that are type compatible. We create a new
table for the union, the only columns of which are the columns of the primary
key and, through triggers, we ensure that inserts, updates and deletes to any
of the variant tables are also propagated to the union.
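
To make the idea concrete, here is a hand-written sketch of the kind of wiring described above, for a single variant. It is not the code in `variants.sql`: the `union_demo` schema, the function names and the trigger names are all invented, key updates are left out, and the real `variant()` procedure may set things up differently.

```sql
CREATE SCHEMA union_demo;
SET search_path TO union_demo;

-- The union table holds only the shared primary key.
CREATE TABLE animal (
  ident  uuid PRIMARY KEY
);

-- The foreign key handles the animal -> cat direction for deletes.
CREATE TABLE cat (
  license       uuid PRIMARY KEY
                REFERENCES animal (ident) ON DELETE CASCADE,
  responds_to   text NOT NULL,
  doglike       boolean DEFAULT TRUE
);

-- cat -> animal on insert: register the key in the union first. If another
-- variant already claimed this key, animal's primary key rejects the insert,
-- which is what keeps the key spaces disjoint.
CREATE FUNCTION cat_to_animal_insert() RETURNS trigger AS $$
BEGIN
  INSERT INTO union_demo.animal (ident) VALUES (NEW.license);
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER cat_insert BEFORE INSERT ON cat
  FOR EACH ROW EXECUTE PROCEDURE cat_to_animal_insert();

-- cat -> animal on delete: remove the key from the union as well.
CREATE FUNCTION cat_to_animal_delete() RETURNS trigger AS $$
BEGIN
  DELETE FROM union_demo.animal WHERE ident = OLD.license;
  RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER cat_delete AFTER DELETE ON cat
  FOR EACH ROW EXECUTE PROCEDURE cat_to_animal_delete();

INSERT INTO cat VALUES ('00000000-0000-0000-0000-000000000001', 'felix', FALSE);
SELECT * FROM animal;   -- one row, created by the insert trigger
DELETE FROM cat;
SELECT * FROM animal;   -- empty again
```

Writing this out by hand for every variant and every union is exactly the repetition that a macro-like procedure removes.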

The union table ensures that the key spaces of the variants are disjoint and
allows other tables to declare foreign keys that reference the union.
Normal database validation logic takes over from there. The alternative would
be to have constraint triggers on each client table for each table in the
variant and to have triggers on each variant table for each client. This would
be both less expressive and a likely source of errors.

SQL tagged unions in this style support "composition" instead of inheritance
for polymorphism. For example, in the case where we have a type of letters and
would like to be able to handle Swiss letters, Spanish letters, Egyptian
letters and more, the modeller is tasked with breaking out the common fields
into a `letter` table which would reference `national_variant`, which is a
union of `egyptian`, `ethiopian`, `etruscan` and so forth.
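
A possible shape for that schema is sketched below. Only the table names `letter`, `national_variant`, `egyptian` and `ethiopian` come from the text; the `letters_demo` schema and every column are placeholders invented for illustration, and the `variant()` calls are left as comments because they need this repo's utilities to be loaded.

```sql
CREATE SCHEMA letters_demo;
SET search_path TO letters_demo;

-- The union: only the shared key lives here.
CREATE TABLE national_variant (
  ident  uuid PRIMARY KEY
);

-- Variant tables carry whatever is specific to each kind.
-- (Placeholder columns; the real fields depend on the domain.)
CREATE TABLE egyptian (
  ident  uuid PRIMARY KEY,
  notes  text
);

CREATE TABLE ethiopian (
  ident  uuid PRIMARY KEY,
  notes  text
);

-- Common fields are composed with a variant rather than inherited by it.
CREATE TABLE letter (
  id                uuid PRIMARY KEY,
  notes             text,
  national_variant  uuid NOT NULL REFERENCES national_variant (ident)
);

-- With this repo loaded and `variants` on the search_path, the union
-- would then be wired up in the same way as `animal`:
-- SELECT * FROM variant('national_variant', 'egyptian');
-- SELECT * FROM variant('national_variant', 'ethiopian');
```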