Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vitiral/zoa
serialized structured data and it's textual representation
https://github.com/vitiral/zoa
civboot data serialization-format
Last synced: 2 months ago
JSON representation
serialized structured data and it's textual representation
- Host: GitHub
- URL: https://github.com/vitiral/zoa
- Owner: vitiral
- License: unlicense
- Created: 2022-01-12T17:22:39.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-12T13:37:03.000Z (about 1 year ago)
- Last Synced: 2024-10-18T21:52:39.397Z (3 months ago)
- Topics: civboot, data, serialization-format
- Language: Python
- Homepage:
- Size: 189 KB
- Stars: 4
- Watchers: 4
- Forks: 2
- Open Issues: 1
-
Metadata Files:
- Readme: README.cxt
Awesome Lists containing this project
README
[!]##########################################################################[/]
[h1]zoa: serialized data made easy[/]["]Most content has moved to https://github.com/civboot/civlua/tree/main/zoa["]
["]This document is written using @cxt in README.cxt[/]
zoa is a set of standards related to serialized structured data and it's
textual representation and processing. It is part of the @Civboot
project, specifically @fngi.["] zoa is named after "protozoa", which is itself a nod to @protobuf. Yes, I'm
aware protozoa are not fungi, but the name was too good.[/]zoa encompases the following technologies:[+]
* [b]zty[b] `.zty`: specification for structured types similar to @protobuf
* [b]zoab[b] `.zoa`: a binary structured data composed of only data and array.
* [b]zoat[b] `.zoa`: an extensible text format for representing zoab in a
human readable way. Zoat intentionally has little functionality on it's
own but allows extensibility and a standardized syntax for the different
tools built on top of it.
* [b]zoac[b] `.zc`: a configuration lanugage built on zoat and and another
langauge (i.e. @fngi), allowing execution of arbitrary (but
stateless) user-defined functions.
* [b]zoash[b] `.zh`: a text-based shell built on zoac allowing creating of
processes, mutation of files or anything else a computer user might want
to do. zoac functions are still callable to process data, but other
functions are available as well perform side effects. File extension:
.zh
[/]["]Yes, zoab and zoat have the same file extension. This is intentional and you
will see how below![/]Both zoab and zoat are extremely simple.
zoab is about the simplest possible general purpose structured data that can
exist. Having only two types allows it to be compact and fast. The use of
single byte signals alows it to be trivially deserialized by even
embedded system.zoat is simpler than yaml but can achieve the same functionaly but with far
less ambiguitity. The only thing it is missing is integer/float/literal-map/[
]none types, which are not actually necessary for serializing or deserializing
arbitrary structured data. Zoat also has massively increased functionality over
yaml with features like variable storage and concatenation.[t set=protobuf r=https://developers.google.com/protocol-buffers]protobuf[/]
[t set=cxt r=https://github.com/civboot/cxt]cxt[/]
[t set=Civboot r=https://civboot.org]Civboot[/]
[t set=fngi r=http://github.com/civboot/fngi]fngi[/][!]##########################[/]
[h2]zoat[/]zoat (like zoab) has only two data types: data and array. Data start at the
first non-whitespace character (except for `|` or `{`, which we will get to)
and continue until a pipe. For example:[###]
This is a zoat string. It ends with a pipe|
[###]An array is specified using curly braces and can be nested, like this:
[###]
{ first string in array|
second string in array|
{ nested array element 1|
nested array element 2|
}
}
[###]Leading whitespace is always ignored until the first character, after which
newlines are treated as spaces and special characters use c-like escapes. Here
are the valid escapes in text:
[###]
\n newline
\t tab
\| a literal '|' character
\{ a literal '{' character
\} a literal '}' character
\s
\ a literal space character
\xHH hex byte of value 0xHH
\ line continuation. When \ is used at the end of a line no
space will be insert when going to the next line.An example,
there is one space after the comma above. \
\ However there are three spaces after the period since
an escape was used both before and after the newline.|
[###]The following are how to use commands. Commands must be specified immediately
following either a pipe or curly brace:[###]
|X Where X is a non-whitespace character. Runs a command, some of
which are defined below. Note that multiple '|' together are
ignored, i.e. '|| | | ||' is the same as '|'.{ Starts an array. Characters following are always treated as text
(never a command), but whitespace is still ignored. Example:
{array item 1| array item 2}} Ends an array. Characters following are always treated as text
(never a command), but whitespace is still ignored. Example above.Note: {{}} will create an empty array inside a array. However,
adding '|' does nothing. i.e. '|{| {||} |}|' is exactly the same.
This allows for trailing pipes. If you've ever used JSON you know
the pain of not being able to use trailing commas, this avoids
that pain.| A set of data until the next '|', '{' or '}'. There _must_ be
whitespace (newline, tab or space) after | otherwise the character
will be treated as a command. Whitespace will be ignored until the
first non-whitespace character.|\ Starts text immediately using the escape. Example:
|\nThis text started on a new line.||* Line comment. Text is ignored until a newline.
{* Block comment {*nesting allowed*}, it must be closed with: *}
After a closing block comment a new command can start immediately.
Example:
{* comment *} |h1 header|''' Plain-text/code block. Any number of ' can be used, and must be
escaped with the same number of ', which ends the item (but will
not execute a command. As a general rule only '|' executes
commands). The raw text will ignore escapes (i.e. \n) and preserve
newlines, _except_ for the first newline if it immediately follows
the opening ' block or the last newline if it immediately preceeds
the ' block. Also, the text will be de-indented by the indent at
the start of the command. Example:
|'''
This is some raw text.
It will have no indent.
I can use { | } \n etc and they are interpreted literally.
I can even end with a space. '''
This is a new item, as if the previous had ended with pipe||+ Concatenate (join) this item to the last item of the same type.
Note that text starts immediately.
Example:
all one |+single item. ==> all one single item.|
{1| 2} |+ {3| 4} ==> {1| 2| 3| 4}|. Begins the variable compiler, similar to fngi's. When assigning
variables, values are not added until referenced. Examples:|.foo = fngi | |* set foo equal to "fngi "
{ I think |.foo|is great } |* { I think |fngi |is great }An array can be accessed by index using @
|.myArray @0 @3| |* C equivalent: myArray[0][3]
If the data is an array of two-length arrays with strings as the
first item, you can access by key with '.'. Whitespace or special
characters in the keys are not supported.|.myDict.foo.bar|
|.+ Joins the variable to the last item. Example:
{ I think |.+foo|+is great } |* { I think fngi is great }|[0-9] An integer (big-endian). 0x (hex), 0b (binary) and 0c (char) are
supported. (see Type Specification Dyn)|-[0-9] A negative integer (big-endian, see Type Specification Dyn)
[###][!]##########################[/]
[h2]zoab: structured array and byte data[/]The binary representation of zoa is called zoab. Like zoat it is composed of
only two types: data and array. Both can only have sections of length 63, but a
join bit may be used to make longer values.The start of a zoab item has a single byte which specifies it's type, join
status and length. it looks like:[###]
Bitmap Description
JTLL LLLL : J=join bit T=type bit L=length bits (0-63)
[###][+]
* `J` can be 1, which will cause it to be "joined" to the next item (which
must be the same type). This allows the true length to be longer than 63
(any length is possible with multiple joins).
* `T` can be 0 for "data" or 1 for "array".
* `L` contains a length (0-63) of the data or array.
[/][!]##########################[/]
[h2]Binary Runtime Type Selection[/]There is one byte that is invalid for both zoab and utf8 (and ascii):
`J=1 T=0 L=0`, which in binary is `1000 0000` or in hex is 0x80.Instead of making this byte illegal, we will _require_ it in the `zoa(...)`
function, which is intended to deserialize either zoab or zoat from user
inputs, data stored in file systems or data transferred over the wire. If 0x80
is the first character then it is assumed this is zoab data. If 0x80
is [i]not[i] the first byte then it should be assumed that the file/etc is
human written zoat.["] 0x80 will still be illegal for pure-zoab related functions, which expect
the data to already be deserialzed.[/]0x80 will signal that the next byte "steam type". The supported ones are:[+]
1. pure unstructured data with no special meaning.
2. _mostly_ human readable data. i.e. struct fields are names, integers are in
text format, etc. This is typically what should be passed to dbg logs, human
readable compressed configs, etc.
3. protozoa data, which is similar to protobuf. Fields are represented by
user-assigned integers, data is in binary format, etc. The deserializer
obviously needs access to a backwards-compatible reference (i.e. a struct
definition) to know how to deserialize this data.
4. log event, used for zoa's own core logging.
[/]We also reserve `J=1 T=1 L=0` to mean that the next data value is actually a
pointer to more data. This allows fngi (or other languages) to use a maximum
block size of 4kB while still permitting arbitrary length zoab data.[h2]Type Specification[/]
["][b]WARNING:[b] This section is currently being worked on so is not yet
polished.[/]Zoa also has a default type protocol and text specification for generating
serializers and deserializers for zoab data. These are typically contained in
`.zt` files.Zoa types are specified in a language similar to protobufs, with a few key
differences. For an overview:Native types include:[+]
* Data, Int, Num, Time (integers: seconds, nanoseconds), Path
(array of components)
* Arr(generic)
* IntMap(Int key, generic value), DataMap(Data key, generic value)
* Dyn: dynamic (runtime) type
[/]Users can create their own struct types with the format below. Struct fields can
be positional or have an id. Default values can be provided, otherwise all
arguments are required.[###]
\ Comments use '\' character
struct Foo [
a: Int; \ positional required argument.
b: Int = 0; \ positional argument with default.
c: Data = ||; \ data argument with default.
d:0 Int = 0; \ indexed argument with default.
d:1 Arr[Int] = 0;
]
[###]The struct protocol is:[+]
* The first items contains the number of positional args P.
* The next P items contains the first P specified positional arguments.
Any unspecified positional args must have a default value.
* Indexed args can be specified in any order.
* Any unspecified args must have a default.
[/]Similarily, enumerated values can be defined. Enums must always specify their
id. All ids from [[0 - max specified]] must be declared (in any order).[#]
struct Cheese [ isCheddar: Int ]
struct Cake [ isChocolate: Int ]
struct Pizza [
cheese: Cheese = {
isCheddar|1|
};
hasPeperoni: Int = 0;
]enum Food [
none:0; \ note: empty type, no data
someNuts:3 Int; \ note: declared out of order (OK)
pizza:1 Pizza;
cake:2 Cake;
cheeses:3 Arr[Cheese];
]
[#]The enum protocol is:[+]
* The first item contains the enum variant, i.e. noaccount, user, admin
* If the variant is not an `Arr` or `*Map`, the next item is the value.
* else, the next N values are the data for the array.
[/][!]##########################[/]
[h2]Contributing[/]To build the README.md and run the tests, simply run `make`.
When opening a PR to submit code to this repository you must include the
following disclaimer in your first commit message:[###]
I assent to license this and all future contributions to this project
under the dual licenses of the UNLICENSE or MIT license listed in the
`UNLICENSE` and `README.md` files of this repository.
[###][!]##########################[/]
[h2]LICENSING[/]This work is part of the @Civboot project and therefore primarily exists for
educational purposes. Attribution to the authors and project is appreciated but
not necessary.Therefore this body of work is licensed using the UNLICENSE unless otherwise
specified at the beginning of the source file.If for any reason the UNLICENSE is not valid in your jurisdiction or project,
this work can be singly or dual licensed at your discression with the MIT
license below.[###]
Copyright 2022 Garrett BergPermission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
[###]