https://github.com/libmir/asdf

JSON library
https://github.com/libmir/asdf

Last synced: about 2 hours ago
JSON representation

JSON library

Awesome Lists containing this project

README

        

[![Dub version](https://img.shields.io/dub/v/asdf.svg)](http://code.dlang.org/packages/asdf)
[![Dub downloads](https://img.shields.io/dub/dt/asdf.svg)](http://code.dlang.org/packages/asdf)
[![License](https://img.shields.io/dub/l/asdf.svg)](http://code.dlang.org/packages/asdf)
[![codecov.io](https://codecov.io/github/libmir/asdf/coverage.svg?branch=master)](https://codecov.io/github/tamediadigital/asdf?branch=master)
[![Build Status](https://travis-ci.org/libmir/asdf.svg?branch=master)](https://travis-ci.org/libmir/asdf)
[![Circle CI Docs](https://circleci.com/gh/libmir/asdf.svg?style=shield&circle-token=:circle-ci-badge-token)](https://circleci.com/gh/libmir/asdf)
[![Build status](https://ci.appveyor.com/api/projects/status/libmir/asdf?svg=true)](https://ci.appveyor.com/project/9il/asdf-sesrc)

# A Simple Document Format

ASDF is a cache-oriented, string-based JSON representation.
It is also a convenient JSON library for D that gets out of your way.
ASDF is specifically geared towards transforming high volumes of JSON dataframes, either to new
JSON objects or to custom data types.

#### Why ASDF?

ASDF was originally developed at [Tamedia](https://www.tamedia.ch/) to extract and transform real-time click streams.

- ASDF is fast. It can be really helpful if you have gigabytes of line-separated JSON values.
- ASDF is simple. It uses D's modelling power so that you write less boilerplate code.
- ASDF is tested and used in production on real-world JSON generated by millions of web clients (we call it _the great fuzzer_).

See also [github.com/tamediadigital/je](https://github.com/tamediadigital/je), a tool for fast extraction of JSON properties into CSV/TSV.

#### Simple Example

1. Define your struct.
2. Call `serializeToJson` (or `serializeToJsonPretty` for pretty printing).
3. Profit!

```D
/+dub.sdl:
dependency "asdf" version="~>0.2.5"

#turns on SSE4.2 optimizations when compiled with LDC
dflags "-mattr=+sse4.2" platform="ldc"
+/
import asdf;

struct Simple
{
    string name;
    ulong level;
}

void main()
{
    auto o = Simple("asdf", 42);
    string data = `{"name":"asdf","level":42}`;
    assert(o.serializeToJson() == data);
    assert(data.deserialize!Simple == o);
}
```
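
The two serializers named in step 2 can be compared with a small round trip. This is only a sketch: it assumes `serializeToJsonPretty` accepts a value the same way `serializeToJson` does, and it deliberately does not assert the exact whitespace of the pretty output, which may differ between releases.

```d
import std.stdio : writeln;
import asdf;

struct Simple
{
    string name;
    ulong level;
}

void main()
{
    auto o = Simple("asdf", 42);

    // Compact form, as in the example above.
    string compact = o.serializeToJson();
    // Human-readable form of the same value.
    string pretty = o.serializeToJsonPretty();

    writeln(compact);
    writeln(pretty);

    // Both forms should parse back to an equal struct.
    assert(compact.deserialize!Simple == o);
    assert(pretty.deserialize!Simple == o);
}
```
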
#### Documentation

See ASDF [API](http://asdf.libmir.org) and [Specification](https://github.com/tamediadigital/asdf/blob/master/SPECIFICATION.md).

#### I/O Speed

- Reading line-separated JSON values and parsing them to ASDF: 300+ MB per second (SSD).
- Writing an ASDF range to line-separated JSON values: 300+ MB per second (SSD).

#### Fast setup with the dub package manager

[![Dub version](https://img.shields.io/dub/v/asdf.svg)](http://code.dlang.org/packages/asdf)

[Dub](https://code.dlang.org/getting_started) is D's package manager.
You can create a new project with:

```
dub init
```

Now edit `dub.json` (or `dub.sdl`): add `asdf` as a dependency and set `targetType` to `executable`.

(dub.json)
```json
{
    ...
    "dependencies": {
        "asdf": "~>"
    },
    "targetType": "executable",
    "dflags-ldc": ["-mcpu=native"]
}
```

(dub.sdl)
```sdl
dependency "asdf" version="~>"
targetType "executable"
dflags "-mcpu=native" platform="ldc"
```

Now you can create a main file in the `source` directory and run your code with:
```
dub
```
Flags `--build=release` and `--compiler=ldmd2` can be added for a performance boost:
```
dub --build=release --compiler=ldmd2
```

`ldmd2` is a DMD-compatible wrapper around [LDC (LLVM D Compiler)](https://github.com/ldc-developers/ldc).
`"dflags-ldc": ["-mcpu=native"]` allows LDC to optimize ASDF for your CPU.

Instead of using `-mcpu=native`, you may specify an additional instruction set for a target with `-mattr`,
for example `-mattr=+sse4.2`. ASDF has specialized code for the
[SSE4.2](https://en.wikipedia.org/wiki/SSE4#SSE4.2) instruction set.

#### Main transformation functions

| UDA | Function |
| ------------- |:-------------:|
| `@serdeKeys("bar_common", "bar")` | Tries to read the data from either property; writes it to the first one. |
| `@serdeKeysIn("a", "b")` | Tries to read the data from `a`, then `b`. The last one occurring in the JSON wins. |
| `@serdeKeyOut("a")` | Writes the data to `a`. |
| `@serdeIgnore` | Ignores this property completely. |
| `@serdeIgnoreIn` | Doesn't read this property. |
| `@serdeIgnoreOut` | Doesn't write this property. |
| `@serdeIgnoreOutIf!condition` | Runs function `condition` on serialization and doesn't write this property if the result is true. |
| `@serdeScoped` | Dangerous! Non-allocating strings: the data can vanish if the underlying buffer is removed. |
| `@serdeProxy!string` | Calls `to!string`. |
| `@serdeTransformIn!fin` | Calls function `fin` to transform the data. |
| `@serdeTransformOut!fout` | Runs function `fout` on serialization (different notation). |
| `@serdeAllowMultiple` | Allows the deserializer to deserialize multiple keys for the same object member. |
| `@serdeOptional` | Allows the deserializer to skip a member if no corresponding keys are present in the input. |

See also the docs and unit tests for concrete examples.
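
A few of the attributes from the table can be combined on one struct. The following is a minimal sketch, not taken from the library's own tests: the struct `Item` and its fields are hypothetical, and it assumes an asdf release in which `import asdf;` exposes the `serde*` attributes and in which fields are written in declaration order.

```d
import asdf;

// Hypothetical struct used only for illustration.
struct Item
{
    // Accepts "title" or "name" on input; the first key is used on output.
    @serdeKeys("title", "name") string title;

    // Read from JSON, but never written back.
    @serdeIgnoreOut int internalId;

    // Read under the field name, written under a different key.
    @serdeKeyOut("qty") int quantity;
}

void main()
{
    auto item = `{"name":"pen","internalId":7,"quantity":3}`.deserialize!Item;
    assert(item == Item("pen", 7, 3));

    // "internalId" is dropped; "title" and "qty" are the output keys.
    assert(item.serializeToJson() == `{"title":"pen","qty":3}`);
}
```
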

#### ASDF Example (incomplete)

```D
import std.algorithm;
import std.stdio;
import asdf;

void main()
{
    auto target = Asdf("red");
    File("input.jsonl")
        // Use at least 4096 bytes for real-world apps
        .byChunk(4096)
        // 32 is the minimum size for the internal buffer. The buffer can be reallocated to get more memory.
        .parseJsonByLine(4096)
        .filter!(object => object
            // opIndex accepts an array of keys: {"key0": {"key1": { ... {"keyN-1": }... }}}
            ["colors"]
            // iterates over an array
            .byElement
            // Comparison with ASDF is a little bit faster
            // than comparison with a string.
            .canFind(target))
            //.canFind("red"))
        // Formatting uses the internal buffer to reduce system delegate and system function calls
        .each!writeln;
}
```

##### Input

One object per line. The 4th and 5th lines are broken (a single object split across two lines), and the 7th line is not valid JSON.

```json
null
{"colors": ["red"]}
{"a":"b", "colors": [4, "red", "string"]}
{"colors":["red"],
"comment" : "this is broken (multiline) object"}
{"colors": "green"}
{"colors": "red"]}}
[]
```

##### Output

```json
{"colors":["red"]}
{"a":"b","colors":[4,"red","string"]}
```
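
The same lookups can be tried on a single in-memory document, without a file. This sketch assumes `parseJson` accepts a string and returns an `Asdf` value; the sample document is made up for illustration.

```d
import std.algorithm : canFind;
import asdf;

void main()
{
    // Hypothetical sample document.
    auto doc = `{"colors":["red","green"],"c":{"d":{"e":6}}}`.parseJson;

    // A chain of keys walks into nested objects.
    assert(doc["c", "d", "e"].get(0.0) == 6);

    // byElement iterates over the array elements as Asdf values,
    // so they can be compared against a prebuilt Asdf target.
    assert(doc["colors"].byElement.canFind(Asdf("red")));
    assert(!doc["colors"].byElement.canFind(Asdf("blue")));
}
```
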

#### JSON and ASDF Serialization Examples

##### Simple struct or object
```d
struct S
{
    string a;
    long b;
    private int c; // private fields are ignored
    package int d; // package fields are ignored
    // all other fields in JSON are ignored
}
```

##### Selection
```d
struct S
{
    // ignored completely
    @serdeIgnore int temp;

    // written to JSON, but not read from it
    @serdeIgnoreIn int a;

    // read from JSON, but not written to it
    @serdeIgnoreOut int b;

    // not written if negative
    @serdeIgnoreOutIf!`a < 0` int c;
}
```

##### Key overriding
```d
struct S
{
    // key is overridden to "aaa"
    @serdeKeys("aaa") int a;

    // accepts multiple keys when parsing
    @serdeKeysIn("b", "_b")
    // overrides the key when generating
    @serdeKeyOut("_b_")
    int b;
}
```

##### User-Defined Serialization
```d
import std.datetime : DateTime;
import asdf;

struct DateTimeProxy
{
    DateTime datetime;
    alias datetime this;

    // Reads an ISO string from the ASDF value and converts it to a DateTime.
    SerdeException deserializeFromAsdf(Asdf data)
    {
        string val;
        if (auto exc = deserializeScopedString(data, val))
            return exc;
        this = DateTimeProxy(DateTime.fromISOString(val));
        return null;
    }

    // Writes the DateTime back as an ISO string.
    void serialize(S)(ref S serializer)
    {
        serializer.putValue(datetime.toISOString);
    }
}
```

```d
// serialize a doubly linked list into an array
struct SomeDoublyLinkedList
{
    @serdeIgnore DList!(SomeArr[]) myDll;
    alias myDll this;

    // no template but a function this time!
    void serialize(ref AsdfSerializer serializer)
    {
        auto state = serializer.listBegin();
        foreach (ref elem; myDll)
        {
            serializer.elemBegin;
            serializer.serializeValue(elem);
        }
        serializer.listEnd(state);
    }
}
```

##### Serialization Proxy
```d
struct S
{
    @serdeProxy!DateTimeProxy DateTime time;
}
```

```d
@serdeProxy!ProxyE
enum E
{
    none,
    bar,
}

// const(char)[] doesn't reallocate ASDF data.
@serdeProxy!(const(char)[])
struct ProxyE
{
    E e;

    this(E e)
    {
        this.e = e;
    }

    this(in char[] str)
    {
        switch(str)
        {
            case "NONE":
            case "NA":
            case "N/A":
                e = E.none;
                break;
            case "BAR":
            case "BR":
                e = E.bar;
                break;
            default:
                throw new Exception("Unknown: " ~ cast(string)str);
        }
    }

    string toString()
    {
        if (e == E.none)
            return "NONE";
        else
            return "BAR";
    }

    E opCast(T : E)()
    {
        return e;
    }
}

unittest
{
    assert(serializeToJson(E.bar) == `"BAR"`);
    assert(`"N/A"`.deserialize!E == E.none);
    assert(`"NA"`.deserialize!E == E.none);
}
```

##### Finalizer
If you need to do additional calculations or ETL transformations that depend on the deserialized data, use the `finalizeDeserialization` method.

```d
struct S
{
    string a;
    int b;

    @serdeIgnoreIn double sum;

    // Called after the regular fields have been deserialized.
    void finalizeDeserialization(Asdf data)
    {
        auto r = data["c", "d"];
        auto a = r["e"].get(0.0);
        auto b = r["g"].get(0.0);
        sum = a + b;
    }
}

unittest
{
    assert(`{"a":"bar","b":3,"c":{"d":{"e":6,"g":7}}}`.deserialize!S == S("bar", 3, 13));
}
```