https://github.com/libmir/asdf
JSON library
- Host: GitHub
- URL: https://github.com/libmir/asdf
- Owner: libmir
- License: bsl-1.0
- Created: 2020-04-03T09:24:07.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-08-20T07:09:05.000Z (5 months ago)
- Last Synced: 2024-11-18T20:51:48.697Z (2 months ago)
- Language: D
- Homepage: http://asdf.libmir.org
- Size: 2.14 MB
- Stars: 20
- Watchers: 8
- Forks: 8
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-d - asdf - Cache oriented string based JSON representation for fast read & writes and serialisation. (Data serialization / JSON)
README
[![Dub version](https://img.shields.io/dub/v/asdf.svg)](http://code.dlang.org/packages/asdf)
[![Dub downloads](https://img.shields.io/dub/dt/asdf.svg)](http://code.dlang.org/packages/asdf)
[![License](https://img.shields.io/dub/l/asdf.svg)](http://code.dlang.org/packages/asdf)
[![codecov.io](https://codecov.io/github/libmir/asdf/coverage.svg?branch=master)](https://codecov.io/github/tamediadigital/asdf?branch=master)
[![Build Status](https://travis-ci.org/libmir/asdf.svg?branch=master)](https://travis-ci.org/libmir/asdf)
[![Circle CI Docs](https://circleci.com/gh/libmir/asdf.svg?style=shield&circle-token=:circle-ci-badge-token)](https://circleci.com/gh/libmir/asdf)
[![Build status](https://ci.appveyor.com/api/projects/status/libmir/asdf?svg=true)](https://ci.appveyor.com/project/9il/asdf-sesrc)

# A Simple Document Format
ASDF is a cache-oriented, string-based JSON representation.
It is also a convenient JSON library for D that gets out of your way.
ASDF is specially geared towards transforming high volumes of JSON dataframes, either to new
JSON objects or to custom data types.

#### Why ASDF?
asdf was originally developed at [Tamedia](https://www.tamedia.ch/) to extract and transform real-time click streams.
- ASDF is fast. It can be really helpful if you have gigabytes of line-separated JSON values.
- ASDF is simple. It uses D's modelling power to let you write less boilerplate code.
- ASDF is tested and used in production on real-world JSON generated by millions of web clients (we call it _the great fuzzer_).

See also [github.com/tamediadigital/je](https://github.com/tamediadigital/je), a tool for fast extraction of JSON properties into CSV/TSV.
#### Simple Example
1. define your struct
2. call `serializeToJson` (or `serializeToJsonPretty` for pretty printing!)
3. profit!

```D
/+dub.sdl:
dependency "asdf" version="~>0.2.5"

#turns on SSE4.2 optimizations when compiled with LDC
dflags "-mattr=+sse4.2" platform="ldc"
+/
import asdf;

struct Simple
{
    string name;
    ulong level;
}

void main()
{
    auto o = Simple("asdf", 42);
    string data = `{"name":"asdf","level":42}`;
    assert(o.serializeToJson() == data);
    assert(data.deserialize!Simple == o);
}
```
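The pretty-printing variant mentioned in step 2 can be sketched as a round trip. This is a minimal illustration that reuses the `Simple` struct from above; it assumes `serializeToJsonPretty` can be called with its default separator, and it deliberately does not assert the exact whitespace of the output, since that is not specified here.

```D
import asdf;

struct Simple
{
    string name;
    ulong level;
}

void main()
{
    auto o = Simple("asdf", 42);

    // Pretty-printed JSON; the exact indentation is not checked in this sketch.
    string pretty = o.serializeToJsonPretty();

    // Whitespace is insignificant in JSON, so the pretty form
    // deserializes back to the same value as the compact form.
    assert(pretty.deserialize!Simple == o);

    // The pretty form carries the same data plus extra whitespace.
    assert(pretty.length > o.serializeToJson().length);
}
```

The compact form is the better fit for line-separated JSON streams, because a pretty-printed object spans several lines.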
#### Documentation

See ASDF [API](http://asdf.libmir.org) and [Specification](https://github.com/tamediadigital/asdf/blob/master/SPECIFICATION.md).
#### I/O Speed
- Reading JSON line separated values and parsing them to ASDF - 300+ MB per second (SSD).
- Writing ASDF range to JSON line separated values - 300+ MB per second (SSD).

#### Fast setup with the dub package manager
[![Dub version](https://img.shields.io/dub/v/asdf.svg)](http://code.dlang.org/packages/asdf)
[Dub](https://code.dlang.org/getting_started) is D's package manager.
You can create a new project with:

```
dub init
```

Now edit `dub.json`: add `asdf` as a dependency and set `targetType` to `executable`.

(dub.json)
```json
{
    ...
    "dependencies": {
        "asdf": "~>"
    },
    "targetType": "executable",
    "dflags-ldc": ["-mcpu=native"]
}
```

(dub.sdl)
```sdl
dependency "asdf" version="~>"
targetType "executable"
dflags "-mcpu=native" platform="ldc"
```

Now you can create a main file in the `source` directory and run your code with:
```
dub
```
Flags `--build=release` and `--compiler=ldmd2` can be added for a performance boost:
```
dub --build=release --compiler=ldmd2
```

`ldmd2` is a wrapper on top of [LDC (LLVM D Compiler)](https://github.com/ldc-developers/ldc).
`"dflags-ldc": ["-mcpu=native"]` allows LDC to optimize ASDF for your CPU.

Instead of using `-mcpu=native`, you may specify an additional instruction set for a target with `-mattr`.
For example, `-mattr=+sse4.2`. ASDF has specialized code for the
[SSE4.2](https://en.wikipedia.org/wiki/SSE4#SSE4.2) instruction set.

#### Main transformation functions
| UDA | Function |
| ------------- |:-------------:|
| `@serdeKeys("bar_common", "bar")` | tries to read the data from either property and saves it to the first one |
| `@serdeKeysIn("a", "b")` | tries to read the data from `a`, then `b`; the last one occurring in the JSON wins |
| `@serdeKeyOut("a")` | writes it to `a` |
| `@serdeIgnore` | ignore this property completely |
| `@serdeIgnoreIn` | don't read this property |
| `@serdeIgnoreOut` | don't write this property |
| `@serdeIgnoreOutIf!condition` | run function `condition` on serialization and don't write this property if the result is true |
| `@serdeScoped` | Dangerous! Non-allocating strings: the data can vanish if the underlying buffer is removed. |
| `@serdeProxy!string` | call `to!string` |
| `@serdeTransformIn!fin` | call function `fin` to transform the data on deserialization |
| `@serdeTransformOut!fout` | run function `fout` on serialization, different notation |
| `@serdeAllowMultiple` | Allows the deserializer to deserialize multiple keys for the same object member. |
| `@serdeOptional` | Allows the deserializer to skip a member if no corresponding key is present in the input. |

Please also look into the docs or unittests for concrete examples!
#### ASDF Example (incomplete)
```D
import std.algorithm;
import std.stdio;
import asdf;

void main()
{
    auto target = Asdf("red");
    File("input.jsonl")
        // Use at least 4096 bytes for real world apps
        .byChunk(4096)
        // 32 is minimum size for internal buffer. Buffer can be reallocated to get more memory.
        .parseJsonByLine(4096)
        .filter!(object => object
            // opIndex accepts array of keys: {"key0": {"key1": { ... {"keyN-1": }... }}}
            ["colors"]
            // iterates over an array
            .byElement
            // Comparison with ASDF is little bit faster
            // than comparison with a string.
            .canFind(target))
            //.canFind("red"))
        // Formatting uses internal buffer to reduce system delegate and system function calls
        .each!writeln;
}
```

##### Input
Single object per line: 4th and 5th lines are broken.
```json
null
{"colors": ["red"]}
{"a":"b", "colors": [4, "red", "string"]}
{"colors":["red"],
"comment" : "this is broken (multiline) object"}
{"colors": "green"}
{"colors": "red"]}}
[]
```

##### Output
```json
{"colors":["red"]}
{"a":"b","colors":[4,"red","string"]}
```

#### JSON and ASDF Serialization Examples
##### Simple struct or object
```d
struct S
{
    string a;
    long b;
    private int c; // private fields are ignored
    package int d; // package fields are ignored
    // all other fields in JSON are ignored
}
```

##### Selection
```d
struct S
{
    // ignored
    @serdeIgnore int temp;
    // can be formatted to json
    @serdeIgnoreIn int a;
    // can be parsed from json
    @serdeIgnoreOut int b;
    // ignored if negative
    @serdeIgnoreOutIf!`a < 0` int c;
}
```

##### Key overriding
```d
struct S
{
    // key is overridden to "aaa"
    @serdeKeys("aaa") int a;

    // overloads multiple keys for parsing
    @serdeKeysIn("b", "_b")
    // overloads key for generation
    @serdeKeyOut("_b_")
    int b;
}
```

##### User-Defined Serialization
```d
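import std.datetime : DateTime;
import asdf;

struct DateTimeProxy
{
    DateTime datetime;
    alias datetime this;

    // Reads an ISO string from the ASDF value; returns an exception on failure, null on success.
    SerdeException deserializeFromAsdf(Asdf data)
    {
        string val;
        if (auto exc = deserializeScopedString(data, val))
            return exc;
        this = DateTimeProxy(DateTime.fromISOString(val));
        return null;
    }

    // Writes the value back as an ISO string.
    void serialize(S)(ref S serializer)
    {
        serializer.putValue(datetime.toISOString);
    }
}
```

```d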
import std.container : DList;
import asdf;

// serialize a doubly linked list into an array
struct SomeDoublyLinkedList
{
    @serdeIgnore DList!(SomeArr[]) myDll;
    alias myDll this;

    // no template but a function this time!
    void serialize(ref AsdfSerializer serializer)
    {
        auto state = serializer.listBegin();
        foreach (ref elem; myDll)
        {
            serializer.elemBegin;
            serializer.serializeValue(elem);
        }
        serializer.listEnd(state);
    }
}
```

##### Serialization Proxy
```d
struct S
{
    @serdeProxy!DateTimeProxy DateTime time;
}
```

```d
@serdeProxy!ProxyE
enum E
{
    none,
    bar,
}

// const(char)[] doesn't reallocate ASDF data.
@serdeProxy!(const(char)[])
struct ProxyE
{
    E e;

    this(E e)
    {
        this.e = e;
    }

    this(in char[] str)
    {
        switch(str)
        {
            case "NONE":
            case "NA":
            case "N/A":
                e = E.none;
                break;
            case "BAR":
            case "BR":
                e = E.bar;
                break;
            default:
                throw new Exception("Unknown: " ~ cast(string)str);
        }
    }

    string toString()
    {
        if (e == E.none)
            return "NONE";
        else
            return "BAR";
    }

    E opCast(T : E)()
    {
        return e;
    }
}

unittest
{
    assert(serializeToJson(E.bar) == `"BAR"`);
    assert(`"N/A"`.deserialize!E == E.none);
    assert(`"NA"`.deserialize!E == E.none);
}
```

##### Finalizer
If you need to do additional calculations or ETL transformations that depend on the deserialized data, use the `finalizeDeserialization` method.

```d
struct S
{
    string a;
    int b;

    @serdeIgnoreIn double sum;

    // Called after the regular fields have been deserialized.
    void finalizeDeserialization(Asdf data)
    {
        auto r = data["c", "d"];
        auto a = r["e"].get(0.0);
        auto b = r["g"].get(0.0);
        sum = a + b;
    }
}

unittest
{
    assert(`{"a":"bar","b":3,"c":{"d":{"e":6,"g":7}}}`.deserialize!S == S("bar", 3, 13));
}
```