https://github.com/giannitedesco/xpdt

eXPeditious Data Transfer
c compilers marshalling protocol-buffers python serialization xpdt

Host: GitHub
URL: https://github.com/giannitedesco/xpdt
Owner: giannitedesco
License: gpl-3.0
Created: 2021-05-23T04:22:49.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2025-10-27T15:31:21.000Z (9 months ago)
Last Synced: 2025-11-29T01:25:54.942Z (8 months ago)
Topics: c, compilers, marshalling, protocol-buffers, python, serialization, xpdt
Language: Python
Homepage:
Size: 89.8 KB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Support: support/c/xfilemap.c

# xpdt: eXPeditious Data Transfer

## About

xpdt is (yet another) language for defining data-types and generating code for
serializing and deserializing them. It aims to produce code with little or no
overhead, especially in the case where some fields aren't required, and is
based on fixed-length representations allowing for branch-free zero-copy
deserialization and (at-most-)one-copy writes (source to buffer).

The generated C code, in particular, is highly optimized and often permits the
elimination of data-copying for writes and enables optimizations such as
loop-unrolling when deserializing fixed-length objects. This can lead to read
speeds in excess of 500 million objects per second (~1.8 nsec per object).

## Examples

The xpdt source language looks similar to C struct definitions:

```
struct timestamp {
	u32	tv_sec;
	u32	tv_nsec;
};

struct point {
	i32	x;
	i32	y;
	i32	z;
};

struct line {
	timestamp	time;
	point		line_start;
	point		line_end;
	bytes		comment;
};
```

Fixed-width integer types from 8 to 128 bits are supported, along with the
`bytes` type, which is a variable-length sequence of bytes.

## Target Languages

The following target languages are currently supported:

- C
- Python

The C code is very highly optimized.

The Python code is about as well optimized for CPython as I can make it. It
uses typed `NamedTuple` for objects, which has some small overhead over regular
tuples, and it uses `struct.Struct` to do the packing/unpacking. I have also
code-golfed the generated bytecodes down to what I think is minimal given the
design constraints. As a result, performance of the pure Python code is
comparable to a JSON library implemented in C or Rust.
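
As a rough illustration of that approach (not the code xpdt actually
generates), a typed `NamedTuple` paired with a precompiled `struct.Struct`
might look like this for the `timestamp` struct from the examples. The class
name, the `_TIMESTAMP` constant and the `to_bytes` / `from_bytes` method names
are all assumptions made for this sketch.

```python
import struct
from typing import NamedTuple

# "<" selects little-endian with no padding; "I" is an unsigned 32-bit integer.
# Compiling the format once avoids re-parsing it on every call.
_TIMESTAMP = struct.Struct("<II")


class Timestamp(NamedTuple):
    """Hand-written stand-in for a generated class (hypothetical names)."""

    tv_sec: int
    tv_nsec: int

    def to_bytes(self) -> bytes:
        # The tuple fields are already in wire order, so they unpack directly.
        return _TIMESTAMP.pack(*self)

    @classmethod
    def from_bytes(cls, buf: bytes) -> "Timestamp":
        # unpack_from reads from the start of the buffer without slicing it.
        return cls._make(_TIMESTAMP.unpack_from(buf))


ts = Timestamp(tv_sec=1_700_000_000, tv_nsec=500)
raw = ts.to_bytes()
restored = Timestamp.from_bytes(raw)
```

Here `raw` is 8 bytes long and `restored` compares equal to `ts`.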

For better performance in Python, it may be desirable to develop a Cython
target. In some instances CFFI structs may be more performant since they can
avoid the creation/destruction of an object for each record.

Target languages are implemented purely as `jinja2` templates.

## Serialization format

The serialization format for fixed-length objects is simply a packed C struct,
with little-endian fields.
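
To make the fixed-length case concrete, here is a small sketch using Python's
standard `struct` module to lay out the `point` struct from the examples. It
assumes, for illustration only, that several records are stored back to back
in one buffer; the `POINT` name is hypothetical and this is not xpdt's
generated code.

```python
import struct

# point { i32 x; i32 y; i32 z; } as a packed little-endian struct: 12 bytes.
POINT = struct.Struct("<iii")

# Three points written back to back, with no padding between them.
coords = [(1, 2, 3), (-1, 0, 1), (7, 8, 9)]
buf = b"".join(POINT.pack(x, y, z) for x, y, z in coords)

# Every record has the same size, so the n-th record starts at n * POINT.size
# and can be read without looking at any of the records before it.
second = POINT.unpack_from(buf, 1 * POINT.size)

# iter_unpack walks the whole buffer in fixed-size steps.
points = list(POINT.iter_unpack(buf))
```

Because the record size is a constant, locating a record needs only a
multiplication, which is what makes this format friendly to tight read loops.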

For any object which contains variable-length fields (e.g. `bytes` or `utf8`):

- a 32-bit unsigned record length is prepended to the struct, which allows
  efficient skipping of the whole record
- each variable-length field is replaced in the struct by a `u32` holding the
  length, in bytes, of its data
- all variable-length contents are appended after the struct in the order in
  which they appear

For example, a `line` record from the schema above with the comment `Hello`
would be serialized as:

```
u32 tot_len # = 41
u32 time.tv_sec
u32 time.tv_nsec
i32 line_start.x
i32 line_start.y
i32 line_start.z
i32 line_end.x
i32 line_end.y
i32 line_end.z
u32 comment # = 5
u8 'H'
u8 'e'
u8 'l'
u8 'l'
u8 'o'
```
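
The layout above can be reproduced with a short Python sketch. It is not the
code xpdt generates; `LINE_FIXED` and `pack_line` are hypothetical names. It
also assumes that `tot_len` counts the bytes that follow the length prefix,
since that is the reading under which the example's value of 41 works out
(36 bytes of fixed fields plus 5 bytes of comment).

```python
import struct

# Fixed part of a `line` record: tot_len, the two u32 timestamp fields,
# six i32 coordinates, then the u32 that stands in for the `comment` field.
LINE_FIXED = struct.Struct("<I II iii iii I")


def pack_line(time, start, end, comment: bytes) -> bytes:
    # Assumption taken from the example: tot_len excludes its own 4 bytes.
    tot_len = LINE_FIXED.size - 4 + len(comment)
    fixed = LINE_FIXED.pack(tot_len, *time, *start, *end, len(comment))
    # Variable-length contents are appended after the fixed struct.
    return fixed + comment


record = pack_line((1, 2), (0, 0, 0), (1, 1, 1), b"Hello")
(tot_len,) = struct.unpack_from("<I", record, 0)
(comment_len,) = struct.unpack_from("<I", record, 36)
```

With these inputs `tot_len` is 41, `comment_len` is 5, and the whole record
occupies 45 bytes including the length prefix.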

### Why Variable-Length Data at the End?

Placing variable-length fields at the end of the struct provides significant
performance benefits:

1. xpdt is optimized for fast zero-copy deserialization, and it especially
   tries to avoid adding overhead when only a subset of fields is required. If
   variable-length fields could appear in the middle of a struct, every access
   to a later field would first have to read the length and skip the
   variable-length data, even when that data is never used.
2. Potentially large strings and payloads are isolated: you pay the cost of
   calculating a variable-length field's offset only when that field is
   needed. This is ideal for descriptive strings or optional metadata that is
   rarely accessed.
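
A sketch of what this buys a reader: the loop below sums `line_end.x` across a
buffer of `line` records without ever touching the comments. The helper names
and the back-to-back record buffer are assumptions for illustration, as is the
reading that `tot_len` excludes its own 4 bytes (which matches the value 41 in
the serialization example).

```python
import struct

U32 = struct.Struct("<I")
I32 = struct.Struct("<i")

# line_end.x sits after tot_len (4), the timestamp (8) and line_start (12).
OFF_LINE_END_X = 4 + 8 + 12


def make_record(end_x: int, comment: bytes) -> bytes:
    # Test-data helper: every field except line_end.x is zero.
    fixed = struct.pack("<I II iii iii I", 36 + len(comment),
                        0, 0, 0, 0, 0, end_x, 0, 0, len(comment))
    return fixed + comment


def sum_line_end_x(buf: bytes) -> int:
    total = 0
    pos = 0
    while pos < len(buf):
        (tot_len,) = U32.unpack_from(buf, pos)
        # The field offset is a constant, whatever the comment length is.
        (x,) = I32.unpack_from(buf, pos + OFF_LINE_END_X)
        total += x
        # Skip the rest of the record, comment included, in one step.
        pos += 4 + tot_len
    return total


buf = (make_record(5, b"Hello")
       + make_record(-2, b"")
       + make_record(10, b"a much longer comment"))
total = sum_line_end_x(buf)
```

The comment lengths differ from record to record, yet the read of
`line_end.x` uses the same constant offset each time.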

## Features

The feature-set is, as of now, pretty slim.

There are no array / sequence / map types, and no keyed unions.

Support for such things may be added in future provided that suitable
implementations exist. An implementation is suitable if:

- it admits a zero (or close to zero) overhead implementation
- it causes no overhead when the feature isn't being used

## License

The compiler is released under the GPLv3.

The C support code/headers are released under the MIT license.

The generated code is yours.
