https://github.com/bcheidemann/entype

CLI tool and library which generates types for serialized data formats.

# Entype

Entype is a CLI tool and library which ingests serialized data formats
(currently only JSON) and outputs type definitions for different languages
(currently Rust and TypeScript).

## Installation

### Deno

Entype can be installed using the Deno CLI:

```sh
deno install --allow-read --allow-net https://deno.land/x/entype/main.ts
```

And can then be run using the `entype` command:

```sh
entype --lang rust fixtures/datapack/blockstates/*.json
```

Alternatively, it can be run using the Deno CLI without the need to install the
command globally:

```sh
deno run --allow-read https://deno.land/x/entype/main.ts --lang rust fixtures/datapack/blockstates/*.json
```

### NPM

Entype can be installed using NPM:

```sh
npm i -g typegen-json
```

And can then be run using the `typegen-json` command:

```sh
typegen-json --lang rust fixtures/datapack/blockstates/*.json
```

Alternatively, it can be run using NPX without the need to install the command
globally:

```sh
npx typegen-json --lang rust fixtures/datapack/blockstates/*.json
```

## Usage

Entype accepts files to generate type definitions for and emits type definitions
to stdout.

```sh
entype --lang typescript fixtures/datapack/blockstates/*.json
```

The above example will output the following TypeScript type definitions:

```ts
export type ArrayElement5 = {
model: string;
uvlock: boolean | null | undefined;
weight: number | null | undefined;
x: number | null | undefined;
y: number | null | undefined;
};

export type Struct15 = {
model: string;
uvlock: boolean | null | undefined;
x: number | null | undefined;
y: number | null | undefined;
};

export type Apply3 =
| Array<ArrayElement5>
| Struct15;

export type TElement29 = {
facing: string | null | undefined;
slot_0_occupied: string | null | undefined;
slot_1_occupied: string | null | undefined;
slot_2_occupied: string | null | undefined;
slot_3_occupied: string | null | undefined;
slot_4_occupied: string | null | undefined;
slot_5_occupied: string | null | undefined;
};

export type TElement66 = {
east: string | null | undefined;
north: string | null | undefined;
south: string | null | undefined;
up: string | null | undefined;
west: string | null | undefined;
};

export type T24 = {
age: string | null | undefined;
AND: Array<TElement29> | null | undefined;
down: string | null | undefined;
east: string | null | undefined;
facing: string | null | undefined;
flower_amount: string | null | undefined;
has_bottle_0: string | null | undefined;
has_bottle_1: string | null | undefined;
has_bottle_2: string | null | undefined;
leaves: string | null | undefined;
level: string | null | undefined;
north: string | null | undefined;
OR: Array<TElement66> | null | undefined;
south: string | null | undefined;
up: string | null | undefined;
west: string | null | undefined;
};

export type TElement2 = {
apply: Apply3;
when: T24 | null | undefined;
};

export type ArrayElement87 = {
model: string;
x: number | null | undefined;
y: number | null | undefined;
};

export type Struct93 = {
model: string;
uvlock: boolean | null | undefined;
x: number | null | undefined;
y: number | null | undefined;
};

export type TEntry85 =
| Array<ArrayElement87>
| Struct93;

export type Root = {
multipart: Array<TElement2> | null | undefined;
variants: Record<string, TEntry85> | null | undefined;
};
```

Alternatively, Rust types can be generated as follows:

```sh
entype --lang rust fixtures/datapack/blockstates/*.json
```

```rust
pub struct ArrayElement5 {
model: String,
uvlock: Option,
weight: Option,
x: Option,
y: Option,
}

pub struct Struct15 {
model: String,
uvlock: Option,
x: Option,
y: Option,
}

pub enum Apply3 {
Array(Vec),
Struct(Struct15),
}

pub struct TElement29 {
facing: Option,
slot_0_occupied: Option,
slot_1_occupied: Option,
slot_2_occupied: Option,
slot_3_occupied: Option,
slot_4_occupied: Option,
slot_5_occupied: Option,
}

pub struct TElement66 {
east: Option,
north: Option,
south: Option,
up: Option,
west: Option,
}

pub struct T24 {
age: Option,
AND: Option>,
down: Option,
east: Option,
facing: Option,
flower_amount: Option,
has_bottle_0: Option,
has_bottle_1: Option,
has_bottle_2: Option,
leaves: Option,
level: Option,
north: Option,
OR: Option>,
south: Option,
up: Option,
west: Option,
}

pub struct TElement2 {
apply: Apply3,
when: Option,
}

pub struct ArrayElement87 {
model: String,
x: Option,
y: Option,
}

pub struct Struct93 {
model: String,
uvlock: Option,
x: Option,
y: Option,
}

pub enum TEntry85 {
Array(Vec),
Struct(Struct93),
}

pub struct Root {
multipart: Option>,
variants: Option>,
}
```

## Plugins (experimental)

Plugins are an experimental feature which allows the behaviour of the type
emitter to be customised or enhanced. Plugins can be used as follows:

```sh
entype --allow-unstable --lang rust --plugin serde-derive fixtures/datapack/blockstates/*.json
```

Note that the `--allow-unstable` flag is required to use plugins, as their
behaviour may change in future.

The above command will emit the following code:

```rust
#[derive(serde::Serialize, serde::Deserialize)]
pub struct ArrayElement5 {
model: String,
uvlock: Option,
weight: Option,
x: Option,
y: Option,
}

#[derive(serde::Serialize, serde::Deserialize)]
pub struct Struct15 {
model: String,
uvlock: Option,
x: Option,
y: Option,
}

#[derive(serde::Serialize, serde::Deserialize)]
#[serde(untagged)]
pub enum Apply3 {
Array(Vec),
Struct(Struct15),
}

#[derive(serde::Serialize, serde::Deserialize)]
pub struct TElement29 {
facing: Option,
slot_0_occupied: Option,
slot_1_occupied: Option,
slot_2_occupied: Option,
slot_3_occupied: Option,
slot_4_occupied: Option,
slot_5_occupied: Option,
}

#[derive(serde::Serialize, serde::Deserialize)]
pub struct TElement66 {
east: Option,
north: Option,
south: Option,
up: Option,
west: Option,
}

#[derive(serde::Serialize, serde::Deserialize)]
pub struct T24 {
age: Option,
AND: Option>,
down: Option,
east: Option,
facing: Option,
flower_amount: Option,
has_bottle_0: Option,
has_bottle_1: Option,
has_bottle_2: Option,
leaves: Option,
level: Option,
north: Option,
OR: Option>,
south: Option,
up: Option,
west: Option,
}

#[derive(serde::Serialize, serde::Deserialize)]
pub struct TElement2 {
apply: Apply3,
when: Option,
}

#[derive(serde::Serialize, serde::Deserialize)]
pub struct ArrayElement87 {
model: String,
x: Option,
y: Option,
}

#[derive(serde::Serialize, serde::Deserialize)]
pub struct Struct93 {
model: String,
uvlock: Option,
x: Option,
y: Option,
}

#[derive(serde::Serialize, serde::Deserialize)]
#[serde(untagged)]
pub enum TEntry85 {
Array(Vec),
Struct(Struct93),
}

#[derive(serde::Serialize, serde::Deserialize)]
pub struct Root {
multipart: Option>,
variants: Option>,
}
```

This code is identical to that emitted without the `serde-derive` plugin, except
that all types have been decorated with the appropriate Serde derive macros.

There are currently two inbuilt plugins:

- `serde-derive`
- `derive-debug`

It is possible to implement third party plugins. If you wish to do so, you can
use one of the [inbuilt plugins](lib/plugins/serde-derive.ts) as a reference.

Third party plugins can be specified by URL:

```sh
entype \
--allow-unstable \
--lang rust \
--plugin "https://raw.githubusercontent.com/bcheidemann/entype/main/lib/plugins/serde-derive.ts" \
fixtures/datapack/blockstates/*.json
```

Alternatively, the `github:` qualifier can be used:

```sh
entype \
--allow-unstable \
--lang rust \
--plugin "github:bcheidemann/entype/lib/plugins/derive-debug.ts" \
fixtures/datapack/blockstates/*.json
```

The format is `github:@//`, where `branch` defaults
to `main` and `path` defaults to `mod.ts`. The plugin specifier
`github:bcheidemann/entype-plugin-example` would be resolved to
`https://raw.githubusercontent.com/bcheidemann/entype-plugin-example/main/mod.ts`.
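
The resolution of the two `github:` examples above can be sketched as follows. This is a minimal illustration rather than entype's actual resolver: the function name `resolveGithubSpecifier` is hypothetical, and the sketch assumes the specifier has the shape `owner/repo` optionally followed by a path. It always uses the default `main` branch and does not attempt to parse an explicit branch.

```typescript
// Hypothetical helper, not part of entype's API.
// Assumes specifiers of the form "github:owner/repo" or "github:owner/repo/some/path.ts".
function resolveGithubSpecifier(specifier: string): string {
  const prefix = "github:";
  if (specifier.indexOf(prefix) !== 0) {
    throw new Error(`Not a github: specifier: ${specifier}`);
  }

  const [owner, repo, ...rest] = specifier.slice(prefix.length).split("/");
  if (!owner || !repo) {
    throw new Error(`Expected an owner and a repository in: ${specifier}`);
  }

  const branch = "main"; // documented default; explicit branches are not handled here
  const path = rest.length > 0 ? rest.join("/") : "mod.ts"; // documented default path

  return `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/${path}`;
}

const exampleUrl = resolveGithubSpecifier("github:bcheidemann/entype-plugin-example");
const debugUrl = resolveGithubSpecifier(
  "github:bcheidemann/entype/lib/plugins/derive-debug.ts",
);
console.log(exampleUrl);
console.log(debugUrl);
```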

### Node.js

For Node.js versions of entype (i.e. the one installed from NPM) plugins cannot
be imported by URL. Instead, third party plugins must be published to NPM and
can then be imported by their package name.

## How it works

Entype tries to generate the simplest possible type that accurately describes
the input data. For instance, given the following two input files:

```jsonc
// 200.json
{
"statusCode": 200,
"data": {
"message": "Hello World!"
}
}
```

```jsonc
// 500.json
{
"statusCode": 500,
"error": {
"message": "An internal server error occurred"
}
}
```

Entype will first generate an intermediate representation for the first file
(200.json). If emitted to TypeScript, it would look something like this:

```typescript
export type Data0 = {
message: string;
};

export type Root = {
data: Data0;
statusCode: number;
};
```

It then generates an intermediate representation for the second file (500.json),
which looks something like this:

```typescript
export type Error0 = {
message: string;
};

export type Root = {
error: Error0;
statusCode: number;
};
```

Next it compares each field in the generated types recursively, producing the
simplest type for each field which accurately describes both input files. This
produces a final intermediate type representation, which can then be emitted to
the following TypeScript type definition:

```typescript
export type T1 = {
message: string;
};

export type T4 = {
message: string;
};

export type Root = {
data: T1 | null | undefined;
error: T4 | null | undefined;
statusCode: number;
};
```

This process can be repeated for an arbitrary number of input files.
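
The merge step described above can be sketched roughly as follows. The `Shape` and `Field` types and the `infer` and `merge` functions are hypothetical names chosen for illustration; they are not taken from entype's source, and entype's real intermediate representation may differ. The sketch marks a field as optional when it is missing from either input, and falls back to a union (without deduplicating variants) when two shapes cannot be reconciled.

```typescript
// Hypothetical intermediate representation (illustrative only).
type Shape =
  | { kind: "string" | "number" | "boolean" | "null" }
  | { kind: "array"; element: Shape | undefined }
  | { kind: "struct"; fields: Record<string, Field> }
  | { kind: "union"; variants: Shape[] };

type Field = { shape: Shape; optional: boolean };

// Look up an own property only, so keys like "constructor" are handled safely.
function ownField(fields: Record<string, Field>, key: string): Field | undefined {
  return Object.prototype.hasOwnProperty.call(fields, key) ? fields[key] : undefined;
}

// Build a shape for one value produced by JSON.parse.
function infer(value: unknown): Shape {
  if (value === null) return { kind: "null" };
  if (typeof value === "string") return { kind: "string" };
  if (typeof value === "number") return { kind: "number" };
  if (typeof value === "boolean") return { kind: "boolean" };
  if (Array.isArray(value)) {
    let element: Shape | undefined;
    for (const item of value) {
      const next = infer(item);
      element = element ? merge(element, next) : next;
    }
    return { kind: "array", element };
  }
  const record = value as Record<string, unknown>;
  const fields: Record<string, Field> = {};
  for (const key of Object.keys(record)) {
    fields[key] = { shape: infer(record[key]), optional: false };
  }
  return { kind: "struct", fields };
}

// Combine two shapes into one that describes both inputs.
function merge(a: Shape, b: Shape): Shape {
  if (a.kind === "struct" && b.kind === "struct") {
    const fields: Record<string, Field> = {};
    for (const key of Object.keys({ ...a.fields, ...b.fields })) {
      const left = ownField(a.fields, key);
      const right = ownField(b.fields, key);
      if (left && right) {
        // Present in both inputs: merge recursively, keep any existing optionality.
        fields[key] = {
          shape: merge(left.shape, right.shape),
          optional: left.optional || right.optional,
        };
      } else {
        // Present in only one input: the field becomes optional.
        const present = (left ?? right) as Field;
        fields[key] = { shape: present.shape, optional: true };
      }
    }
    return { kind: "struct", fields };
  }
  if (a.kind === "array" && b.kind === "array") {
    const element =
      a.element && b.element ? merge(a.element, b.element) : (a.element ?? b.element);
    return { kind: "array", element };
  }
  if (a.kind !== "union" && a.kind === b.kind) return a;
  const variantsOf = (s: Shape): Shape[] => (s.kind === "union" ? s.variants : [s]);
  return { kind: "union", variants: [...variantsOf(a), ...variantsOf(b)] };
}

const ok = infer(JSON.parse('{"statusCode": 200, "data": {"message": "Hello World!"}}'));
const failed = infer(
  JSON.parse('{"statusCode": 500, "error": {"message": "An internal server error occurred"}}'),
);
// `data` and `error` each appear in only one input, so both become optional;
// `statusCode` appears in both inputs and stays required.
const root = merge(ok, failed);
```

Merging further files is a matter of folding `merge` over their inferred shapes.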

## Motivation

This tool was constructed to help produce Rust type definitions for the
Minecraft datapack format. However, it is useful for a wide variety of
applications. The following are some potential use cases:

1. Generating types for production NoSQL databases, where the schema has varied
over time
2. Generating complex API response types where the underlying type is unknown
but a large number of sample responses can be obtained

## Design decisions

### Avoiding unions

Entype prefers to avoid unions where they are not necessary. For example, given
these input files:

```jsonc
// 200.json
{
"statusCode": 200,
"data": {
"message": "Hello World!"
}
}
```

```jsonc
// 500.json
{
"statusCode": 500,
"error": {
"message": "An internal server error occurred"
}
}
```

It is possible to generate two distinct types which accurately describe the
provided data:

```typescript
// With union
export type T1 = {
message: string;
};

export type T4 = {
message: string;
};

export type Root =
| {
data: T1;
statusCode: number;
}
| {
error: T4;
statusCode: number;
};

// Without union
export type T1 = {
message: string;
};

export type T4 = {
message: string;
};

export type Root = {
data: T1 | null | undefined;
error: T4 | null | undefined;
statusCode: number;
};
```

In this case, entype will generate the type without a union. This is for two
reasons:

1. The type is more robust (a new input with both `data` and `error` would not
contradict the generated type)
2. A preference for unions would potentially generate many variants where there
are lots of optional fields (like in the Minecraft datapack format)
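
To give a feel for the second point, the sketch below counts how many distinct sets of keys occur in a list of sample objects. It rests on the assumption that a union-preferring strategy would emit one variant per distinct key set; the `countKeySets` function and the sample data are hypothetical and only loosely modelled on the `when` conditions shown earlier. In the worst case, a struct with 16 optional fields (such as `T24` above) admits up to 2^16 = 65,536 key sets.

```typescript
// Hypothetical helper: counts distinct key sets across sample objects.
// Assumption: a union-preferring strategy emits one variant per distinct key set.
function countKeySets(samples: Array<Record<string, unknown>>): number {
  const seen: Record<string, boolean> = {};
  for (const sample of samples) {
    // Sort the keys so that key order in the input does not matter.
    const signature = JSON.stringify(Object.keys(sample).sort());
    seen[signature] = true;
  }
  return Object.keys(seen).length;
}

// Illustrative samples, not taken from the real fixtures.
const whenSamples: Array<Record<string, unknown>> = [
  { facing: "north" },
  { facing: "south", age: "1" },
  { age: "2" },
  { east: "true", west: "true" },
  { facing: "east" },
];

// Four distinct key sets, so four union variants, versus a single struct
// with four optional fields when unions are avoided.
const variantCount = countKeySets(whenSamples);
console.log(variantCount);
```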

It is possible to make this configurable, and I would consider implementing such
functionality in future (or accepting a PR) if there is demand and a suitable
proposal for the API design.

## Supported formats and languages

Entype currently supports JSON as an input format and Rust/TypeScript as output
formats. Other output formats are trivial to implement and PRs are welcome. Some
minor refactoring and API changes are needed to support other input formats, but
PRs are also welcome to add support for these.

## Limitations

### Naming

Since entype has limited context, it will not generate descriptive type names.
It is generally recommended (though not required) to manually rename types after
generation.

### Performance

Entype is written in TypeScript and is not optimised for performance. This is a
deliberate decision, which was made for the following reasons:

1. It is already fast (see the benchmarks)
2. It is intended for one-off use

If you measure your input data in gigabytes rather than megabytes, no guarantees
are made as to the performance.

## Benchmarks

The following are the results of running `deno bench --allow-read`:

```
benchmark                            time (avg)              (min … max)        p75        p99       p995
------------------------------------------------------------------------ --------------------------------
datapack/blockstates - no disk     9.94 ms/iter      (9.56 ms … 10.7 ms)   10.07 ms    10.7 ms    10.7 ms
datapack/models - no disk         32.55 ms/iter     (30.94 ms … 37.5 ms)   32.69 ms    37.5 ms    37.5 ms
datapack/blockstates              31.06 ms/iter    (28.16 ms … 44.51 ms)   30.82 ms   44.51 ms   44.51 ms
datapack/models                  110.64 ms/iter  (102.68 ms … 129.42 ms)  114.19 ms  129.42 ms  129.42 ms
```

On the following system:

```
Processor AMD Ryzen 5 3600 6-Core Processor × 6
Memory 16 GiB
```

The `datapack/blockstates` directory contains approximately `800.2 kB` of JSON
files, while the `datapack/models` folder contains around `1.1 MB`.
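
As a rough reading of these numbers, the average times can be converted into throughput. The sketch below assumes each benchmark iteration processes the whole directory once and treats the approximate sizes above as exact, so the results are only indicative. The `throughputMBps` helper is hypothetical.

```typescript
// Convert a payload size and an average iteration time into MB/s (1 MB = 10^6 bytes).
function throughputMBps(bytes: number, avgMs: number): number {
  return bytes / 1e6 / (avgMs / 1000);
}

// Sizes and averages are taken from the benchmark results above (disk reads included).
const blockstatesMBps = throughputMBps(800.2e3, 31.06);
const modelsMBps = throughputMBps(1.1e6, 110.64);

console.log(blockstatesMBps.toFixed(1)); // 25.8
console.log(modelsMBps.toFixed(1)); // 9.9
```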