https://github.com/xtao-org/jsonhilo

Fast lossless JSON parse event streaming, in JavaScript.
# JsonHilo.js

[![](https://img.shields.io/badge/📢%20blog%20post-darkgreen?style=flat-square)](https://djedr.github.io/posts/jsonhilo-2021-07-29.html) [![](https://shields.io/npm/dm/@xtao-org/jsonhilo?style=flat-square&logo=npm&labelColor=darkred&color=grey)](https://www.npmjs.com/package/@xtao-org/jsonhilo)

Minimal [lossless](#lossless) JSON parse event streaming, akin to [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML).

***

Handcrafted by Darius J Chuck.

Donate directly via Stripe, or buy me a coffee at ko-fi.com or buycoffee.to.

***

[Fast](#fast), [modular](#modular), and dependency-free.

Provides two interfaces: a [**hi**gh-level](#jsonhigh) one and a [**lo**w-level](#jsonlow) one.

Written in [runtime-independent](#runtime-independent) JavaScript.

Works in [Deno](https://deno.land/), [Node.js](https://nodejs.org), and the browser.

## Status

Stable.

[Passes standards-compliance tests](#standards-compliant) and [performs well in benchmarks](#fast).

Battle-tested.

## Installation

### Node.js

An [npm package](https://www.npmjs.com/package/@xtao-org/jsonhilo) is available:

```
npm i @xtao-org/jsonhilo
```

### Deno and the browser

Import modules directly from [deno.land/x](https://deno.land/x):

```js
import {JsonHigh} from 'https://deno.land/x/jsonhilo@<version>/mod.js' // replace <version> with a released version tag
```

Or from a CDN such as [jsDelivr](https://www.jsdelivr.com/):

```js
import {JsonHigh} from 'https://cdn.jsdelivr.net/gh/xtao-org/jsonhilo@<version>/mod.js' // replace <version> with a released version tag
```

## Quickstart

See a basic example in [`demo/basic.js`](demo/basic.js), pasted below:

```js
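import {JsonHigh} from '@xtao-org/jsonhilo'

// Print an XML-like tag for each parse event.
const stream = JsonHigh({
  openArray: () => console.log('<array>'),
  openObject: () => console.log('<object>'),
  closeArray: () => console.log('</array>'),
  closeObject: () => console.log('</object>'),
  key: (key) => console.log(`<key>${key}</key>`),
  value: (value) => console.log(`<value type="${typeof value}">${value}</value>`),
})

// Feed one complete JSON document as a single chunk.
stream.chunk('{"tuple": [null, true, false, 1.2e-3, "[demo]"]}')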
```

This uses [the simplified high-level interface](#jsonhigh) built on top of the [more powerful low-level core](#jsonlow).

## Features

* Simple and minimal
* Dependency-free
* [Runtime-independent](#runtime-independent)
* [Lossless](#lossless)
* [Modular](#modular)
* [Fast](#fast)
* [Streaming-friendly](#streaming-friendly)
* [Optionally standards-compliant](#standards-compliant)
* [Unicode-compatible](#unicode-compatible)

## Runtime-independent

The library logic is written in modern JavaScript and relies upon some of its features, standard modules in particular.

Beyond that it does not use any runtime-specific features and should work in any *modern* JavaScript environment. It was tested in Deno, Node.js, and the browser.

That said, the primary target runtime is Deno, and tests depend on it.

## Lossless

Unlike any other known streaming JSON parser, JsonHilo provides a [low-level](#jsonlow) interface for *lossless* parsing, i.e. it is possible to recover the *exact* input, including whitespace and string escape sequences, from parser events.

This feature can be used to implement accurate translators from JSON to other representations (see [Rationale](#rationale)), syntax highlighters (demo below), JSON scanners that search for substrings in strings on-the-fly, without first loading them into memory, and more.

Highlight demo

Pictured above is the syntax highlighting demo: [demo/highlight.js](demo/highlight.js)

## Modular

The library is highly modular with [a fully independent core](#jsonlow), around which various adapters and extensions are built, including [an easy-to-use high-level interface](#jsonhigh).

## JsonLow

The core module is [**`JsonLow.js`**](JsonLow.js). It has no dependencies, so it can be used on its own. It is very minimal and optimized for maximum performance and accuracy, as well as minimum memory footprint. It provides the most fine-grained control over the parsing process. The events generated by the parser carry enough information to losslessly recreate the input exactly, including whitespace.

See [**JsonLow.d.ts**](JsonLow.d.ts) for type information and [demo/highlight.js](demo/highlight.js) for usage example.

## JsonHigh

[**`JsonHigh.js`**](JsonHigh.js) is the high-level module which provides a more convenient interface. It is composed of auxiliary modules and adapters built around the core. It is optimized for convenience and provides similar functionality and granularity to other streaming parsers, such as [Clarinet](https://github.com/dscape/clarinet) or [creationix/jsonparse](https://github.com/creationix/jsonparse).

See [**JsonHigh.d.ts**](JsonHigh.d.ts) for type information and [Quickstart](#quickstart) for usage example.

### Parameters

`JsonHigh` is called with an object which contains named event handlers that are invoked during parsing. All handlers are optional and described [below](#events).
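
Because every handler is optional, a consumer can supply only the events it cares about. The following is a minimal sketch, assuming a hypothetical task of listing every object key in a document. `makeKeyCollector` is an illustrative name, not part of the library, and the events are simulated by hand so that the snippet runs on its own:

```js
// Hypothetical helper: builds a handler object that only listens for `key` events.
const makeKeyCollector = () => {
  const keys = []
  return {
    handlers: {key: (key) => { keys.push(key) }},
    keys,
  }
}

const collector = makeKeyCollector()
// With the library, this object would be passed as JsonHigh(collector.handlers).
// Here the two key events for {"a": {"b": 1}} are simulated by hand:
collector.handlers.key('a')
collector.handlers.key('b')
console.log(collector.keys) // [ 'a', 'b' ]
```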

### Return value

`JsonHigh` returns a stream object with two methods:

* `chunk` which accepts a JSON chunk to parse. It returns the stream object for chaining.
* `end` with no arguments which signals that the current JSON document is finished. If there is no error, it calls the corresponding `end` event handler, passing its return value to the caller.
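
The two methods compose naturally when the input arrives in pieces. The sketch below relies only on the contract described above: `chunk` returns the stream, and `end` returns either the `end` handler's return value or an error. `feedAll` is a hypothetical helper, and the stand-in stream exists only so that the snippet runs without the library:

```js
// Hypothetical helper: feeds every piece to the stream, then finishes the document.
// Works with any object that follows the chunk/end contract described above.
const feedAll = (stream, pieces) => {
  let current = stream
  for (const piece of pieces) current = current.chunk(piece) // chunk returns the stream
  return current.end()
}

// Stand-in stream for illustration only; a real one would come from JsonHigh({...}).
const fakeStream = {
  received: [],
  chunk(piece) { this.received.push(piece); return this },
  end() { return this.received.join('') },
}

const joined = feedAll(fakeStream, ['{"tuple": [null, ', 'true, false]}'])
console.log(joined) // {"tuple": [null, true, false]}
```

With a real stream, the value returned by `feedAll` would be whatever `end` returns, so it still needs to be checked for an error as described under [error handling](#error-handling).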

### Events

There are 4 event handlers without arguments which indicate start and end of structures:

* `openArray`: an array started (`[`)
* `closeArray`: an array ended (`]`)
* `openObject`: an object started (`{`)
* `closeObject`: an object ended (`}`)

And 2 event handlers with one argument which capture primitives:

* `key`: an object's key ended. The argument of the handler contains the key as a JavaScript string.
* `value`: a primitive JSON value ended. The argument of the event contains the corresponding JavaScript value: `true`, `false`, `null`, a number, or a string.

Finally, there is the argumentless `end` event handler which is called by the `end` method of the stream to confirm that the parsed JSON document is complete and valid.

Note that an event handler won't be called if there is an error in the parsed JSON, see [error handling](#error-handling).
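
To illustrate how these events fit together, here is a sketch of a handler set that rebuilds a plain JavaScript value from them. `makeBuilder` is a hypothetical helper, not part of the library, and the event sequence is simulated by hand from the descriptions above so that the snippet is self-contained:

```js
// Hypothetical helper: rebuilds a JavaScript value from high-level parse events.
const makeBuilder = () => {
  const stack = []   // containers that are currently open, innermost last
  let pendingKey     // the key most recently reported for the innermost object
  let result         // the most recent top-level value

  // Attach a value to the innermost open container, or make it the top-level result.
  const add = (value) => {
    const parent = stack[stack.length - 1]
    if (parent === undefined) result = value
    else if (Array.isArray(parent)) parent.push(value)
    else parent[pendingKey] = value
  }
  const open = (container) => { add(container); stack.push(container) }
  const close = () => { stack.pop() }

  return {
    handlers: {
      openArray: () => open([]),
      openObject: () => open({}),
      closeArray: close,
      closeObject: close,
      key: (key) => { pendingKey = key },
      value: add,
    },
    getResult: () => result,
  }
}

// Simulated events for {"tuple": [null, true, "[demo]"]}.
// With the library, JsonHigh(builder.handlers) would drive these calls.
const builder = makeBuilder()
builder.handlers.openObject()
builder.handlers.key('tuple')
builder.handlers.openArray()
builder.handlers.value(null)
builder.handlers.value(true)
builder.handlers.value('[demo]')
builder.handlers.closeArray()
builder.handlers.closeObject()
console.log(JSON.stringify(builder.getResult())) // {"tuple":[null,true,"[demo]"]}
```

A single `pendingKey` is enough because a container is attached to its parent as soon as it opens, and inside an object every value is preceded by its own `key` event. The sketch ignores special keys such as `__proto__` and keeps only the most recent top-level value.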

### Error handling

If there is an error when parsing a `chunk`, an `Error` is thrown, containing a serialized JSON object with details in the error message.

If there is an error at the `end`, that error is returned to the caller. The user-provided `end` event handler is not called, so it should not contain any [cleanup](#cleanup) code.
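
Since the thrown error carries its details as serialized JSON, a caller may want to decode the message. The sketch below does not assume any particular field names. `readErrorDetails` and `tryChunk` are hypothetical helpers, and the stand-in stream with its `reason` field is made up purely for illustration:

```js
// Hypothetical helper: extracts the details carried in a thrown parse error.
// The message is described above as serialized JSON; its fields are not assumed here.
const readErrorDetails = (error) => {
  try {
    return JSON.parse(error.message)
  } catch {
    // Not JSON after all (e.g. an error thrown by a handler): keep the raw text.
    return {message: error.message}
  }
}

// Hypothetical wrapper around any object with a `chunk` method.
const tryChunk = (stream, text) => {
  try {
    stream.chunk(text)
    return {ok: true}
  } catch (error) {
    return {ok: false, details: readErrorDetails(error)}
  }
}

// Stand-in stream for illustration only; the `reason` field is made up.
const failingStream = {
  chunk() { throw new Error(JSON.stringify({reason: 'made-up detail'})) },
}
const outcome = tryChunk(failingStream, '{"a": tru3}')
console.log(outcome.ok, outcome.details)
```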

### Cleanup

To run cleanup code at the end of parsing a document regardless of whether there was an error or not, **don't put that code in the end handler**. Instead put it after `.end()`, like so:

```js
// ...
stream.end()
cleanup()
```

If you want to also handle an error, you can use the `isError` helper:

```js
import {isError} from '@xtao-org/jsonhilo'

// ...

const ret = stream.end()
if (isError(ret)) { handle(ret) } // handle error
cleanup()
```

If your error handler can throw, you can use `try-catch-finally`:

```js
import {isError} from '@xtao-org/jsonhilo'

// ...

const ret = stream.end()
try { if (isError(ret)) { handle(ret) } }
catch (e) { /* optional */ }
finally { cleanup() }
```

## Fast

Achieving optimal performance without sacrificing simplicity and correctness was a design goal of JsonHilo. That goal was met: for applications without extreme performance requirements, JsonHilo should be more than fast enough.

It may be worth noting however that using pure JavaScript for extremely performance-sensitive applications is ill-advised and that nothing can replace individual case-by-case benchmarks.

It is difficult to find a parser that can be sensibly compared with JsonHilo. The one that comes the closest and is fairly widely known is [Clarinet](https://github.com/dscape/clarinet). It is the only low-level streaming JSON parser featured on [JSON.org](https://www.json.org) and the fastest one I could find.

[xtao-org/jsonhilo-benchmarks](https://github.com/xtao-org/jsonhilo-benchmarks) contains simple benchmarks used to compare the performance of JsonHilo with Clarinet and [jq](https://stedolan.github.io/jq/) (a fast and versatile command-line JSON processor).

According to these benchmarks, for validating JSON (just parsing without any further processing) JsonHilo is the fastest, before jq, which is in turn faster than Clarinet. Overall for comparable tasks the low-level JsonHilo interface is up to 2x faster than Clarinet, whereas the high-level interface is on par.

Again, these results need to be taken with a grain of salt, and there is no replacement for individual benchmarks. Use whatever suits your case best. In most cases, relative performance should not be the only factor to take into account.

Factors which make a fair comparison between JsonHilo and Clarinet problematic are mentioned below.

### Differences between JsonHilo and Clarinet

The major differences that make the comparison of the two problematic are:

* [Clarinet is not fully ECMA-404-compliant](https://github.com/dscape/clarinet/issues/49), as measured by [JSON Parsing Test Suite by Nicolas Seriot](https://github.com/nst/JSONTestSuite) -- it accepts certain invalid JSON and rejects certain valid JSON. JsonHilo is designed to parse the JSON grammar correctly and so [can pass the ECMA-404-compliance test suite](#standards-compliant). JsonHilo is overall safer to use with unknown inputs -- it can very well be used as a validator.
* JsonHilo fundamentally operates on individual Unicode code points as opposed to strings, chunks, or characters. Performance-wise this may be an advantage or a disadvantage, depending on how the input is structured (it may need conversion).
* JsonHilo does not use regular expressions to parse the input, whereas Clarinet does. As a result, even though low-level processing with JsonHilo may be significantly faster overall, the performance gap between the two may narrow.
* JsonHilo is overall simpler in terms of code complexity, making it easier to adjust or audit. The code is also significantly smaller in size than Clarinet, even taking into account the optional high-level interfaces laid on top of the tiny core.
* JsonHilo's core is more low-level and amenable to extension.

## Streaming-friendly

By default the parser is streaming-friendly by accepting the following:

* Multiple consecutive top-level JSON values -- it can read [line-delimited JSON and concatenated JSON](https://en.wikipedia.org/wiki/JSON_streaming), e.g. [JSON Lines](https://jsonlines.org/), [ndjson](http://ndjson.org/). Whitespace-separated primitives are also supported.

* [Trailing commas](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Trailing_commas) -- a single trailing comma in an array or an object generates no errors.

* Zero-length or whitespace-only input -- generates no errors.
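
When several top-level values arrive on one stream, the consumer has to work out where each one ends. One way is to track nesting depth across events. The sketch below does that with a hypothetical `makeTopLevelCounter` helper and hand-simulated events for three consecutive values:

```js
// Hypothetical helper: counts top-level values by tracking nesting depth.
const makeTopLevelCounter = () => {
  let depth = 0
  let count = 0
  const open = () => { depth += 1 }
  const close = () => {
    depth -= 1
    if (depth === 0) count += 1 // a top-level array or object just ended
  }
  return {
    handlers: {
      openArray: open,
      openObject: open,
      closeArray: close,
      closeObject: close,
      value: () => { if (depth === 0) count += 1 }, // a bare top-level primitive
    },
    getCount: () => count,
  }
}

// Simulated events for the three values {"a": 1}, [2] and 3.
// No `key` handler is registered, so key events are simply not observed.
const counter = makeTopLevelCounter()
counter.handlers.openObject()
counter.handlers.value(1)
counter.handlers.closeObject()
counter.handlers.openArray()
counter.handlers.value(2)
counter.handlers.closeArray()
counter.handlers.value(3)
console.log(counter.getCount()) // 3
```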

## Standards-compliant

The [streaming-friendly features](#streaming-friendly) can be suppressed by [**`Ecma404.js`**](Ecma404.js), an adapter module which provides full [ECMA-404](https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf)/[RFC 8259](https://datatracker.ietf.org/doc/html/rfc8259) compliance.

This is confirmed by passing the [JSON Parsing Test Suite](https://github.com/nst/JSONTestSuite) by [Nicolas Seriot](https://github.com/nst), available under `test/JSONTestSuite`.

Tests can be run with Deno as follows:

```
deno test --allow-read
```

## Unicode-compatible

The [core logic](#jsonlow) operates on Unicode code points -- in line with spec -- rather than code units or characters.
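
The distinction matters in JavaScript because strings are indexed by UTF-16 code units, and a single code point outside the Basic Multilingual Plane occupies two of them. The snippet below is plain JavaScript, independent of the library, and only illustrates the difference; it says nothing about how JsonHilo converts its input internally:

```js
// A JSON string literal holding one emoji (U+1F600), written with an escape.
const text = '"\u{1F600}"'

console.log(text.length) // 4 UTF-16 code units: the emoji takes two

// Iterating a string yields whole code points, not code units.
const codePoints = Array.from(text, (ch) => ch.codePointAt(0))
console.log(codePoints.length) // 3 code points
console.log(codePoints[1].toString(16)) // 1f600

// Round trip: code points back to the original string.
console.log(String.fromCodePoint(...codePoints) === text) // true
```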

## Rationale

Initially written to enable fast lossless translation between JSON and [Jevko](https://jevko.org), as no suitable JSON parser in JavaScript exists.

I decided to release this as a separate library, because I was tinkering with Deno and found that there was [no streaming JSON parser available at all for Deno](https://stackoverflow.com/questions/58070346/reading-large-json-file-in-deno).

## See also

[JsonStrum](https://github.com/xtao-org/jsonstrum) -- a high-level wrapper over JsonHilo which emits fully parsed objects and arrays.

## License

Released under the [MIT](LICENSE) license.

## Support this project

I prefer to share my creations for free. However, living and creating without money is not possible for me, so I ask companies and people who are willing and able for support. Every symbolic cup of coffee counts!


Donate directly via Stripe, or buy me a coffee at ko-fi.com or buycoffee.to.

## Paid support and online assistance

If you prefer, [you can get paid help and support, including direct online assistance, related to JsonHilo.js through Githelp.](https://githelp.app/repos/jsonhilo)

At the moment this is a limited opportunity to try an early version of Githelp.

***

A stand-alone part of the [TAO](https://xtao.org)-JSON project.

© 2024 [xtao.org](https://xtao.org)