https://github.com/Borewit/strtok3
Promise based streaming tokenizer
- Host: GitHub
- URL: https://github.com/Borewit/strtok3
- Owner: Borewit
- License: mit
- Created: 2017-06-18T10:06:00.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2024-10-28T06:48:34.000Z (7 months ago)
- Last Synced: 2024-10-29T16:00:07.336Z (7 months ago)
- Topics: binary, decoder, decoding, encoder, encoding, stream, streaming-tokenizer, tokenizer
- Language: TypeScript
- Size: 3.11 MB
- Stars: 37
- Watchers: 3
- Forks: 15
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
[CI](https://github.com/Borewit/strtok3/actions/workflows/ci.yml)
[CodeQL](https://github.com/Borewit/strtok3/actions/workflows/codeql.yml)
[npm](https://npmjs.org/package/strtok3)
[npm downloads](https://npmcharts.com/compare/strtok3,token-types?start=1200&interval=30)
[DeepScan](https://deepscan.io/dashboard#view=project&tid=5165&pid=8526&bid=103329)
[Known vulnerabilities](https://snyk.io/test/github/Borewit/strtok3?targetFile=package.json)
[Codacy](https://www.codacy.com/app/Borewit/strtok3?utm_source=github.com&utm_medium=referral&utm_content=Borewit/strtok3&utm_campaign=Badge_Grade)
# strtok3

A promise based streaming [*tokenizer*](#tokenizer-object) for [Node.js](http://nodejs.org) and browsers.
The `strtok3` module provides several methods for creating a [*tokenizer*](#tokenizer-object) from various input sources.
Designed for:
* Seamless support in streaming environments.
* Efficiently decode binary data, strings, and numbers.
* Reading [predefined](https://github.com/Borewit/token-types) or custom tokens.
* Offering [*tokenizers*](#tokenizer-object) for reading from [files](#fromfile-function), [streams](#fromstream-function) or [Uint8Arrays](#frombuffer-function).

### Features
`strtok3` can read from:
* Files, using a file path as input.
* Node.js [streams](https://nodejs.org/api/stream.html).
* [Buffer](https://nodejs.org/api/buffer.html) or [Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array).
* HTTP chunked transfer provided by [@tokenizer/http](https://github.com/Borewit/tokenizer-http).
* [Amazon S3](https://aws.amazon.com/s3) chunks with [@tokenizer/s3](https://github.com/Borewit/tokenizer-s3).

## Installation
```sh
npm install strtok3
```

### Compatibility
Starting with version 7, the module has migrated from [CommonJS](https://en.wikipedia.org/wiki/CommonJS) to [pure ECMAScript Module (ESM)](https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c).
The distributed JavaScript codebase is compliant with the [ECMAScript 2020 (11th Edition)](https://en.wikipedia.org/wiki/ECMAScript_version_history#11th_Edition_%E2%80%93_ECMAScript_2020) standard.

Requires a modern browser, a Node.js (V8) ≥ 18 engine, or Bun (JavaScriptCore) ≥ 1.2.
For TypeScript CommonJs backward compatibility, you can use [load-esm](https://github.com/Borewit/load-esm).
> [!NOTE]
> This module requires a [Node.js ≥ 18](https://nodejs.org/en/about/previous-releases) engine.
> It can also be used in a browser environment when bundled with a module bundler.

## Support the Project
If you find this project useful and would like to support its development, consider sponsoring or contributing:

- [Become a sponsor to Borewit](https://github.com/sponsors/Borewit)
- Buy me a coffee:
## API Documentation
### strtok3 methods
Use one of the methods to instantiate an [*abstract tokenizer*](#tokenizer-object):
- [fromFile](#fromfile-function)*
- [fromStream](#fromstream-function)*
- [fromWebStream](#fromwebstream-function)
- [fromBuffer](#frombuffer-function)

> **_NOTE:_** * `fromFile` and `fromStream` are only available when using this module with Node.js.
All methods return a [`Tokenizer`](#tokenizer-object), either directly or via a promise.
#### `fromFile` function
Creates a [*tokenizer*](#tokenizer-object) from a local file.
```ts
function fromFile(sourceFilePath: string): Promise<FileTokenizer>
```

| Parameter      | Type     | Description               |
|----------------|----------|---------------------------|
| sourceFilePath | `string` | Path to file to read from |

> [!NOTE]
> - Only available for Node.js engines
> - `fromFile` automatically embeds [file information](#ifileinfo-interface)

Returns, via a promise, a [*tokenizer*](#tokenizer-object) which can be used to parse a file.
```js
import * as strtok3 from 'strtok3';
import * as Token from 'token-types';

(async () => {
const tokenizer = await strtok3.fromFile("somefile.bin");
try {
const myNumber = await tokenizer.readToken(Token.UINT8);
console.log(`My number: ${myNumber}`);
} finally {
await tokenizer.close(); // Close the file
}
})();
```

#### `fromStream` function
Creates a [*tokenizer*](#tokenizer-object) from a Node.js [readable stream](https://nodejs.org/api/stream.html#stream_class_stream_readable).
```ts
function fromStream(stream: Readable, options?: ITokenizerOptions): Promise<ReadStreamTokenizer>
```

| Parameter | Optional | Type                                                                        | Description              |
|-----------|----------|-----------------------------------------------------------------------------|--------------------------|
| stream    | no       | [Readable](https://nodejs.org/api/stream.html#stream_class_stream_readable) | Stream to read from      |
| fileInfo  | yes      | [IFileInfo](#ifileinfo-interface)                                           | Provide file information |

Returns a Promise providing a [*tokenizer*](#tokenizer-object).
> [!NOTE]
> - Only available for Node.js engines

#### `fromWebStream` function
Creates a [*tokenizer*](#tokenizer-object) from a [WHATWG ReadableStream](https://nodejs.org/api/webstreams.html#web-streams-api).
```ts
function fromWebStream(webStream: AnyWebByteStream, options?: ITokenizerOptions): ReadStreamTokenizer
```

| Parameter      | Optional | Type                                                                     | Description                        |
|----------------|----------|--------------------------------------------------------------------------|------------------------------------|
| readableStream | no       | [ReadableStream](https://nodejs.org/api/webstreams.html#web-streams-api) | WHATWG ReadableStream to read from |
| fileInfo       | yes      | [IFileInfo](#ifileinfo-interface)                                        | Provide file information           |

Returns a [*tokenizer*](#tokenizer-object).
```js
import * as strtok3 from 'strtok3';
import * as Token from 'token-types';

(async () => {
  const tokenizer = strtok3.fromWebStream(readableStream);
  const myUint8Number = await tokenizer.readToken(Token.UINT8);
  console.log(`My number: ${myUint8Number}`);
})();
```

#### `fromBuffer()` function
Creates a tokenizer from memory ([Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array)).
```ts
function fromBuffer(uint8Array: Uint8Array, options?: ITokenizerOptions): BufferTokenizer
```

| Parameter  | Optional | Type                                                                                                      | Description                       |
|------------|----------|-----------------------------------------------------------------------------------------------------------|-----------------------------------|
| uint8Array | no       | [Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array) | Uint8Array or Buffer to read from |
| fileInfo   | yes      | [IFileInfo](#ifileinfo-interface)                                                                         | Provide file information          |

Returns a [*tokenizer*](#tokenizer-object).
```js
import * as strtok3 from 'strtok3';
import * as Token from 'token-types';

const tokenizer = strtok3.fromBuffer(buffer);
tokenizer.readToken(Token.UINT8).then(myUint8Number => {
console.log(`My number: ${myUint8Number}`);
});
```

### `Tokenizer` object
The *tokenizer* is an abstraction of a [stream](https://nodejs.org/api/stream.html), file or [Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array), allowing _reading_ or _peeking_ from the stream.
It can also be backed by chunked reads, as done in [@tokenizer/http](https://github.com/Borewit/tokenizer-http).

#### Key Features:
- Supports seeking within the stream using `tokenizer.ignore()`.
- Offers `peek` methods to preview data without advancing the read pointer.
- Maintains the read position via `tokenizer.position`.

#### Tokenizer functions
_Read_ methods advance the stream pointer, while _peek_ methods do not.
There are two kinds of functions:
1. *read* methods: read a *token* or [Buffer](https://nodejs.org/api/buffer.html) from the [*tokenizer*](#tokenizer-object). The position of the *tokenizer-stream* advances by the size of the token.
2. *peek* methods: same as *read*, but they do *not* advance the pointer, allowing you to read (peek) ahead.

#### `readBuffer` function
Read data from the _tokenizer_ into provided "buffer" (`Uint8Array`).
```ts
readBuffer(buffer: Uint8Array, options?: IReadChunkOptions): Promise<number>;
```

| Parameter | Type                                              | Description                             |
|-----------|---------------------------------------------------|-----------------------------------------|
| buffer    | `Uint8Array`                                      | Target buffer to write the data read to |
| options   | [IReadChunkOptions](#ireadchunkoptions-interface) | Options for the read operation          |

Returns a promise with the number of bytes read.
The number of bytes read may be less if the *mayBeLess* flag was set.

#### `peekBuffer` function
Peek (read ahead), from [*tokenizer*](#tokenizer-object), into the buffer without advancing the stream pointer.
```ts
peekBuffer(uint8Array: Uint8Array, options?: IReadChunkOptions): Promise<number>;
```

| Parameter  | Type                                              | Description                                      |
|------------|---------------------------------------------------|--------------------------------------------------|
| uint8Array | `Uint8Array`                                      | Target buffer to write the data read (peeked) to |
| options    | [IReadChunkOptions](#ireadchunkoptions-interface) | Options for the peek operation                   |

Returns a `Promise<number>` with the number of bytes read.
The number of bytes read may be less if the *mayBeLess* flag was set.
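The read/peek contract can be illustrated with a minimal in-memory sketch. Note this is hypothetical illustrative code, not the actual strtok3 classes: peeking copies bytes into the target buffer without advancing `position`, while reading advances it.

```js
// Minimal in-memory sketch of the read/peek contract (not the real strtok3 classes).
class SketchTokenizer {
  constructor(uint8Array) {
    this.data = uint8Array;
    this.position = 0; // current read offset, like tokenizer.position
  }

  // Peek: copy bytes into `buffer` without advancing `position`.
  async peekBuffer(buffer, options = {}) {
    const length = options.length ?? buffer.length;
    const available = this.data.length - this.position;
    const bytesRead = Math.min(length, available);
    if (bytesRead < length && !options.mayBeLess) {
      throw new Error('End-Of-Stream');
    }
    buffer.set(this.data.subarray(this.position, this.position + bytesRead));
    return bytesRead;
  }

  // Read: same as peek, but advances `position` by the number of bytes read.
  async readBuffer(buffer, options = {}) {
    const bytesRead = await this.peekBuffer(buffer, options);
    this.position += bytesRead;
    return bytesRead;
  }
}
```

Peeking twice yields the same bytes; reading moves the pointer forward, which is what lets format parsers inspect a header before committing to consume it.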
#### `readToken` function
Read a *token* from the tokenizer-stream.
```ts
readToken<T>(token: IGetToken<T>, position: number = this.position): Promise<T>
```

| Parameter | Type                       | Description                                                                                                  |
|-----------|----------------------------|--------------------------------------------------------------------------------------------------------------|
| token     | [IGetToken](#token-object) | Token to read from the tokenizer-stream.                                                                     |
| position? | `number`                   | Offset where to begin reading within the file. If omitted, data will be read from the current file position. |

Returns a promise with the token value read from the [*tokenizer*](#tokenizer-object).
#### `peekToken` function
Peek a *token* from the [*tokenizer*](#tokenizer-object).
```ts
peekToken<T>(token: IGetToken<T>, position: number = this.position): Promise<T>
```

| Parameter | Type                       | Description                                                                                                  |
|-----------|----------------------------|--------------------------------------------------------------------------------------------------------------|
| token     | [IGetToken](#token-object) | Token to peek from the tokenizer-stream.                                                                     |
| position? | `number`                   | Offset where to begin reading within the file. If omitted, data will be read from the current file position. |

Returns a promise with the token value peeked from the [*tokenizer*](#tokenizer-object).
#### `readNumber` function
Read a numeric [*token*](#token-object) from the [*tokenizer*](#tokenizer-object).
```ts
readNumber(token: IGetToken<number>): Promise<number>
```

| Parameter | Type                       | Description                                      |
|-----------|----------------------------|--------------------------------------------------|
| token     | [IGetToken](#token-object) | Numeric token to read from the tokenizer-stream. |

Returns a promise with the decoded numeric value from the *tokenizer-stream*.
#### `ignore` function
Advances the offset pointer by the given number of bytes.
```ts
ignore(length: number): Promise<number>
```

| Parameter | Type     | Description                                                      |
|-----------|----------|------------------------------------------------------------------|
| length    | `number` | Number of bytes to ignore. Will advance the `tokenizer.position` |

Returns a promise with the number of bytes ignored.
#### `close` function
Clean up resources, such as closing a file pointer if applicable.

#### `Tokenizer` attributes
- `fileInfo`
Optional attribute describing the file information, see [IFileInfo](#ifileinfo-interface).
- `position`
Pointer to the current position in the [*tokenizer*](#tokenizer-object) stream.
If a *position* is provided to a _read_ or _peek_ method, it should be equal to or greater than this value.

### `IReadChunkOptions` interface
Each attribute is optional:
| Attribute | Type | Description |
|-----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| length | number | Requested number of bytes to read. |
| position  | number  | Position where to start reading in the file. If omitted, data will be read from the current file position. Position may not be less than `tokenizer.position`. |
| mayBeLess | boolean | If set, no EOF error is thrown when fewer bytes than requested could be read. |

Example usage:
```js
tokenizer.peekBuffer(buffer, {mayBeLess: true});
```

### `IFileInfo` interface
Provides optional metadata about the file being tokenized.
| Attribute | Type | Description |
|-----------|---------|---------------------------------------------------------------------------------------------------|
| size | number | File size in bytes |
| mimeType  | string  | [MIME-type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) of file.  |
| path      | string  | File path                                                                                          |
| url       | string  | File URL                                                                                           |

### `Token` object
The *token* is basically a description of what to read from the [*tokenizer-stream*](#tokenizer-object).
A basic set of *token types* can be found here: [*token-types*](https://github.com/Borewit/token-types).

A token is something which implements the following interface:
```ts
export interface IGetToken<T> {

  /**
   * Length in bytes of encoded value
   */
  len: number;

  /**
   * Decode value from buffer at offset
   * @param buf Buffer to read the decoded value from
   * @param off Decode offset
   */
  get(buf: Uint8Array, off: number): T;
}
```
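As an illustration, a custom token for a 24-bit big-endian unsigned integer could implement this interface as follows (a hypothetical example for clarity; [token-types](https://github.com/Borewit/token-types) already ships predefined numeric tokens):

```js
// Hypothetical custom token: 24-bit big-endian unsigned integer.
// Implements the IGetToken shape: a byte length plus a decode function.
const UINT24_BE = {
  len: 3, // number of bytes this token occupies in the stream
  get(buf, off) {
    // Combine three bytes, most-significant first.
    return (buf[off] << 16) | (buf[off + 1] << 8) | buf[off + 2];
  }
};
```

A tokenizer would then decode it with `tokenizer.readToken(UINT24_BE)`, which reads 3 bytes and passes them to `get`.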
The *tokenizer* reads `token.len` bytes from the *tokenizer-stream* into a Buffer.
The `token.get` will be called with the Buffer. `token.get` is responsible for conversion from the buffer to the desired output type.

### Working with Web-API readable stream
To convert a [Web-API readable stream](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader) into a [Node.js readable stream](https://nodejs.org/api/stream.html#stream_readable_streams), you can use [readable-web-to-node-stream](https://github.com/Borewit/readable-web-to-node-stream).

```js
import { fromStream } from 'strtok3';
import { ReadableWebToNodeStream } from 'readable-web-to-node-stream';

(async () => {
const response = await fetch(url);
const readableWebStream = response.body; // Web-API readable stream
const nodeStream = new ReadableWebToNodeStream(readableWebStream); // Convert to a Node.js readable stream
const tokenizer = await fromStream(nodeStream); // And we now have a tokenizer in a web environment
})();
```

## Dependencies
The diagram below illustrates the primary dependencies of `strtok3`:
```mermaid
graph TD;
S(strtok3)-->P(peek-readable)
S(strtok3)-->TO("@tokenizer/token")
```
- [peek-readable](https://github.com/Borewit/peek-readable): Manages reading operations with peeking capabilities, allowing data to be previewed without advancing the read pointer.
- [@tokenizer/token](https://github.com/Borewit/tokenizer-token): Provides token definitions and utilities used by `strtok3` for interpreting binary data.

## Licence
This project is licensed under the [MIT License](LICENSE.txt). Feel free to use, modify, and distribute as needed.