https://github.com/igorski/binarcular

# Binarcular

A library that allows you to read/write the contents of a binary file into/from a JSON Object,
taking care of all data type conversion using JavaScript-friendly values. You can search for data
by value, slice blocks into separate, meaningful structures or generate a binary downloadable
file, all inside your browser.

Practical use cases are:

* Validating whether a file header contains the appropriate description, by comparing
its parsed data to the respective data types
* Scanning a file for specific metadata to determine the location of other meaningful data
* Creating a binary file in your browser, without having to repeatedly write meaningless byte
values in sequence

See API and Example below.

## Compatibility

_binarcular_ should work fine on Internet Explorer 10 and up. You can verify
support at runtime by querying the result of the _isSupported()_-method:

```
import { isSupported } from 'binarcular';

if ( isSupported( /* optRequire64bitConversion */ false ) ) {
    // ...do stuff!
} else {
    // ...do other, less cool stuff! Actually, not cool at all!! >=/
}
```

NOTE: if you require support for 64-bit types there are [additional requirements](https://caniuse.com/?search=bigint). Pass boolean _true_ for optional argument _optRequire64bitConversion_ to determine whether the environment supports 64-bit conversion.
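
Under the hood, 64-bit conversion hinges on _BigInt_ and the BigInt-aware _DataView_ accessors. A rough sketch of such a capability check (a hypothetical helper, not the library's actual implementation) could look like:

```javascript
// Hypothetical sketch of a 64-bit capability check, NOT the library's
// actual implementation: 64-bit conversion requires BigInt support
// plus the BigInt-aware DataView accessors.
function supports64bit() {
    return typeof BigInt !== 'undefined' &&
           typeof DataView.prototype.getBigUint64 === 'function';
}
```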

## Installation

You can get it via NPM:

```
npm install binarcular
```

### Project integration

The library is compatible with CommonJS / ES6 modules or can be included in a document
using AMD/RequireJS. See the contents of the _/dist/_ folder and include as your project sees fit.

## API

The module exports the following:

```
import {

    isSupported: fn( optRequire64bitConversion = false ),
    types: Object,
    parse: async fn( dataSource, structureDefinition, optReadOffset = 0 ),
    seek: async fn( uint8Array, searchStringOrByteArray, optReadOffset = 0 ),
    write: async fn( uint8Array, structureDefinition, dataToWrite, optWriteOffset = 0 ),
    fileToByteArray: async fn( file, optSliceOffset = 0, optSliceSize = file.size ),
    byteArrayToFile: fn( uint8Array, filename, optMimeType = 'application/octet-stream' )

} from 'binarcular';
```

We'll look into each of these below:

### Reading a chunk of data into an Object of a specific structure type

Handled by the _parse_ method:

```
async parse( dataSource, structureDefinition, optReadOffset = 0 )
```

Where:

* _dataSource_ is the file to parse (can be either _File_, _Blob_, _Uint8Array_ or (base64 encoded) _String_)
* _structureDefinition_ is an Object defining a data structure ([as described here](#define-a-structure))
* _optReadOffset_ is a numerical index describing where in the file's ByteArray reading should
start; this defaults to 0 to start at the beginning of the file.

When the Promise resolves, the result is the following structure:

```
{
    data: Object,
    end: Number,
    error: Boolean,
    byteArray: Uint8Array
}
```

If all has been read successfully, _data_ is an Object that follows the given
_structureDefinition_ and is populated with the actual file data.

_end_ describes at what offset in given file the structure's definition has ended.
This can be used for subsequent read operations where different data types are
extracted from the binary data.

If _error_ is true, something went wrong during parsing. The _data_
Object will be populated with all data that could be harvested up until the point the error
occurred, allowing you to salvage what you can from corrupted files.

#### A note on using Uint8Array as dataSource

You can see that another property is defined in the result, namely _byteArray_.
If the _dataSource_ provided to the parse method was a _Uint8Array_ which you intend to
reuse inside your project, be sure to reassign your byteArray reference to the
returned instance.

The rationale here is that for minimal overhead, the ownership of the ByteArray's binary
content is transferred during the read operations. _You will not be able to perform
any actions on your ByteArray without updating the reference_.

### Looking for a specific entry in a file

If you are working with a file where the content of interest is preceded by some
metadata at an arbitrary point, it makes sense to first look for this metadata
declaration so you know from where you can retrieve the actual data of interest.

For this purpose you can use _seek_:

```
async seek( uint8Array, searchStringOrByteArray, optReadOffset = 0 )
```

where:

* _uint8Array_ is the ByteArray containing the binary data.
* _searchStringOrByteArray_ can be either a String (in case the metadata is a
character sequence) or a Uint8Array holding the byte sequence that functions as the "search query".
* _optReadOffset_ determines the offset within the data from where to start searching; this
defaults to 0 to search from the start.

The method returns the numerical index at which the data was found, or _Infinity_
if no match was found.
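
Conceptually, the search behaves like a linear scan over the byte array. The sketch below is illustrative only (not the library's actual implementation) and mimics that contract, including the _Infinity_ result for a failed match:

```javascript
// Illustrative only: a naive linear scan for a byte sequence inside a
// Uint8Array, mirroring seek()'s contract of returning Infinity when
// no match is found.
function naiveSeek( uint8Array, needle, optReadOffset = 0 ) {
    for ( let i = optReadOffset; i <= uint8Array.length - needle.length; ++i ) {
        let match = true;
        for ( let j = 0; j < needle.length; ++j ) {
            if ( uint8Array[ i + j ] !== needle[ j ] ) {
                match = false;
                break;
            }
        }
        if ( match ) {
            return i;
        }
    }
    return Infinity;
}

const haystack = new Uint8Array([ 0x52, 0x49, 0x46, 0x46, 0x64, 0x61, 0x74, 0x61 ]);
console.log( naiveSeek( haystack, new Uint8Array([ 0x64, 0x61, 0x74, 0x61 ]) ) ); // → 4
console.log( naiveSeek( haystack, new Uint8Array([ 0xFF ]) ) );                   // → Infinity
```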

### Writing JSON as binary content

If you have a JSON structure that you wish to write into a binary file, you can do
so using _write_:

```
async write( uint8Array, structureDefinition, dataToWrite, optWriteOffset = 0 )
```

where:

* _uint8Array_ is the ByteArray containing the binary data.
* _structureDefinition_ is an Object defining a data structure ([as described here](#define-a-structure))
* _dataToWrite_ is an Object following the data structure, where the values are the
data you wish to write into the binary file.
* _optWriteOffset_ is the index at which data will be written. This defaults to _0_ to
start writing at the beginning of the file. Data will be written for the length of the
given _structureDefinition_; all existing data beyond this point will remain unchanged.

The result of this operation is the following:

```
{
    data: Object,
    end: Number,
    error: Boolean,
    byteArray: Uint8Array
}
```

Where _byteArray_ should replace the reference of the _byteArray_ you passed into
the method. This ByteArray contains the original data except that the data block starting
at requested _optWriteOffset_ has been replaced with the binary equivalent of the _dataToWrite_-Object.
All data beyond the size of the written data block remains unchanged.
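
To get a feel for what writing a structure boils down to internally, here is a standalone sketch (the offsets and values are made up for illustration) that serialises two little-endian fields into a byte array using a plain _DataView_:

```javascript
// Standalone illustration of little-endian serialisation with DataView;
// the field layout and values here are made up for the example.
const byteArray = new Uint8Array( 8 );
const view      = new DataView( byteArray.buffer );

const writeOffset = 2;
view.setInt16( writeOffset,     441,   true ); // an INT16|LE (true = little endian)
view.setInt32( writeOffset + 2, 44100, true ); // an INT32|LE

console.log( view.getInt32( writeOffset + 2, true ) ); // → 44100
```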

### Converting a File reference to a ByteArray

```
async fileToByteArray( fileReference, optSliceOffset = 0, optSliceSize = fileReference.size )
```

where:

* _fileReference_ is the File (or Blob) of which the contents should be read into a _Uint8Array_.
* _optSliceOffset_ is the optional offset from where to read the data, defaults to 0 to
start from the beginning.
* _optSliceSize_ is the optional size of the resulting ByteArray. This defaults to the
size of the file, reading the file in its entirety. When using a custom _optSliceOffset_,
overflow checking is performed to prevent reading beyond the file's boundaries.
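
For reference, a minimal equivalent can be built from standard platform APIs (_Blob#slice_ and _Blob#arrayBuffer_); the helper name below is hypothetical and not part of the library:

```javascript
// Minimal sketch (hypothetical helper, not the library's implementation):
// read a slice of a Blob/File into a Uint8Array, clamping the slice to
// the blob's boundaries.
async function blobToByteArray( blob, optSliceOffset = 0, optSliceSize = blob.size ) {
    const end = Math.min( optSliceOffset + optSliceSize, blob.size ); // overflow check
    return new Uint8Array( await blob.slice( optSliceOffset, end ).arrayBuffer() );
}
```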

### Converting a ByteArray to a File

```
byteArrayToFile( byteArray, filename, optMimeType = 'application/octet-stream' )
```

This will generate a download of given _filename_, containing the data of
_byteArray_ as its content using given _optMimeType_.

To prevent the browser from blocking the download, this method should be called directly from a click handler.

## Example

Let's say we want to read the binary data of a well known proprietary format.
First up we will get to...

### Define a structure

Defining a structure is nothing more than declaring an Object where the keys
define names meaningful to your purpose and the values are Strings describing:

* one of the available type enumerations (the names of the imported types are equal to their values).
* an optional Array declaration: appending a numerical value between brackets _[n]_
makes the value an Array of given length _n_.
* an optional modifier defining the endianness of the file's byte order, separated by a pipe
(either _|BE_ for Big Endian or _|LE_ for Little Endian). When unspecified, the
endianness of the client's system is used (assuming the file has been encoded on/by a similar
system, which usually means Little Endian these days).
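
The difference the modifier makes is easy to demonstrate with a plain _DataView_: the same two bytes decode to different numbers depending on byte order.

```javascript
// The same two bytes read as Little Endian vs. Big Endian:
const bytes = new Uint8Array([ 0x01, 0x00 ]);
const view  = new DataView( bytes.buffer );

console.log( view.getInt16( 0, true ) );  // Little Endian → 1
console.log( view.getInt16( 0, false ) ); // Big Endian    → 256
```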

An example structure that defines the [header of a .WAV file](http://soundfile.sapp.org/doc/WaveFormat)
would look like:

```
const wavHeader = {
    type: 'CHAR[4]',
    size: 'INT32|LE',
    format: 'CHAR[4]',
    formatName: 'CHAR[4]',
    formatLength: 'INT32|LE',
    audioFormat: 'INT16|LE',
    channelAmount: 'INT16|LE',
    sampleRate: 'INT32|LE',
    bytesPerSecond: 'INT32|LE',
    blockAlign: 'INT16|LE',
    bitsPerSample: 'INT16|LE',
    dataChunkId: 'CHAR[4]',
    dataChunkSize: 'INT32|LE'
};
```

Note that the order of the keys (and, more importantly, their type definitions) must match
the order of the values as described in the particular file format's specification!

#### A teeny tiny note on Endianness

Note that specifying endianness can be omitted if you're certain that the file's
encoding is equal to that of the platform you will be parsing the file on (most likely
only Big Endian files will require an explicit definition). _And I hope you will never
be in the unfortunate situation where you work with a file that uses different
endianness for different blocks!_

#### Back to talking types

All available data types are listed in the _{ types }_ export. Note that definitions
for _CHAR_ will return as a String. If you want an 8-bit integer/byte value, use
_BYTE_ or _INT8_ instead.
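
To illustrate the distinction with standard APIs: the same four bytes can be interpreted as a String (as _CHAR[4]_ would) or as raw integer values (as _BYTE_/_INT8_ would).

```javascript
// Four bytes interpreted two ways: as an ASCII string vs. raw integers.
const chunkBytes = new Uint8Array([ 0x52, 0x49, 0x46, 0x46 ]);

const asString = new TextDecoder( 'ascii' ).decode( chunkBytes ); // CHAR[4]-style
const asBytes  = Array.from( chunkBytes );                        // BYTE-style

console.log( asString ); // → "RIFF"
console.log( asBytes );  // → [ 82, 73, 70, 70 ]
```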

We can now proceed to read the file:

```
import { parse } from 'binarcular';

async function readWaveHeader( fileReference ) {
    const { data, end, error, byteArray } = await parse( fileReference, wavHeader, 0 );

    console.log( data );  // will contain the properties of a WAV file header
    console.log( end );   // will describe the end offset of the header
    console.log( error ); // when true, a file reading error occurred
}
```

You can also view the [demo](https://htmlpreview.github.io/?https://github.com/igorski/binarcular/blob/master/dist/index.html) provided in this repository's _example.html_ file, which
parses .WAV files and provides advanced examples using seeking, slicing and error
correction, before finally showing how to extract the meaningful data from the file.

## Performance

Depending on the size of the files you're working with, memory allocation can become a problem.

The parser will only read the block that is requested (e.g. starting from the
requested offset and only for the size of the requested _structureDefinition_) and
should thus be light on resources. Additionally, all read operations happen in a
dedicated Web Worker which keeps your main application responsive (you can safely
parse several hundred megabytes of data without blocking your UI).

Depending on your use case, it helps to take the following guidelines into consideration:

* Use base64 _only when you have no choice_, as a base64 String describes the
file _in its entirety_ and must first be decoded back into binary form,
temporarily keeping both representations in memory.
* If you intend to do multiple reads on the same file (for instance: first reading
its header to determine where in the file the meaningful content begins) it
is recommended to use the _fileToByteArray()_-method to create a single
reusable _Uint8Array_. This also makes sense if you need to read the file in its entirety.

## Build instructions

In case you want to aid in development of the library:

The project dependencies are maintained by NPM, you can resolve them using:

```
npm install
```

You can develop (and test against the example app by navigating to _http://localhost:8080_) by running:

```
npm run dev
```

To create a production build:

```
npm run build
```

This creates a _dist/_ folder containing the prebuilt AMD/RequireJS
and CommonJS/ES module libraries (as well as the example application).

The source code is transpiled to ES5 for maximum browser compatibility.

## Unit testing

Unit tests are run via Jest:

```
npm run test
```

Unit tests go in the _./test_-folder. The file name for a unit test must be equal to the file it is testing, with the added suffix ".spec": e.g. _functions.js_ should have a test file _functions.spec.js_.