https://github.com/rootslab/auntie
Auntie, my dear ultra-fast module for untying/splitting/counting a stream of data by a chosen separator sequence.
boyer-moore count-lines csv csv-parser csv-stream nodejs parser pattern-search split-data splitter streaming-parser tsv
- Host: GitHub
- URL: https://github.com/rootslab/auntie
- Owner: rootslab
- License: mit
- Created: 2017-12-13T01:47:16.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-07T21:15:34.000Z (almost 8 years ago)
- Last Synced: 2025-03-07T20:38:20.116Z (12 months ago)
- Topics: boyer-moore, count-lines, csv, csv-parser, csv-stream, nodejs, parser, pattern-search, split-data, splitter, streaming-parser, tsv
- Language: JavaScript
- Homepage:
- Size: 227 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: LICENSE
README
### Auntie
> __Auntie__, _my dear_ __ultra-fast__ module for __untying/splitting/counting__
> a stream of data by a __chosen separator sequence__.
> It uses __[Bop](https://github.com/rootslab/bop)__ under the hood, a **_Boyer-Moore_** parser,
> optimized for sequence lengths up to 255 bytes.
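
For readers unfamiliar with the Boyer-Moore family, here is a minimal sketch of the simpler Horspool variant, written with Node built-ins only. It is not Bop's code: the `horspool` function and its shift table are illustrative names, and Bop's actual implementation and optimizations may differ.

```javascript
// Illustrative Boyer-Moore-Horspool search (an assumption-level sketch, not Bop).
// Returns the start index of every match, overlapping ones included.
function horspool(needle, haystack) {
  const m = needle.length;
  const n = haystack.length;
  if (m === 0 || m > n) return [];

  // Shift table: for each byte value, how far the window may safely jump.
  const shift = new Uint32Array(256).fill(m);
  for (let i = 0; i < m - 1; i++) shift[needle[i]] = m - 1 - i;

  const found = [];
  let pos = 0;
  while (pos <= n - m) {
    // Compare the window right to left.
    let j = m - 1;
    while (j >= 0 && haystack[pos + j] === needle[j]) j--;
    if (j < 0) found.push(pos);
    // Jump according to the last byte currently under the window.
    pos += shift[haystack[pos + m - 1]];
  }
  return found;
}

const hits = horspool(Buffer.from('\r\n'), Buffer.from('a\r\nbc\r\n\r\n'));
console.log(hits); // [ 1, 5, 7 ]
```

The point of the shift table is that most windows are rejected after a single byte comparison, which is why this family of algorithms suits long inputs with short separators.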
### Table of Contents
- __[Install](#install)__
- __[Run Tests](#run-tests)__
- __[Run Benchmarks](#run-benchmarks)__
- __[Constructor](#constructor)__
- __[Properties](#properties)__
- __[Methods](#methods)__
  - __[count](#auntiecount)__
  - __[dist](#auntiedist)__
  - __[do](#auntiedo)__
  - __[flush](#auntieflush)__
  - __[set](#auntieset)__
  - __[comb](#auntiecomb)__
- __[Events](#events)__
- __[Examples](#examples)__
  - __[split lines (streaming) from a CSV file](#split-lines-from-a-csv-file-crlf)__
  - __[count lines from a file](#count-lines-from-a-file-crlf)__
  - __[snap event and collect](#snap-event-and-collect-crlf)__
- __[ToDo](/ToDo.md)__
- __[MIT License](#mit-license)__
------------------------------------------------------------------------------
### Install
```bash
$ npm install auntie [-g]
```
> __require__:
```javascript
const Auntie = require( 'auntie' );
```
### Run Tests
> __to run all test files, install devDependencies:__
```bash
$ cd auntie/
# install or update devDependencies
$ npm install
# run tests
$ npm test
```
> __to execute a single test file, simply do__:
```bash
$ node test/file-name.js
```
> __output example and running time__:
```bash
...
- current path is 'test'.
- time elapsed: 106.596 secs.
26 test files were loaded.
26 test files were launched.
1272671 assertions succeeded.
```
### Run Benchmarks
```bash
$ cd auntie/
$ npm run bench
```
> __to execute a single bench file, simply do__:
```bash
$ node bench/file-name.js
```
------------------------------------------------------------------------------
### Constructor
> Arguments between [ ] are optional.
```javascript
Auntie( [ Buffer | String | Number sequence ] )
```
> or
```javascript
new Auntie( [ Buffer | String | Number sequence ] )
```
> __NOTE__: the default sequence is CRLF (`\r\n`).
### Properties
> __NOTE__: do not modify these properties directly; they hold the parser's internal state.
##### the current sequence for splitting data.
```javascript
Auntie.seq : Buffer
```
##### the Boyer-Moore parser, under the hood.
```javascript
Auntie.bop : Bop
```
##### a Boyer-Moore parser, to search for generic (sub)sequences
```javascript
Auntie.gbop : Bop
```
##### the remaining data, without any match found.
```javascript
Auntie.snip : Buffer
```
##### the remaining data, used for counting.
```javascript
Auntie.csnip : Buffer
```
##### the current number of matches, min/max distance, remaining bytes.
```javascript
Auntie.cnt : Array
```
------------------------------------------------------------------------------
### Methods
| name | description |
|:--------------------------|:---------------------------------------------------------------------------------|
| __[count](#auntiecount)__ | `count (only) how many times the sequence appears in the current data.` |
| __[dist](#auntiedist)__ | `count occurrences, min and max distance between sequences and remaining bytes.` |
| __[do](#auntiedo)__ | `split data or a stream of data by the current sequence.` |
| __[flush](#auntieflush)__ | `flush the remaining data, resetting internal state/counters.` |
| __[set](#auntieset)__ | `set a new sequence for splitting data.` |
| __[comb](#auntiecomb)__   | `search for a char or a sequence in the current data.`                           |
> Arguments between [ ] are optional.
#### Auntie.count
> ##### the fastest/lightest way to count how many times the sequence appears in the current data.
```javascript
/*
* it returns an Array with the current number of occurrences.
*
* NOTE: it saves the minimum amount of data that does not contain
* the sequence, for the next #count call with fresh data (to check
* for an occurrence split between 2 chunks of data).
*/
'count' : function ( Buffer data ) : Array
```
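The carry-over described in the NOTE can be sketched with Node built-ins alone. This is not Auntie's implementation: `createCounter` is a hypothetical helper that uses `Buffer.indexOf` instead of Bop, and it only illustrates why a few trailing bytes must survive between calls.

```javascript
// Hypothetical helper, not part of Auntie: counts non-overlapping matches of
// `seq` across successive chunks.
function createCounter(seq) {
  const needle = Buffer.from(seq);
  if (needle.length === 0) throw new RangeError('empty sequence');
  let carry = Buffer.alloc(0); // unmatched tail kept from the previous chunk
  let total = 0;

  return function count(chunk) {
    const data = Buffer.concat([carry, chunk]);
    let from = 0;
    let at;
    while ((at = data.indexOf(needle, from)) !== -1) {
      total += 1;
      from = at + needle.length;
    }
    // Keep at most (needle.length - 1) trailing bytes: enough to complete a
    // match that straddles the chunk boundary, never enough to contain one.
    const keepFrom = Math.max(from, data.length - (needle.length - 1));
    carry = data.subarray(keepFrom);
    return total;
  };
}

const count = createCounter('\r\n');
count(Buffer.from('a,b\r\nc,d\r'));            // the second CRLF is cut in half
const lines = count(Buffer.from('\ne,f\r\n')); // ...and completed here
console.log(lines); // 3
```

Keeping only the shortest useful tail is what makes counting lighter than splitting: nothing else from earlier chunks has to stay in memory.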
#### Auntie.dist
> ##### count occurrences, min and max distance between sequences and remaining bytes.
```javascript
/*
* it returns an Array with:
* - the current number of occurrences
* - the minimum distance, in bytes, between any 2 sequences
* - the maximum distance, in bytes, between any 2 sequences
* - the remaining bytes to the end of data (without any matching sequence)
*
* NOTE:
* - the distance from index 0 to the first match is also considered
* - it saves the remaining data that does not contain the sequence,
* for the next #dist call with fresh data (to check for occurrences
* between chunks).
*/
'dist' : function ( Buffer data ) : Array
```
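To make the four returned values concrete, here is a single-buffer sketch using only Node built-ins. `distSketch` is a hypothetical name, and two details are assumptions rather than documented behaviour: distance is measured from the end of one match to the start of the next, and the minimum is reported as 0 when nothing matches.

```javascript
// Hypothetical helper, not Auntie's code. Assumption: "distance" means the
// number of bytes between the end of one match and the start of the next.
function distSketch(seq, data) {
  const needle = Buffer.from(seq);
  if (needle.length === 0) throw new RangeError('empty sequence');
  let count = 0;
  let min = Infinity;
  let max = 0;
  let from = 0; // index right after the previous match (0 before any match)
  let at;
  while ((at = data.indexOf(needle, from)) !== -1) {
    const gap = at - from; // the first gap is measured from index 0
    if (gap < min) min = gap;
    if (gap > max) max = gap;
    count += 1;
    from = at + needle.length;
  }
  // With no match at all, report 0 for min (an arbitrary choice of this sketch).
  return [count, count === 0 ? 0 : min, max, data.length - from];
}

const stats = distSketch('\r\n', Buffer.from('ab\r\ncdef\r\n\r\ngh'));
console.log(stats); // [ 3, 0, 4, 2 ]
```

In this input the shortest gap is 0 because two separators are adjacent, the longest is the 4 bytes of `cdef`, and the 2 trailing bytes `gh` are the remainder.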
#### Auntie.do
> ##### split data or a stream of data by the current sequence
```javascript
/*
* if collect is true, it returns an Array of data slices; otherwise, it
* emits a 'snap' event for every slice; then, after it has finished
* parsing the data, it emits a 'snip' event with the remaining data that
* does not contain the sequence (the current Auntie.snip property).
*
* NOTE: it saves the remaining data that does not contain the
* sequence, for the next #do call on fresh data (to check for
* occurrences between chunks).
*/
'do' : function ( Buffer data [, Boolean collect ] ) : [ Array ]
```
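The way leftover data rejoins the next chunk can be sketched with Node built-ins. `createSplitter`, `split` and this `flush` are hypothetical names that mirror collect mode only; they are not Auntie's code and use `Buffer.indexOf` rather than Bop.

```javascript
// Hypothetical helper, not Auntie's code: splits chunks by `seq` and keeps the
// unmatched tail (the "snip") so a record cut by a chunk boundary is rejoined.
function createSplitter(seq) {
  const needle = Buffer.from(seq);
  if (needle.length === 0) throw new RangeError('empty sequence');
  let snip = Buffer.alloc(0);

  return {
    // Returns the complete slices found in this call (like collect = true).
    split(chunk) {
      const data = Buffer.concat([snip, chunk]);
      const slices = [];
      let from = 0;
      let at;
      while ((at = data.indexOf(needle, from)) !== -1) {
        slices.push(data.subarray(from, at));
        from = at + needle.length;
      }
      snip = data.subarray(from); // may end with half a separator
      return slices;
    },
    // Returns the leftover bytes and resets the internal state.
    flush() {
      const rest = snip;
      snip = Buffer.alloc(0);
      return rest;
    },
  };
}

const splitter = createSplitter('\r\n');
const first = splitter.split(Buffer.from('a,b\r\nc,'));
const second = splitter.split(Buffer.from('d\r\ne'));
const rest = splitter.flush();
console.log(first.map(String), second.map(String), String(rest));
```

The record `c,d` arrives in two pieces and is only released once its terminating separator has been seen; the final `e` has no separator, so it comes out through the flush step.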
#### Auntie.flush
> ##### flush the remaining data, resetting internal state/counters
```javascript
/*
* if collect is true, it returns a Buffer; otherwise, it emits
* a 'snip' event with the data. The snip never contains the
* sequence (no match). It is equivalent to reading and then
* resetting the internal Auntie.snip property.
*/
'flush' : function ( [ Boolean collect ] ) : [ Buffer ]
```
#### Auntie.set
> ##### set a new sequence for splitting data.
```javascript
// the default sequence is CRLF ('\r\n').
'set' : function ( [ Buffer | String | Number sequence ] ) : Auntie
```
#### Auntie.comb
> ##### search for a char or a sequence in the current data.
```javascript
/*
* parse current data for a generic sequence. It returns an Array of indexes.
* NOTE: it doesn't affect the current streaming parser and it doesn't save
* any data. It simply parses a chunk of data for the specified sequence,
* optionally from a starting index and limiting results to a specified number
* of occurrences (like Bop.parse does).
*/
'comb' : function ( Buffer | String seq, Buffer data [, Number from [, Number limit ] ] ) : Array
```
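The optional `from` and `limit` arguments can be illustrated with a stateless sketch built on `Buffer.indexOf`. `combSketch` is a hypothetical helper, not Auntie's code; reporting non-overlapping matches and counting `from` as an absolute index into the data are assumptions of this sketch.

```javascript
// Hypothetical helper, not Auntie's code: returns the start index of every
// non-overlapping match of `seq` in `data`, optionally starting the scan at
// `from` and stopping after `limit` results. It keeps no state between calls.
function combSketch(seq, data, from = 0, limit = Infinity) {
  const needle = Buffer.from(seq);
  if (needle.length === 0) throw new RangeError('empty sequence');
  const indexes = [];
  let at;
  while (indexes.length < limit && (at = data.indexOf(needle, from)) !== -1) {
    indexes.push(at);
    from = at + needle.length;
  }
  return indexes;
}

const row = Buffer.from('a,b,c,d');
const commas = combSketch(',', row);          // [ 1, 3, 5 ]
const fromTwo = combSketch(',', row, 2);      // [ 3, 5 ]
const firstOnly = combSketch(',', row, 0, 1); // [ 1 ]
console.log(commas, fromTwo, firstOnly);
```

A typical use is locating field separators inside a line that has already been split off, without disturbing the state of the line splitter.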
------------------------------------------------------------------------------
### Events
> Auntie emits only __2__ types of events: __`snap`__ and __`snip`__.
##### !snap a result.
```javascript
'snap' : function ( Buffer result )
```
##### !snip current remaining data (with no match found).
```javascript
'snip' : function ( Buffer result )
```
> __NOTE__: if the _collect_ argument of __do__/__flush__ is set to `true`,
> no event is emitted.
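
The interplay between the two events and the collect switch can be sketched with a small stand-in built on Node's `EventEmitter`. `SplitEmitter` is a hypothetical class that imitates the behaviour described above using `Buffer.indexOf`; it is not Auntie, and it assumes listeners are attached with the usual `on` method.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in, not Auntie itself: emits 'snap' for every slice and
// 'snip' for the remaining data, unless the collect switch is true.
class SplitEmitter extends EventEmitter {
  constructor(seq = '\r\n') {
    super();
    this.seq = Buffer.from(seq);
    this.snip = Buffer.alloc(0);
  }

  do(data, collect = false) {
    const joined = Buffer.concat([this.snip, data]);
    const slices = [];
    let from = 0;
    let at;
    while ((at = joined.indexOf(this.seq, from)) !== -1) {
      slices.push(joined.subarray(from, at));
      from = at + this.seq.length;
    }
    this.snip = joined.subarray(from);
    if (collect) return slices; // collect mode: results returned, no events
    for (const slice of slices) this.emit('snap', slice);
    this.emit('snip', this.snip);
    return undefined;
  }
}

const emitter = new SplitEmitter();
const seen = [];
emitter.on('snap', (slice) => seen.push(`snap:${slice}`));
emitter.on('snip', (tail) => seen.push(`snip:${tail}`));

emitter.do(Buffer.from('x\r\ny\r\nz'));                  // two snaps, one snip
const collected = emitter.do(Buffer.from('\r\n'), true); // emits nothing
console.log(seen, collected.map(String));
```

Event mode suits pipelines that handle each record as it arrives, while collect mode is convenient when the caller wants the slices of one chunk in hand before moving on.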
------------------------------------------------------------------------------
### Examples
#### split lines from a CSV file (CRLF):
> - __[do, comb: streaming data](example/auntie-csv-split-lines-example.js)__
#### count lines from a file (CRLF):
> - __[count: sync load](example/auntie-count-sync-load-example.js)__
> - __[dist: sync load](example/auntie-dist-sync-load-example.js)__
> - __[count: streaming data](example/auntie-count-async-stream-example.js)__
> - __[dist: streaming data](example/auntie-dist-async-stream-example.js)__
#### snap event and collect (CRLF):
> - __[do snap: streaming data](example/auntie-do-snap-event-example.js)__
> - __[do collect: streaming data](example/auntie-do-collect-results-example.js)__
> See __[all examples](example/)__.
### MIT License
> Copyright (c) 2017-present < Guglielmo Ferri : 44gatti@gmail.com >
> Permission is hereby granted, free of charge, to any person obtaining
> a copy of this software and associated documentation files (the
> 'Software'), to deal in the Software without restriction, including
> without limitation the rights to use, copy, modify, merge, publish,
> distribute, sublicense, and/or sell copies of the Software, and to
> permit persons to whom the Software is furnished to do so, subject to
> the following conditions:
> __The above copyright notice and this permission notice shall be
> included in all copies or substantial portions of the Software.__
> THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.