https://github.com/rootslab/bop
Bop is a very fast Boyer-Moore parser/matcher for String or Buffer patterns.
boyer-moore boyer-moore-algorithm parser pattern-matching search-algorithm
- Host: GitHub
- URL: https://github.com/rootslab/bop
- Owner: rootslab
- License: mit
- Created: 2012-12-03T17:39:55.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2025-03-14T16:52:06.000Z (11 months ago)
- Last Synced: 2025-03-14T17:38:19.512Z (11 months ago)
- Topics: boyer-moore, boyer-moore-algorithm, parser, pattern-matching, search-algorithm
- Language: JavaScript
- Homepage:
- Size: 76.2 KB
- Stars: 15
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: LICENSE
### Bop
* __Bop__ is a __very fast Boyer-Moore parser__ for string or buffer patterns.
* It is optimized for pattern strings/buffers of at most 255 bytes.
* It is __ideal__, for example, for parsing streams such as __multipart/form-data__, in which the pattern/boundary length is below ~70 bytes.
### Main features
> Given an m-length pattern, n-length data and a σ-length alphabet ( σ = 256 ):
- it performs the comparisons from right to left.
- preprocessing phase in __O(m+σ)__ time and space complexity.
- searching phase in __O(m*n)__ time complexity.
- __3*n__ text character comparisons in the worst case when searching for a non-periodic pattern.
- __O(n/m)__ best-case performance.
> See __[Lecroq](http://www-igm.univ-mlv.fr/~lecroq/string/node14.html)__ for reference and also __[Qap](https://github.com/rootslab/qap)__, a QuickSearch parser.
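
The right-to-left comparison and the table-based preprocessing listed above can be illustrated with a minimal sketch. It uses the simplified Horspool variant of Boyer-Moore (a single bad-character table), chosen here for brevity; it is not Bop's actual implementation, and the names `buildShiftTable` and `searchAll` are hypothetical.

```javascript
// Illustrative sketch only (Boyer-Moore-Horspool), not taken from Bop's source.

// Preprocessing: one table entry per possible byte value (sigma = 256).
// Runs in O(m + sigma) time and space.
function buildShiftTable(pattern) {
  const m = pattern.length;
  const table = new Uint32Array(256).fill(m);
  // Distance from the rightmost occurrence of each byte to the pattern end,
  // ignoring the last pattern byte itself.
  for (let i = 0; i < m - 1; i++) {
    table[pattern[i]] = m - 1 - i;
  }
  return table;
}

// Searching: compare right to left, then shift using the table.
function searchAll(pattern, data) {
  const m = pattern.length;
  const n = data.length;
  const out = [];
  if (m === 0 || m > n) return out;
  const table = buildShiftTable(pattern);
  let pos = 0;
  while (pos <= n - m) {
    let j = m - 1;
    while (j >= 0 && data[pos + j] === pattern[j]) j--;
    if (j < 0) out.push(pos); // full match at pos
    // Shift according to the data byte aligned with the pattern's last byte.
    pos += table[data[pos + m - 1]];
  }
  return out;
}

console.log(searchAll(Buffer.from('abc'), Buffer.from('xxabcxxabc'))); // [ 2, 7 ]
console.log(searchAll(Buffer.from('\r\n\r\n'), Buffer.from('\r\n\r\n\r\n\r\n'))); // [ 0, 2, 4 ]
```

When the aligned data byte does not occur in the pattern, the window jumps by the full pattern length, which is where the O(n/m) best case comes from.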
### Install
```bash
$ npm install bop [-g]
```
> __require__:
```javascript
var Bop = require( 'bop' );
```
### Run Tests
```bash
$ cd bop/
$ npm test
```
> __to execute a single test file simply do__:
```bash
$ node test/file-name.js
```
### Run Benchmarks
```bash
$ cd bop/
$ npm run bench
```
### Constructor
> Create an instance, using a pattern.
```javascript
Bop( Buffer pattern | String pattern )
// or
new Bop( Buffer pattern | String pattern )
```
### Methods
> Arguments within [] are optional.
```javascript
// Change the pattern to search.
Bop#set( Buffer pattern | String pattern ) : Buffer
/*
* Count matches, optionally starting from a particular index (default
* is 0). It returns an Array containing the number of matches, and the
* remaining bytes
*/
Bop#count( Buffer data [, Number start_from ] ) : Array
// Same as #count, but without counting overlapping sequences
Bop#scount( Buffer data [, Number start_from ] ) : Array
/*
* Count matches, and the max distance found between 2 matches,
* optionally starting from a particular index (default is 0).
* It returns an Array containing:
* - the number of matches
* - the max distance found between 2 matches, -1 otherwise
* - the distance from data index 0 to the first match, -1 otherwise
* - the distance from the end of the last match to the end of data,
* -1 otherwise
*
* NOTE: if 0 or only 1 occurrence was found, the max distance will
* be -1 and then the resulting array will be respectively:
* - [0, -1, -1, -1 ]
* - [1, -1, .., .. ]
*/
Bop#dist( Buffer data [, Number start_from ] ) : Array
// same as #dist, but without counting overlapping sequences
Bop#sdist( Buffer data [, Number start_from ] ) : Array
/*
* Collect all indexes of pattern occurrences.
*
* As options you can:
*
* - start parsing from a particular index
* - limit the number of results to parse
* - fill your array with resulting indexes.
*
* NOTE: use Buffers when possible (faster).
*
*/
Bop#parse( Buffer data | String data [, Number start_from [, Number limit_results [, Array my_array ] ] ] ) : Array
/*
* Strict parse, it's the same as parse, without collecting
* any overlapping sequences.
*
* Example with CRLF sequence:
*
* - bop pattern is set to: "\r\n\r\n" (CR LF CR LF)
* - data to parse is: "\r\n\r\n\r\n\r\n" (CR LF CR LF CR LF CR LF)
*
* - with Bop.parse( data ) we get 3 indexes as results: [0, 2, 4]
*
*      0  1  2  3  4  5  6  7
*      -----------------------
* p:   CR LF CR LF
* d:   CR LF CR LF CR LF CR LF
*
* p:   ----> CR LF CR LF
* d:   CR LF CR LF CR LF CR LF
*
* p:   ----------> CR LF CR LF
* d:   CR LF CR LF CR LF CR LF
*
* - with Bop.sparse( data ) we get only 2 results: [0, 4]
*
*      0  1  2  3  4  5  6  7
*      -----------------------
* p:   CR LF CR LF
* d:   CR LF CR LF CR LF CR LF
*
* p:   ----------> CR LF CR LF
* d:   CR LF CR LF CR LF CR LF
*
*/
Bop#sparse( Buffer data | String data [, Number start_from [, Number limit_results [, Array my_array ] ] ] ) : Array
```
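
The difference between `#parse` and `#sparse` described above can be reproduced with a small sketch built on Node's built-in `Buffer#indexOf`. The helper `collectIndexes` and its parameters are hypothetical and only mimic the documented semantics (start index, result limit, overlapping or strict collection); Bop itself relies on its own Boyer-Moore tables.

```javascript
// Hypothetical helper, not part of Bop: collects match offsets with Buffer#indexOf.
function collectIndexes(pattern, data, strict, start = 0, limit = Infinity) {
  const out = [];
  if (pattern.length === 0) return out;
  // Strict mode resumes after the whole match, so overlapping matches are skipped;
  // otherwise the search resumes one byte after the start of the last match.
  const step = strict ? pattern.length : 1;
  let i = data.indexOf(pattern, start);
  while (i !== -1 && out.length < limit) {
    out.push(i);
    i = data.indexOf(pattern, i + step);
  }
  return out;
}

const crlfPair = Buffer.from('\r\n\r\n');
const stream = Buffer.from('\r\n\r\n\r\n\r\n'); // CR LF repeated 4 times, 8 bytes

console.log(collectIndexes(crlfPair, stream, false)); // [ 0, 2, 4 ]
console.log(collectIndexes(crlfPair, stream, true));  // [ 0, 4 ]
console.log(collectIndexes(crlfPair, stream, false, 1, 1)); // [ 2 ]
```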
### Usage Example
```javascript
var Bop = require( 'bop' )
, pattern = 'hellofolks\r\n'
, somedata = 'hehehe' + pattern +'eheheh' + pattern
, bop = Bop( pattern )
// parse data from beginning
, results = bop.parse( somedata )
;
```
> See __[examples](example/)__.
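
As a cross-check on what `results` should contain: the two occurrences of the pattern in `somedata` start at byte offsets 6 and 24 (6 bytes of `hehehe`, a 12-byte pattern, then 6 bytes of `eheheh`). The sketch below confirms those offsets using only Node's built-in `Buffer#indexOf`. It assumes, as the Methods section suggests, that `#parse` reports the start index of each occurrence; the names `needle`, `haystack` and `offsets` are illustrative.

```javascript
// Same data as the usage example, rebuilt as Buffers.
const needle = Buffer.from('hellofolks\r\n');
const haystack = Buffer.concat([
  Buffer.from('hehehe'),
  needle,
  Buffer.from('eheheh'),
  needle
]);

// Collect every start offset of the pattern, scanning from the beginning.
const offsets = [];
for (let i = haystack.indexOf(needle); i !== -1; i = haystack.indexOf(needle, i + 1)) {
  offsets.push(i);
}

console.log(offsets); // [ 6, 24 ]
```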
#### Benchmark for a short pattern ( length <= 255 bytes )
> The parser uses three 256-byte Buffers to build the shifting tables, so:
> - Pattern parsing / table creation has O(σ) space and time complexity.
> - The memory footprint is very low.
> - Preprocessing the pattern ( = table creation ) is very fast.
```bash
$ node bench/small-pattern-data-rate
```
By default it:
> - uses a pattern string of 57 bytes/chars.
> - builds a data buffer of 700 MB in memory.
> - uses a redundancy/distance factor for pattern strings equal to 2. The bigger the value,
the fewer occurrences of the pattern string there are in the text buffer.
**Custom Usage**:
```bash
# with [testBufferSizeInMB] [distanceFactor] [aStringPattern]
$ node bench/small-pattern-data-rate.js 700 4 "that'sallfolks"
```
#### Benchmark for a big pattern ( length > 255 bytes )
> The parser uses three arrays to build the shifting tables for big patterns, so:
> - memory consumption will be high, due to the use of arrays.
> - it will take a long time to preprocess the pattern ( = table creation ).
```bash
$ node bench/big-pattern-data-rate
```
> - it uses a very big pattern ( 20 MBytes ).
> - it builds a data buffer of 300 MBytes, copying the same pattern 12 times.
See __[bench](./bench)__ dir.
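
The split at 255 bytes is presumably tied to how shift values are stored: a shift is never larger than the pattern length, so for patterns of up to 255 bytes every table entry fits in a single byte of a Buffer, while longer patterns need wider entries. A minimal sketch of that choice follows; the helper `allocShiftTable` is hypothetical and is not taken from Bop's source.

```javascript
// Hypothetical helper: pick compact byte storage when every shift fits in 0..255.
function allocShiftTable(patternLength) {
  if (patternLength <= 255) {
    // One byte per alphabet symbol, pre-filled with the default shift.
    return Buffer.alloc(256, patternLength);
  }
  // Plain array: entries can hold any shift value, at a higher memory cost.
  return new Array(256).fill(patternLength);
}

const smallTable = allocShiftTable(57);
const bigTable = allocShiftTable(1024);

console.log(Buffer.isBuffer(smallTable), smallTable[0]); // true 57
console.log(Array.isArray(bigTable), bigTable[0]);       // true 1024
```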
### MIT License
> Copyright (c) 2013-present < Guglielmo Ferri : 44gatti@gmail.com >
> Permission is hereby granted, free of charge, to any person obtaining
> a copy of this software and associated documentation files (the
> 'Software'), to deal in the Software without restriction, including
> without limitation the rights to use, copy, modify, merge, publish,
> distribute, sublicense, and/or sell copies of the Software, and to
> permit persons to whom the Software is furnished to do so, subject to
> the following conditions:
> __The above copyright notice and this permission notice shall be
> included in all copies or substantial portions of the Software.__
> THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.