Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/codeo-za/segmenta

An npm module which exposes a blazingly fast API for manipulating and querying arbitrary data segments in Redis
https://github.com/codeo-za/segmenta

Last synced: about 2 months ago
JSON representation

An npm module which exposes a blazingly fast API for manipulating and querying arbitrary data segments in Redis

Awesome Lists containing this project

README

        

# Segmenta: a segments api using Redis for storage

## What does it do?

Provides a mechanism for storing and retrieving sets of numbers quickly as well
as performing operations with those sets. Currently supported are:
- `and`: produce the set C of numbers which are in both A and B
- [ 1, 2, 3 ] and [ 2, 3, 4 ] = [ 2, 3 ]
- `or`: produce the set C of numbers which are in either A or B
- [ 1, 2, 3 ] or [ 2, 3, 4 ] = [ 1, 2, 3, 4 ]
- `not`: produce the set C of numbers which are in A excluding those in B
- [ 1, 2, 3 ] not [ 2, 3, 4 ] = [ 1 ]

## How to use?

1. `import` or `require` _segmenta_
- javascript: `const Segmenta = require("segmenta");`
- typescript: `import Segmenta from "segmenta";`
- the only export of the library is the Segmenta class
2. create an instance with options, if required:
- `const segmenta = new Segmenta()`
- `const segmenta = new Segmenta(options)`
- options have the structure:
```typescript
{
redisOptions?: RedisOptions,
segmentsPrefix?: string,
bucketSize?: number,
resultsTTL?: number,
}
```
where:
- `RedisOptions` are the options which would be passed to `ioredis` to initialize (eg `host`, `port`, etc)
- `segmentsPrefix` is a prefix to apply to all segments keys (defaults to "segments")
- `bucketSize` is the max size to use when creating buckets (defaults to 50kb)
- `resultsTTL` is the time you'd like result snapshots to live for when not explicitly released (defaults to 1 day)
3. Populating data
- `add` adds ids to the segment
```javascript
await segmenta.add("my-segment", [ 1, 2, 3 ]);
// my-segment now contains [ 1, 2, 3 ]
```
- `del` deletes ids from the segment
```javascript
await segmenta.del("my-segment", [ 2, 3 ]);
// my-segment now contains just [ 1 ]
```
- `put` takes a sequence of add / del commands and performs them in order (useful for batching streaming data)
```javascript
await segmenta.put("my-segment", [
{ add: 5 },
{ del: 1 },
{ add: 4 },
{ del: 5 },
{ add: 1 }
]);
// my-segment now contains [ 1, 4 ]
```
4. Query
- results are returned as an object with the shape:
```typescript
{
ids: number[],
skipped: number,
count: number,
resultSetId: string, // used to re-query against snapshot
total: number
}
```
- simple queries are supported (entire segments), *however a DSL is provided
for easier querying*. The client can perform `and`, `or`, and `not` operations
on results if they are retreived as buffers (see example below in DSL area).
1. Simple query, all results returned:
```javascript
await segmenta.add("my-new-segment", [ 10, 20, 30 ]);
const result = await segmenta.query("my-new-segment");
/* result looks like:
{
ids: [ 10, 20, 30 ],
skipped: 0,
count: 3,
resultSetId: "4deee554-da28-4029-8231-98060fa014dc",
total: 3
}
*/
```
2. Paged query:
```javascript
await segmenta.add("paged-results-segment", [ 1, 2, 3, 4, 5 ]);
const result1 = await segmenta.query({
query: "paged-results-segment",
skip: 0,
take: 2
});
/* result1 looks like:
{
ids: [ 1, 2 ],
skipped: 0,
count: 2,
resultSetId: "63c6a1f0-8aec-4249-9d80-63c5de13b942",
total: 5
}
*/
// the rest of the results can be obtained with:
const result2 = await segmenta.query({
query: "63c6a1f0-8aec-4249-9d80-63c5de13b942",
skip: 2
});
```
3. Paged results are snapshot and can be re-queried by using their id (uuid). Snapshots automatically expire
after 24 hours (or the number of seconds specified by `resultsTTL` in your constructor arguments. You may
manually dispose of results when you no longer need them:
```
const result = await segmenta.query({ query: "my-set", skip: 0, take: 10 });
await segmenta.dispose(result.resultSetId);
```
_Snapshots are **only** created when queries are performed with a positive integer `skip` or `take` value_
- **There is a DSL for querying in a more readable manner**:
```
await segmenta.add("set1", [ 1, 2 ]);
await segmenta.add("set2", [ 3, 4, 5 ]);
await segmenta.add("set3", [ 2, 3, 5, 6, 7 ]);
await segmenta.add("set4", [ 5, 6 ]);
// ... some time later ...
const query = "get where in 'set1' or 'set2' and 'set3' not 'set4'";
const result1 = await segmenta.query(query);
// or, with paging options:
const result2 = await segmenta.query({ query, skip: 10, take: 100 });

// the query syntax above is analogous to the following
// more manual query mechanism:
const set1 = await segmenta.getBuffer("set1");
const set2 = await segmenta.getBuffer("set2");
const set3 = await segmenta.getBuffer("set3");
const set4 = await segmenta.getBuffer("set4");
// these operations are fast, acting on bitfields in memory.
const final = set1 // [ 1, 2 ]
.or(set2) // [ 1, 2, 3, 4, 5 ]
.and(set3) // [ 2, 3, 5 ]
.not(set4) // [ 2, 3 ]
.getOnBitPositions()
.values; // returns the numeric array for bit positions
```
One may also query for counts only:
```typescript
await segmenta.query("count where in 'x');
```
One may also query for results to come back in a random order:
```typescript
await segmenta.query("random where in 'x');
```
When paging is requested and a `resultSetId` is returned, requering against
that result-set will maintain the original randomized order.

Query syntax is quite simple:
```
(GET | COUNT | RANDOM) WHERE IN('segment-id')
[(AND|OR|NOT) IN('other-segment')]...
[MIN {int}]
[MAX {int}]
[SKIP {int}]
[TAKE {int}]
```
- segments are identified by strings (single- or double-quoted)
- only two operations are supported: `GET` and `COUNT`
- the results of `COUNT` look like `GET` except no segment data is returned. Use
the `total` field in the result to read your count value.
- boolean operations are run left-to-right
- operations may be grouped with brackets, in which case they are evaluated first, eg:
`GET WHERE IN('x') AND NOT (IN('y') OR IN('z'))`
- retreives values which are in 'x' and also not in 'y' or 'z';
- brackets around segment ids are optional:
`GET WHERE IN 'x'` is equivalent to `GET WHERE IN('x')`
- the `IN` keyword is optional after the first usage:
`GET WHERE IN 'x' and IN 'y'` is equivalent to `GET WHERE IN 'x' AND 'y'`
- syntax is case-insensitive
`GET WHERE IN 'x'` is equivalent to `get where in 'x'` and `Get Where In('x')`
- `skip` and `take` can also be set on the query options -- when doing so, the skip/take
values on query options _take precedence_. This allows easy re-use of natural-language
query with changing paging values, but also facilitates natural language paging if that
is your preference.
- `MIN` and `MAX` set minimum and maximum values to bring back in the result set. These
values are _inclusive_. This may be useful if `SKIP` doesn't suit your chunking needs,
but rather setting a `MIN` and a `TAKE`
- `min` and `max` can also be set on query options. As with `skip` and `take`, the query
options values for `min` and `max` override any natural language specification
- **segment ids are case-sensitive**
- `get where in 'MY-SEGMENT'` is **NOT** equivalent to `get where in 'my-segment'`
- segment ids may not contain quotations
- they must be valid redis keys
- some queries will never make sense, so expect either strange results or parse errors:
`GET WHERE NOT IN 'x'`
- since the segments are open-ended, this is essentially an infinite set of
all numbers, excluding those in segment 'x'