https://github.com/codeo-za/segmenta

An npm module which exposes a blazingly fast API for manipulating and querying arbitrary data segments in Redis
https://github.com/codeo-za/segmenta

Last synced: 14 days ago
JSON representation

An npm module which exposes a blazingly fast API for manipulating and querying arbitrary data segments in Redis

Host: GitHub
URL: https://github.com/codeo-za/segmenta
Owner: codeo-za
Created: 2018-05-04T13:48:39.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2021-03-19T12:50:43.000Z (about 4 years ago)
Last Synced: 2024-10-08T18:10:41.165Z (6 months ago)
Language: TypeScript
Size: 623 KB
Stars: 2
Watchers: 4
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md

Awesome Lists containing this project

awesome-blazingly-fast - segmenta - An npm module which exposes a blazingly fast API for manipulating and querying arbitrary data segments in Redis (TypeScript)

README

        # Segmenta: a segments api using Redis for storage

## What does it do?

Provides a mechanism for storing and retrieving sets of numbers quickly as well

as performing operations with those sets. Currently supported are:

- `and`: produce the set C of numbers which are in both A and B

    - [ 1, 2, 3 ] and [ 2, 3, 4 ] = [ 2, 3 ]

- `or`: produce the set C of numbers which are in either A or B

    - [ 1, 2, 3 ] or  [ 2, 3, 4 ] = [ 1, 2, 3, 4 ]

- `not`: produce the set C of numbers which are in A excluding those in B

    - [ 1, 2, 3 ] not [ 2, 3, 4 ] = [ 1 ]

## How to use?

1. `import` or `require` _segmenta_

    - javascript: `const Segmenta = require("segmenta");`

    - typescript: `import Segmenta from "segmenta";`

    - the only export of the library is the Segmenta class

2. create an instance with options, if required:

    - `const segmenta = new Segmenta()`

    - `const segmenta = new Segmenta(options)`

        - options have the structure:

            ```typescript

            {

              redisOptions?: RedisOptions,

              segmentsPrefix?: string,

              bucketSize?: number,

              resultsTTL?: number,

            }

            ```

            where:

            - `RedisOptions` are the options which would be passed to `ioredis` to initialize (eg `host`, `port`, etc)

            - `segmentsPrefix` is a prefix to apply to all segments keys (defaults to "segments")

            - `bucketSize` is the max size to use when creating buckets (defaults to 50kb)

            - `resultsTTL` is the time you'd like result snapshots to live for when not explicitly released (defaults to 1 day)

3. Populating data

    - `add` adds ids to the segment

        ```javascript

        await segmenta.add("my-segment", [ 1, 2, 3 ]);

        // my-segment now contains [ 1, 2, 3 ]

        ```

    - `del` deletes ids from the segment

        ```javascript

        await segmenta.del("my-segment", [ 2, 3 ]);

        // my-segment now contains just [ 1 ]

        ```

    - `put` takes a sequence of add / del commands and performs them in order (useful for batching streaming data)

        ```javascript

        await segmenta.put("my-segment", [

          { add: 5 },

          { del: 1 },

          { add: 4 },

          { del: 5 },

          { add: 1 }

        ]);

        // my-segment now contains [ 1, 4 ]

        ```

4. Query

    - results are returned as an object with the shape:

        ```typescript

        {

          ids: number[],

          skipped: number,

          count: number,

          resultSetId: string, // used to re-query against snapshot

          total: number

        }

        ```

    - simple queries are supported (entire segments), *however a DSL is provided

        for easier querying*. The client can perform `and`, `or`, and `not` operations

        on results if they are retreived as buffers (see example below in DSL area).

        1. Simple query, all results returned:

        ```javascript

        await segmenta.add("my-new-segment", [ 10, 20, 30 ]);

        const result = await segmenta.query("my-new-segment");

        /* result looks like:

        {

          ids: [ 10, 20, 30 ],

          skipped: 0,

          count: 3,

          resultSetId: "4deee554-da28-4029-8231-98060fa014dc",

          total: 3

        }

        */

        ```

        2. Paged query:

        ```javascript

        await segmenta.add("paged-results-segment", [ 1, 2, 3, 4, 5 ]);

        const result1 = await segmenta.query({

          query: "paged-results-segment",

          skip: 0,

          take: 2

        });

        /* result1 looks like:

        {

          ids: [ 1, 2 ],

          skipped: 0,

          count: 2,

          resultSetId: "63c6a1f0-8aec-4249-9d80-63c5de13b942",

          total: 5

        }

        */

        // the rest of the results can be obtained with:

        const result2 = await segmenta.query({

          query: "63c6a1f0-8aec-4249-9d80-63c5de13b942",

          skip: 2

        });

        ```

        3. Paged results are snapshot and can be re-queried by using their id (uuid). Snapshots automatically expire

            after 24 hours (or the number of seconds specified by `resultsTTL` in your constructor arguments. You may

            manually dispose of results when you no longer need them:

        ```

        const result = await segmenta.query({ query: "my-set", skip: 0, take: 10 });

        await segmenta.dispose(result.resultSetId);

        ```

        _Snapshots are **only** created when queries are performed with a positive integer `skip` or `take` value_

    - **There is a DSL for querying in a more readable manner**:

        ```

        await segmenta.add("set1", [ 1, 2 ]);

        await segmenta.add("set2", [ 3, 4, 5 ]);

        await segmenta.add("set3", [ 2, 3, 5, 6, 7 ]);

        await segmenta.add("set4", [ 5, 6 ]);

        // ... some time later ...

        const query = "get where in 'set1' or 'set2' and 'set3' not 'set4'";

        const result1 = await segmenta.query(query);

        // or, with paging options:

        const result2 = await segmenta.query({ query, skip: 10, take: 100 });

        // the query syntax above is analogous to the following

        //  more manual query mechanism:

        const set1 = await segmenta.getBuffer("set1");

        const set2 = await segmenta.getBuffer("set2");

        const set3 = await segmenta.getBuffer("set3");

        const set4 = await segmenta.getBuffer("set4");

        // these operations are fast, acting on bitfields in memory.

        const final = set1          // [ 1, 2 ]

                        .or(set2)   // [ 1, 2, 3, 4, 5 ]

                        .and(set3)  // [ 2, 3, 5 ]

                        .not(set4)  // [ 2, 3 ]

                        .getOnBitPositions()

                        .values; // returns the numeric array for bit positions

        ```

        One may also query for counts only:

        ```typescript

        await segmenta.query("count where in 'x');

        ```

        One may also query for results to come back in a random order:

        ```typescript

        await segmenta.query("random where in 'x');

        ```

        When paging is requested and a `resultSetId` is returned, requering against

        that result-set will maintain the original randomized order.

        Query syntax is quite simple:

        ```

        (GET | COUNT | RANDOM) WHERE IN('segment-id')

                    [(AND|OR|NOT) IN('other-segment')]...

                    [MIN {int}]

                    [MAX {int}]

                    [SKIP {int}]

                    [TAKE {int}]

        ```

        - segments are identified by strings (single- or double-quoted)

        - only two operations are supported: `GET` and `COUNT`

        - the results of `COUNT` look like `GET` except no segment data is returned. Use

            the `total` field in the result to read your count value.

        - boolean operations are run left-to-right

        - operations may be grouped with brackets, in which case they are evaluated first, eg:

          `GET WHERE IN('x') AND NOT (IN('y') OR IN('z'))`

          - retreives values which are in 'x' and also not in 'y' or 'z';

        - brackets around segment ids are optional:

          `GET WHERE IN 'x'` is equivalent to `GET WHERE IN('x')`

        - the `IN` keyword is optional after the first usage:

          `GET WHERE IN 'x' and IN 'y'` is equivalent to `GET WHERE IN 'x' AND 'y'`

        - syntax is case-insensitive

          `GET WHERE IN 'x'` is equivalent to `get where in 'x'` and `Get Where In('x')`

        - `skip` and `take` can also be set on the query options -- when doing so, the skip/take

            values on query options _take precedence_. This allows easy re-use of natural-language

            query with changing paging values, but also facilitates natural language paging if that

            is your preference.

        - `MIN` and `MAX` set minimum and maximum values to bring back in the result set. These

            values are _inclusive_. This may be useful if `SKIP` doesn't suit your chunking needs,

            but rather setting a `MIN` and a `TAKE`

        - `min` and `max` can also be set on query options. As with `skip` and `take`, the query

            options values for `min` and `max` override any natural language specification

        - **segment ids are case-sensitive**

          - `get where in 'MY-SEGMENT'` is **NOT** equivalent to `get where in 'my-segment'`

        - segment ids may not contain quotations

          - they must be valid redis keys

        - some queries will never make sense, so expect either strange results or parse errors:

          `GET WHERE NOT IN 'x'`

          - since the segments are open-ended, this is essentially an infinite set of

            all numbers, excluding those in segment 'x'

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/codeo-za/segmenta

Awesome Lists containing this project

README