https://github.com/terma/fast-select

Extremely fast and compact in-memory embedded column oriented database
cache-storage compact java lock-free nosql-database storage

Host: GitHub
URL: https://github.com/terma/fast-select
Owner: terma
License: apache-2.0
Created: 2015-11-04T02:19:41.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2017-08-29T23:43:41.000Z (almost 9 years ago)
Last Synced: 2024-11-17T13:03:43.637Z (over 1 year ago)
Topics: cache-storage, compact, java, lock-free, nosql-database, storage
Language: Java
Homepage: https://terma.github.io/fast-select/
Size: 819 KB
Stars: 19
Watchers: 2
Forks: 12
Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

# fast-select

[![Build Status](https://travis-ci.org/terma/fast-select.svg?branch=start)](https://travis-ci.org/terma/fast-select)

[![Coverage Status](https://coveralls.io/repos/github/terma/fast-select/badge.svg?branch=master)](https://coveralls.io/github/terma/fast-select?branch=master) [![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.terma/fast-select/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.github.terma/fast-select/)

Compact in-memory read-only storage with lock-free, ultra-fast querying by any attribute, under the Apache 2.0 license.

* [Key Properties](#key-properties)

* [Use Cases](#use-cases)

* [Architecture](docs/ARHI.md), [Performance](docs/PERF.md), [Javadoc](http://terma.github.io/fast-select/)

* [How To Use](#how-to-use)

  * [Aggregate/Group By/Pivot](#aggregate)

  * [Filter, Sort and first 25](#select-first-25-items-from-sorted-dataset)

  * [Filter, Sort and get page](#filter-dataset-get-total-and-render-only-one-page)

  * [JMX](#jmx)

  * [Low Cardinality Strings](#low-cardinality-strings)

## Key Properties

* Compact 

  * No Java object overhead (on average ```x10``` smaller than the object representation)

  * Compact string representation (```UTF-8``` instead of Java ```UTF-16```)

  * String data compression

  * Small metadata footprint (no indexes overhead)

* Fast

  * All dimensions are available for search

  * Uses data statistics to avoid full scans

  * Column oriented

  * Thread safe and lock free

* Supports fast save/load to/from disk [details](USECASES.md)

* Small jar file

* Apache 2.0
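
To make the "column oriented" and "data statistics" points above concrete, here is a minimal sketch in plain Java. It is not fast-select's implementation: `ColumnScanSketch`, `BLOCK_SIZE`, and the min/max-per-block scheme are assumptions chosen for illustration. Values live in a primitive array (no per-row objects), and each block records its minimum and maximum so that a block which cannot contain the searched value is skipped without being scanned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the fast-select API.
final class ColumnScanSketch {
    static final int BLOCK_SIZE = 4; // tiny on purpose, to keep the example readable

    private final int[] values;   // one primitive per row, no object headers
    private final int[] blockMin; // smallest value in each block
    private final int[] blockMax; // largest value in each block
    int scannedBlocks;            // number of blocks the last query had to scan

    ColumnScanSketch(int[] values) {
        this.values = values.clone();
        int blocks = (values.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        blockMin = new int[blocks];
        blockMax = new int[blocks];
        for (int b = 0; b < blocks; b++) {
            int from = b * BLOCK_SIZE;
            int to = Math.min(from + BLOCK_SIZE, values.length);
            int min = values[from];
            int max = values[from];
            for (int i = from + 1; i < to; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            blockMin[b] = min;
            blockMax[b] = max;
        }
    }

    // Returns the row positions whose value equals `wanted`.
    List<Integer> positionsOf(int wanted) {
        scannedBlocks = 0;
        List<Integer> positions = new ArrayList<>();
        for (int b = 0; b < blockMin.length; b++) {
            if (wanted < blockMin[b] || wanted > blockMax[b]) {
                continue; // the statistics prove this block has no match
            }
            scannedBlocks++;
            int from = b * BLOCK_SIZE;
            int to = Math.min(from + BLOCK_SIZE, values.length);
            for (int i = from; i < to; i++) {
                if (values[i] == wanted) {
                    positions.add(i);
                }
            }
        }
        return positions;
    }
}
```

The skipping only pays off when values are clustered; on uniformly random data most blocks would still be scanned.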

## Use Cases

* Speed up analytical querying by caching the main data in a compact, query-optimized form instead of using an expensive solution [details](USECASES.md#speed-up-analytic)

* Separate ETL and analytic load by keeping the main data optimized for processing and adding a compact model optimized for querying [details](USECASES.md#separate-processing-and-analytic)

* Sub-second querying of historical data by loading a portion of the data on demand within seconds [details](USECASES.md#speed-up-history-analytic)

## How to use

### Create Data Class

```java
public class Data {
    public byte a;
    public byte b;
}
```

### Build storage

```java
FastSelect<Data> database = new FastSelectBuilder<>(Data.class).create();

// add your data
database.addAll(new ArrayList(...));
```

### Aggregate

If you just need the equivalent of ```select F1, F2, count(X) ... group by F1, F2```:

```java
MultiGroupCountCallback callback = new MultiGroupCountCallback(database.getColumnsByNames().get("a"));

database.select(
  new Request[] {new IntRequest("a", new int[]{12, 3})},
  callback);

callback.getCounters(); // your result here, grouped by field 'a'
```
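
For readers unfamiliar with the SQL shorthand above, the following plain-Java sketch shows the kind of result such a filtered group-by count describes. It does not use fast-select: `GroupCountSketch` and its parameters are hypothetical names, and the column is modeled as a plain `int[]`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the fast-select API.
final class GroupCountSketch {
    // Equivalent of: select a, count(*) ... where a in (accepted) group by a
    static Map<Integer, Integer> countByValue(int[] columnA, int[] accepted) {
        Map<Integer, Integer> counters = new TreeMap<>();
        for (int value : columnA) {
            for (int candidate : accepted) {
                if (value == candidate) {
                    counters.merge(value, 1, Integer::sum);
                    break; // count each row at most once
                }
            }
        }
        return counters;
    }
}
```

Calling `GroupCountSketch.countByValue(new int[] {12, 3, 7, 12}, new int[] {12, 3})` counts two rows for `12` and one for `3`, and ignores `7` because it fails the filter.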

Starting with version ```3.2.0```, more sophisticated and flexible cases can use ```AggregateCallback```, which supports user-defined aggregation efficiently, so you don't need to worry about the performance of aggregation keys.

```java
FastSelect fastSelect = ...;
final ByteData data = fastSelect.getData("columnWithData");

AggregateCallback<MutableInt> callback = new AggregateCallback<>(
    new Aggregator<MutableInt>() {
        @Override
        public MutableInt create(int position) {
            // called the first time a unique key is seen
            return new MutableInt(data.data[position]);
        }

        @Override
        public void aggregate(MutableInt agg, int position) {
            // called for every later row with the same key
            agg.add(data.data[position]);
        }
    },
    fastSelect.getColumnsByNames().get("aggregationColumn1"),
    fastSelect.getColumnsByNames().get("aggregationColumn2")
    // you can specify any number of columns of any type
);

fastSelect.select(new Request[0], callback);
Map result = callback.getResult();
```

### Select first 25 items from sorted dataset

```java
ListLimitCallback callback = new ListLimitCallback<>(25);
fastSelect.selectAndSort(where, callback, "a");
callback.getResult();
```

### Filter dataset get total and render only one page

```java
// get a reference to the underlying column data
IntData id = (IntData) fastSelect.getColumnsByNames().get("id").data;

List<Integer> positions = fastSelect.selectPositions(new Request[] {...});
Collections.sort(positions, new Comparator<Integer>() {
    public int compare(Integer p1, Integer p2) {
        // Integer.compare avoids the overflow that subtraction can cause
        return Integer.compare(id.data[p1], id.data[p2]);
    }
});

// render one page (rows 10..19), clamped to the number of matches
List<Map<String, Object>> page = new ArrayList<>();
for (int i = 10; i < Math.min(20, positions.size()); i++) {
    int p = positions.get(i);
    Map<String, Object> row = new HashMap<>();
    row.put("id", id.data[p]);
    page.add(row);
}

int total = positions.size();
```
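
Paging over a list of positions comes down to clamping a `[from, to)` window to the list size. The helper below is a hypothetical, JDK-only sketch of that step; `PageSketch.page` is not part of fast-select, and the zero-based `pageIndex` convention is an assumption.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the fast-select API.
final class PageSketch {
    // Returns the positions on the zero-based page `pageIndex`,
    // or an empty list when the page starts past the end.
    static List<Integer> page(List<Integer> positions, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize; // long avoids int overflow
        if (from >= positions.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, positions.size());
        return positions.subList((int) from, to);
    }
}
```

The returned list is a view of `positions`, so copy it if the underlying list may change afterwards.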

### Combine filters by AND

Just add more requests

```java
fastSelect.select(new Request[] {
    new IntRequest("id", 12),
    new StringLikeRequest("name", "bim"), // name like '%bim%'
    ...
});
```

### Combine filters by OR

Wrap the requests that should be combined by OR in an ```OrRequest```

```java
new OrRequest(
    new IntRequest("id", 12),
    new StringLikeRequest("name", "bim") // name like '%bim%'
)
```
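
The semantics of the two combinations can be sketched without the library: AND keeps a row only if every condition matches, OR keeps it if any condition matches. `FilterSketch` is a hypothetical, JDK-only illustration that models each condition as a `java.util.function.IntPredicate` over row positions. It says nothing about how fast-select evaluates requests internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; not part of the fast-select API.
final class FilterSketch {
    // A row passes only if all conditions accept it.
    static IntPredicate and(IntPredicate... conditions) {
        return position -> {
            for (IntPredicate condition : conditions) {
                if (!condition.test(position)) {
                    return false;
                }
            }
            return true;
        };
    }

    // A row passes if at least one condition accepts it.
    static IntPredicate or(IntPredicate... conditions) {
        return position -> {
            for (IntPredicate condition : conditions) {
                if (condition.test(position)) {
                    return true;
                }
            }
            return false;
        };
    }

    // Collects the positions in [0, rowCount) accepted by `where`.
    static List<Integer> select(int rowCount, IntPredicate where) {
        List<Integer> positions = new ArrayList<>();
        for (int position = 0; position < rowCount; position++) {
            if (where.test(position)) {
                positions.add(position);
            }
        }
        return positions;
    }
}
```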

### JMX

To publish information about a FastSelect instance over JMX, use the bundled class ```FastSelectMXBeanImpl``` from the package ```com.github.terma.fastselect.jmx```. It provides read-only information such as:

* size (count of records)

* allocated size

* used mem

* columns (type, name, mem)

#### Register a FastSelect instance with JMX

```java
FastSelect fastSelect = ...;
FastSelectMXBean fastSelectMXBean = new FastSelectMXBeanImpl(fastSelect);
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
mbs.registerMBean(fastSelectMXBean, new ObjectName("fastselect:type=mbeanname"));
```

#### Unregister 

Use standard way for MBeans:

```java
String mbeanName = ...;
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
mbs.unregisterMBean(new ObjectName("fastselect:type=mbeanname"));
```
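
The register/unregister flow is standard JMX and can be tried without fast-select. The sketch below uses a hypothetical `RowCountMXBean` as a stand-in for `FastSelectMXBean`; the object name `sketch:type=rowCount` and all class names are made up for illustration. It registers a bean, reads its `Size` attribute back through the platform MBean server, and unregisters it.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical stand-in for FastSelectMXBeanImpl; only JDK classes are used.
final class JmxSketch {
    // MXBean interfaces must be public and have a name ending in "MXBean".
    public interface RowCountMXBean {
        int getSize();
    }

    static final class RowCount implements RowCountMXBean {
        private final int size;

        RowCount(int size) {
            this.size = size;
        }

        @Override
        public int getSize() {
            return size;
        }
    }

    // Registers the bean, reads "Size" through JMX, then always unregisters it.
    static int registerReadUnregister(int size) {
        try {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("sketch:type=rowCount");
            mbs.registerMBean(new RowCount(size), name);
            try {
                return (Integer) mbs.getAttribute(name, "Size");
            } finally {
                mbs.unregisterMBean(name);
            }
        } catch (Exception e) {
            // JMX methods throw several checked exceptions; wrap them for brevity
            throw new IllegalStateException("JMX round trip failed", e);
        }
    }
}
```

Unregistering in a `finally` block matters because registering a second bean under the same `ObjectName` fails with `InstanceAlreadyExistsException`.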

You can find more use cases in the Javadoc for the ```callbacks``` package.

### Low Cardinality Strings

```fast-select``` provides very compact storage for small Java types and ```String```, which saves memory because there is no Java object overhead and strings are stored as ```UTF-8```. However, you can get even better results for low-cardinality columns. Take a look at:

 

 * ```com.github.terma.fastselect.data.StringCompressedByteData```

 * ```com.github.terma.fastselect.data.StringCompressedShortData```

 * ```com.github.terma.fastselect.data.StringCompressedIntData```
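
Why low-cardinality columns compress so well can be sketched with dictionary encoding: each distinct string is stored once, and every row keeps only a small code. Whether the `StringCompressed*Data` classes work exactly this way is an assumption, not something taken from the library's source. `DictionaryColumnSketch` is a hypothetical, JDK-only illustration limited to 256 distinct values, so each row costs one byte.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the fast-select API.
final class DictionaryColumnSketch {
    private final List<String> dictionary = new ArrayList<>();  // code -> value
    private final Map<String, Integer> codes = new HashMap<>(); // value -> code
    private byte[] rows = new byte[16];                         // one code per row
    private int size;

    void add(String value) {
        Integer code = codes.get(value);
        if (code == null) {
            if (dictionary.size() == 256) {
                throw new IllegalStateException("more than 256 distinct values");
            }
            code = dictionary.size();
            dictionary.add(value);
            codes.put(value, code);
        }
        if (size == rows.length) {
            rows = Arrays.copyOf(rows, size * 2); // grow the row array
        }
        rows[size++] = (byte) code.intValue();
    }

    String get(int position) {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position: " + position);
        }
        return dictionary.get(rows[position] & 0xFF); // read the byte as unsigned
    }

    int size() {
        return size;
    }

    int distinctValues() {
        return dictionary.size();
    }
}
```

With a column such as a currency code, a million rows would need roughly a million bytes plus a handful of dictionary entries, instead of a million separate `String` objects.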
