https://github.com/terma/fast-select
Extremely fast and compact in-memory embedded column oriented database
cache-storage compact java lock-free nosql-database storage
- Host: GitHub
- URL: https://github.com/terma/fast-select
- Owner: terma
- License: apache-2.0
- Created: 2015-11-04T02:19:41.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2017-08-29T23:43:41.000Z (almost 9 years ago)
- Last Synced: 2024-11-17T13:03:43.637Z (over 1 year ago)
- Topics: cache-storage, compact, java, lock-free, nosql-database, storage
- Language: Java
- Homepage: https://terma.github.io/fast-select/
- Size: 819 KB
- Stars: 19
- Watchers: 2
- Forks: 12
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# fast-select
[Build Status](https://travis-ci.org/terma/fast-select)
[Coverage Status](https://coveralls.io/github/terma/fast-select?branch=master) [Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.terma/fast-select/)
Compact in-memory read-only storage with lock-free, ultra-fast querying by any attribute, under the Apache 2.0 license.
* [Key Properties](#key-properties)
* [Use Cases](#use-cases)
* [Architecture](docs/ARHI.md), [Performance](docs/PERF.md), [Javadoc](http://terma.github.io/fast-select/)
* [How To Use](#how-to-use)
* [Aggregate/Group By/Pivot](#aggregate)
* [Filter, Sort and first 25](#select-first-25-items-from-sorted-dataset)
* [Filter, Sort and get page](#filter-dataset-get-total-and-render-only-one-page)
* [JMX](#jmx)
* [Low Cardinality Strings](#low-cardinality-strings)
## Key Properties
* Compact
  * No Java object overhead (on average ```x10``` smaller than the object representation)
  * Compact string representation (```UTF-8``` instead of Java ```UTF-16```)
  * String data compression
  * Small metadata footprint (no index overhead)
* Fast
  * All dimensions available for search
  * Uses data statistics to avoid full scans
  * Column oriented
  * Thread safe and lock free
* Supports fast save/load to/from disk [details](USECASES.md)
* Small jar file
* Apache 2.0
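To illustrate the "data statistics to avoid full scans" property, here is a minimal plain-Java sketch, not the library's actual implementation. It assumes a single ```int``` column split into fixed-size blocks, each with a precomputed min/max; the class ```BlockedIntColumn``` and its methods are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: one int column split into blocks with min/max statistics. */
final class BlockedIntColumn {
    private final int[] values;
    private final int blockSize; // assumed to be greater than zero
    private final int[] blockMin;
    private final int[] blockMax;

    BlockedIntColumn(int[] values, int blockSize) {
        this.values = values;
        this.blockSize = blockSize;
        int blocks = (values.length + blockSize - 1) / blockSize;
        blockMin = new int[blocks];
        blockMax = new int[blocks];
        for (int b = 0; b < blocks; b++) {
            int from = b * blockSize;
            int to = Math.min(from + blockSize, values.length);
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int i = from; i < to; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            blockMin[b] = min;
            blockMax[b] = max;
        }
    }

    /** Returns positions whose value equals target, skipping blocks that cannot contain it. */
    List<Integer> positionsOf(int target) {
        List<Integer> positions = new ArrayList<>();
        for (int b = 0; b < blockMin.length; b++) {
            if (target < blockMin[b] || target > blockMax[b]) {
                continue; // the statistics prove this block has no match
            }
            int from = b * blockSize;
            int to = Math.min(from + blockSize, values.length);
            for (int i = from; i < to; i++) {
                if (values[i] == target) {
                    positions.add(i);
                }
            }
        }
        return positions;
    }
}
```

Because the column is a plain array that is only read during a query, several threads can scan it at once without locks, which is one common way a read-only column store stays lock free.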
## Use Cases
* Speed up analytical querying by caching the main data in a compact, query-optimized form instead of using an expensive solution [details](USECASES.md#speed-up-analytic)
* Separate ETL and analytic load by keeping the main data optimized for processing and adding a compact model optimized for querying [details](USECASES.md#separate-processing-and-analytic)
* Sub-second querying of historical data by loading a portion of the data on demand within seconds [details](USECASES.md#speed-up-history-analytic)
## How to use
### Create Data Class
```java
public class Data {
    public byte a;
    public byte b;
}
```
### Build storage
```java
FastSelect<Data> database = new FastSelectBuilder<>(Data.class).create();
// add your data
database.addAll(new ArrayList<Data>(...));
```
### Aggregate
Use this if you just need ```select F1, F2, count(X) ... group by F1, F2```:
```java
MultiGroupCountCallback callback = new MultiGroupCountCallback(database.getColumnsByNames().get("a"));
database.select(
    new Request[] {new IntRequest("a", new int[]{12, 3})},
    callback);
callback.getCounters(); // your result here, grouped by field 'a'
```
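For readers who want to see what such a grouped count computes, the following is a plain-Java sketch over a raw ```byte``` column. It does not use the fast-select API; ```GroupCountSketch``` and ```countByValue``` are hypothetical names, and the result shape (a map from value to count) is an assumption made for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: select a, count(*) ... where a in (allowed) group by a. */
final class GroupCountSketch {
    /** Counts rows per value of column a, keeping only rows whose value is in allowed. */
    static Map<Integer, Integer> countByValue(byte[] a, int[] allowed) {
        Map<Integer, Integer> counters = new TreeMap<>();
        for (byte value : a) {
            for (int candidate : allowed) {
                if (value == candidate) {
                    // first occurrence stores 1, later ones add 1 to the stored count
                    counters.merge((int) value, 1, Integer::sum);
                    break;
                }
            }
        }
        return counters;
    }
}
```

Calling ```GroupCountSketch.countByValue(column, new int[]{12, 3})``` mirrors the filter used in the example above: rows with other values are ignored and the remaining rows are counted per value.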
Starting from version ```3.2.0```, for more sophisticated and flexible cases you can use ```AggregateCallback```, which supports user-defined aggregation in a fast way, so you don't need to worry about aggregation key performance.
```java
FastSelect fastSelect = ...;
final ByteData data = fastSelect.getData("columnWithData");
AggregateCallback<MutableInt> callback = new AggregateCallback<>(
    new Aggregator<MutableInt>() {
        @Override
        public MutableInt create(int position) {
            // called the first time a unique key is seen
            return new MutableInt(data.data[position]);
        }

        @Override
        public void aggregate(MutableInt agg, int position) {
            // called for every later row with the same key
            agg.add(data.data[position]);
        }
    },
    fastSelect.getColumnsByNames().get("aggregationColumn1"),
    fastSelect.getColumnsByNames().get("aggregationColumn2")
    // you can specify any number of columns of any type
);
fastSelect.select(new Request[0], callback);
Map result = callback.getResult();
```
### Select first 25 items from sorted dataset
```java
// where is the Request[] with your filters, "a" is the column to sort by
ListLimitCallback callback = new ListLimitCallback<>(25);
fastSelect.selectAndSort(where, callback, "a");
callback.getResult();
```
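The idea behind "sort, then keep the first N" can be sketched in plain Java as follows. This is only an illustration over a raw ```int``` column, not how ```selectAndSort``` or ```ListLimitCallback``` are implemented; ```SortAndLimitSketch``` and ```firstSorted``` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: order row positions by a column and keep the first ones. */
final class SortAndLimitSketch {
    /** Returns at most limit row positions, ordered by the column value at each position. */
    static List<Integer> firstSorted(int[] column, int limit) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < column.length; i++) {
            positions.add(i);
        }
        // sort positions, not values, so other columns can be read by position later
        positions.sort(Comparator.comparingInt((Integer p) -> column[p]));
        return new ArrayList<>(positions.subList(0, Math.min(limit, positions.size())));
    }
}
```

Returning positions rather than values keeps the result usable for reading any other column of the same rows.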
### Filter dataset get total and render only one page
```java
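// get a reference to the underlying column data
final IntData id = (IntData) fastSelect.getColumnsByNames().get("id").data;
List<Integer> positions = fastSelect.selectPositions(new Request[] {...});
Collections.sort(positions, new Comparator<Integer>() {
    public int compare(Integer p1, Integer p2) {
        // Integer.compare avoids the overflow that subtraction can cause
        return Integer.compare(id.data[p1], id.data[p2]);
    }
});
// render one page (rows 10 to 19), bounded by the number of matches
List<Map<String, Object>> page = new ArrayList<>();
for (int i = 10; i < Math.min(20, positions.size()); i++) {
    int p = positions.get(i);
    Map<String, Object> row = new HashMap<>();
    row.put("id", id.data[p]);
    page.add(row);
}
int total = positions.size();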
```
### Combine filters by AND
Just add more requests to the array; a row must match all of them.
```java
fastSelect.select(new Request[] {
    new IntRequest("id", 12),
    new StringLikeRequest("name", "bim"), // name like '%bim%'
    ...
});
```
### Combine filters by OR
Wrap the requests that should be combined by OR in an ```OrRequest```:
```java
new OrRequest(
    new IntRequest("id", 12),
    new StringLikeRequest("name", "bim") // name like '%bim%'
)
```
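The AND and OR semantics described above can be sketched in plain Java with predicates over row positions. This is an illustration of the logic only and does not use the fast-select request classes; ```FilterCombinationSketch```, ```select``` and ```or``` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative only: combining per-column filters over row positions. */
final class FilterCombinationSketch {
    /** Returns the positions for which every predicate holds (AND semantics). */
    static List<Integer> select(int rowCount, IntPredicate... allOf) {
        List<Integer> positions = new ArrayList<>();
        for (int position = 0; position < rowCount; position++) {
            boolean matches = true;
            for (IntPredicate predicate : allOf) {
                if (!predicate.test(position)) {
                    matches = false;
                    break; // one failed filter is enough to reject the row
                }
            }
            if (matches) {
                positions.add(position);
            }
        }
        return positions;
    }

    /** Builds one predicate that holds when at least one of anyOf holds (OR semantics). */
    static IntPredicate or(IntPredicate... anyOf) {
        return position -> {
            for (IntPredicate predicate : anyOf) {
                if (predicate.test(position)) {
                    return true;
                }
            }
            return false;
        };
    }
}
```

Passing several predicates to ```select``` corresponds to listing several requests, while wrapping them with ```or``` first corresponds to wrapping them in a single OR request.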
### JMX
To publish information about a FastSelect instance over JMX, you can use the embedded class ```FastSelectMXBeanImpl``` from the package ```com.github.terma.fastselect.jmx```. It provides read-only information such as:
* size (count of records)
* allocated size
* used mem
* columns (type, name, mem)
#### Register a FastSelect instance with JMX
```java
FastSelect fastSelect = ...;
FastSelectMXBean fastSelectMXBean = new FastSelectMXBeanImpl(fastSelect);
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
mbs.registerMBean(fastSelectMXBean, new ObjectName("fastselect:type=mbeanname"));
```
#### Unregister
Use the standard way for MBeans:
```java
String mbeanName = ...;
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
mbs.unregisterMBean(new ObjectName("fastselect:type=mbeanname"));
```
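The register and unregister steps can be tried end to end without the library. The sketch below uses only the standard ```javax.management``` API; ```StorageInfoMXBean``` and ```StorageInfo``` are hypothetical stand-ins for ```FastSelectMXBean``` and ```FastSelectMXBeanImpl```, and the object name is an arbitrary example.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Illustrative only: MXBean lifecycle with a hypothetical read-only bean. */
final class JmxLifecycleSketch {
    /** Hypothetical management interface; the MXBean suffix marks it as an MXBean. */
    public interface StorageInfoMXBean {
        int getSize();
    }

    /** Hypothetical implementation exposing a fixed record count. */
    public static final class StorageInfo implements StorageInfoMXBean {
        private final int size;

        public StorageInfo(int size) {
            this.size = size;
        }

        @Override
        public int getSize() {
            return size;
        }
    }

    /** Registers the bean, reads its Size attribute through the server, then unregisters it. */
    static int registerReadUnregister(String name, int size) {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName objectName = new ObjectName(name);
            mbs.registerMBean(new StorageInfo(size), objectName);
            try {
                return (Integer) mbs.getAttribute(objectName, "Size");
            } finally {
                mbs.unregisterMBean(objectName); // always clean up the registration
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX operation failed for " + name, e);
        }
    }

    /** Reports whether a bean is currently registered under the given name. */
    static boolean isRegistered(String name) {
        try {
            return ManagementFactory.getPlatformMBeanServer().isRegistered(new ObjectName(name));
        } catch (JMException e) {
            throw new IllegalStateException("Invalid object name " + name, e);
        }
    }
}
```

Unregistering in a ```finally``` block keeps the platform MBean server clean even if reading the attribute fails, which matters when the same name is registered again later.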
You can find more use cases in the javadoc for the ```callbacks``` package.
### Low Cardinality Strings
```fast-select``` provides very compact storage for small Java types and String, which saves
memory because there is no Java object overhead and strings are stored as ```UTF-8```. However, you can
get even better results for low cardinality columns. Take a look at:
* ```com.github.terma.fastselect.data.StringCompressedByteData```
* ```com.github.terma.fastselect.data.StringCompressedShortData```
* ```com.github.terma.fastselect.data.StringCompressedIntData```
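The class names suggest that each distinct string is replaced by a small numeric code of byte, short or int width. How these classes work internally is not described here, so the following is only a sketch of the general dictionary-encoding idea under that assumption; ```DictionaryByteColumn``` is a hypothetical class, not part of fast-select.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: each distinct string is stored once, rows hold a one-byte code. */
final class DictionaryByteColumn {
    private final Map<String, Byte> codeByValue = new HashMap<>();
    private final List<String> valueByCode = new ArrayList<>();
    private byte[] codes = new byte[16];
    private int size;

    /** Appends a row, assigning a new code the first time a value is seen. */
    void add(String value) {
        Byte code = codeByValue.get(value);
        if (code == null) {
            if (valueByCode.size() > Byte.MAX_VALUE) {
                throw new IllegalStateException("more than 128 distinct values; use a wider code type");
            }
            code = (byte) valueByCode.size();
            codeByValue.put(value, code);
            valueByCode.add(value);
        }
        if (size == codes.length) {
            codes = Arrays.copyOf(codes, size * 2); // grow the code array when full
        }
        codes[size++] = code;
    }

    /** Decodes the string stored at the given row position. */
    String get(int position) {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position " + position + ", size " + size);
        }
        return valueByCode.get(codes[position]);
    }

    int size() {
        return size;
    }

    int distinctCount() {
        return valueByCode.size();
    }
}
```

With this layout a column of currency codes costs one byte per row plus one copy of each distinct string, which is why the approach pays off only when the number of distinct values is small.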