https://github.com/richardstartin/splitmap

Parallel boolean circuit evaluation




Host: GitHub
URL: https://github.com/richardstartin/splitmap
Owner: richardstartin
License: apache-2.0
Created: 2018-02-04T17:11:03.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-10-28T12:57:46.000Z (over 6 years ago)
Last Synced: 2025-04-15T03:02:14.917Z (3 months ago)
Topics: bitmap, bitset, boolean-algebra, boolean-circuits, indexing, roaringbitmap
Language: Java
Homepage:
Size: 247 KB
Stars: 20
Watchers: 1
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# splitmap

[![Build Status](https://travis-ci.org/richardstartin/splitmap.svg?branch=master)](https://travis-ci.org/richardstartin/splitmap)

[![Coverage Status](https://coveralls.io/repos/github/richardstartin/splitmap/badge.svg?branch=master)](https://coveralls.io/github/richardstartin/splitmap?branch=master)

This library builds on top of [RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) to provide a parallel implementation of boolean circuits (multidimensional filters) and arbitrary aggregations over filters.
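The library's own data structures are not shown in this README, so the following is only a simplified model of why circuit evaluation parallelises, under stated assumptions: each 32-bit row id is split into a 16-bit prefix that selects a chunk and a 16-bit offset within that chunk (the same high/low split RoaringBitmap uses for its containers), and `java.util.BitSet` stands in for RoaringBitmap containers. Chunks with different prefixes never interact, so a gate such as AND can be evaluated prefix by prefix on different threads. `SplitBitmapSketch` and its methods are hypothetical names, not splitmap's API.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, simplified model (not splitmap's implementation): a 32-bit row id
// is split into a 16-bit prefix that selects a chunk and a 16-bit offset within it.
// java.util.BitSet stands in for RoaringBitmap's containers.
final class SplitBitmapSketch {
    final TreeMap<Integer, BitSet> chunks = new TreeMap<>();

    void add(int row) {
        chunks.computeIfAbsent(row >>> 16, prefix -> new BitSet(1 << 16)).set(row & 0xFFFF);
    }

    boolean contains(int row) {
        BitSet chunk = chunks.get(row >>> 16);
        return chunk != null && chunk.get(row & 0xFFFF);
    }

    long cardinality() {
        return chunks.values().stream().mapToLong(BitSet::cardinality).sum();
    }

    // AND gate evaluated independently per prefix: only prefixes present in both
    // operands can produce set bits, and each prefix is processed in parallel.
    static SplitBitmapSketch and(SplitBitmapSketch a, SplitBitmapSketch b) {
        Map<Integer, BitSet> partials = a.chunks.keySet().parallelStream()
                .filter(b.chunks::containsKey)
                .collect(Collectors.toConcurrentMap(prefix -> prefix, prefix -> {
                    BitSet chunk = (BitSet) a.chunks.get(prefix).clone(); // leave inputs untouched
                    chunk.and(b.chunks.get(prefix));
                    return chunk;
                }));
        SplitBitmapSketch result = new SplitBitmapSketch();
        partials.forEach((prefix, chunk) -> {
            if (!chunk.isEmpty()) {
                result.chunks.put(prefix, chunk); // drop chunks the gate emptied
            }
        });
        return result;
    }

    public static void main(String[] args) {
        SplitBitmapSketch febSales = new SplitBitmapSketch();
        SplitBitmapSketch luxury = new SplitBitmapSketch();
        for (int row : new int[] {1, 70_000, 200_000}) febSales.add(row);
        for (int row : new int[] {1, 5, 70_001, 200_000}) luxury.add(row);
        System.out.println(and(febSales, luxury).cardinality()); // prints 2
    }
}
```

Skipping prefixes that are absent from either operand is what makes AND cheap in this model; an OR gate would instead have to visit the union of both operands' prefixes.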

For instance, to compute a sum product over a dataset filtered so that both of two conditions hold (here, February sales of luxury products):

```java
import java.util.Map;
import static java.util.Map.entry;

    PrefixIndex quantities = ...
    PrefixIndex prices = ...
    SplitMap februarySalesIndex = ...
    SplitMap luxuryProductsIndex = ...
    // first map: filter indexes; second map: value indexes used by the reducer
    QueryContext context = new QueryContext<>(
        Map.ofEntries(entry("luxuryProducts", luxuryProductsIndex), entry("febSales", februarySalesIndex)),
        Map.ofEntries(entry(PRICE, prices), entry(QTY, quantities)));
    // restrict to rows that are both February sales and luxury products, then sum price * quantity
    double februaryRevenueFromLuxuryProducts =
            Circuits.evaluateIfKeysIntersect(context, slice -> slice.get("febSales").and(slice.get("luxuryProducts")), "febSales", "luxuryProducts")
            .stream()
            .parallel()
            .mapToDouble(partition -> partition.reduceDouble(SumProduct.reducer(prices, quantities)))
            .sum();
```

Over millions of quantities and prices, this can be computed in under 200 microseconds on a modern processor, whereas parallel streams may take upwards of 20 ms.
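To make clear what the query above computes, here is a hypothetical plain-Java equivalent that uses no splitmap types: rows are grouped into fixed-size partitions (65,536 rows each, an assumption), the `febSales AND luxuryProducts` circuit is evaluated within each partition, and `price * quantity` is summed over the rows that pass. `FilteredSumProduct` and `sumProduct` are illustrative names only; this shows the shape of the computation, not how `Circuits` or `SumProduct.reducer` are implemented.

```java
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical plain-Java equivalent of the README query (no splitmap types):
// rows are grouped into fixed-size partitions, the febSales AND luxuryProducts
// filter is evaluated per partition, and price * quantity is summed over the
// rows that pass. Partitions are independent, so they are summed in parallel.
final class FilteredSumProduct {
    static final int PARTITION_SIZE = 1 << 16; // assumed partition size

    // Assumes prices and quantities have the same length (one entry per row).
    static double sumProduct(BitSet febSales, BitSet luxury, double[] prices, double[] quantities) {
        int partitions = (prices.length + PARTITION_SIZE - 1) / PARTITION_SIZE;
        return IntStream.range(0, partitions)
                .parallel()
                .mapToDouble(p -> {
                    int from = p * PARTITION_SIZE;
                    int to = Math.min(from + PARTITION_SIZE, prices.length);
                    BitSet filter = febSales.get(from, to); // copy of this partition, re-indexed from 0
                    filter.and(luxury.get(from, to));       // the circuit: febSales AND luxuryProducts
                    double partial = 0;
                    for (int i = filter.nextSetBit(0); i >= 0; i = filter.nextSetBit(i + 1)) {
                        partial += prices[from + i] * quantities[from + i];
                    }
                    return partial;
                })
                .sum();
    }

    public static void main(String[] args) {
        int rows = 200_000;
        double[] prices = new double[rows];
        double[] quantities = new double[rows];
        BitSet febSales = new BitSet(rows);
        BitSet luxury = new BitSet(rows);
        for (int i = 0; i < rows; i++) {
            prices[i] = i % 100;        // synthetic data, for illustration only
            quantities[i] = 1 + i % 3;
            if (i % 2 == 0) febSales.set(i);
            if (i % 5 == 0) luxury.set(i);
        }
        System.out.println(sumProduct(febSales, luxury, prices, quantities));
    }
}
```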

It is easy to write arbitrary routines that combine filtering, calculation and aggregation, such as statistical calculations evaluated under filter criteria.

```java

  public double productMomentCorrelationCoefficient() {
    // calculate the correlation coefficient between prices observed on different exchanges
    PrefixIndex exchange1Prices = ...
    PrefixIndex exchange2Prices = ...
    SplitMap beforeClose = ...
    SplitMap afterOpen = ...
    QueryContext context = new QueryContext<>(
        Map.ofEntries(entry(BEFORE_CLOSE, beforeClose), entry(AFTER_OPEN, afterOpen)),
        Map.ofEntries(entry(NASDAQ, exchange1Prices), entry(LSE, exchange2Prices)));
    // evaluate the product moment correlation coefficient over rows observed
    // either before the close or after the open
    return Circuits.evaluate(context, slice -> slice.get(BEFORE_CLOSE).or(slice.get(AFTER_OPEN)),
            BEFORE_CLOSE, AFTER_OPEN)
            .stream()
            .parallel()
            .map(partition -> partition.reduce(SimpleLinearRegression.reducer(exchange1Prices, exchange2Prices)))
            .collect(SimpleLinearRegression.pmcc());
  }

```
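A product moment correlation coefficient can be computed partition by partition because it depends only on six sums, `n`, `Σx`, `Σy`, `Σxy`, `Σx²` and `Σy²`, which can be accumulated per partition and then added together: `r = (n·Σxy - Σx·Σy) / sqrt((n·Σx² - (Σx)²) · (n·Σy² - (Σy)²))`. The sketch below assumes that this is roughly the kind of state a reducer such as `SimpleLinearRegression.reducer` carries and that `SimpleLinearRegression.pmcc()` merges; `PmccState` and its `collector()` are hypothetical names, not the library's types.

```java
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical per-partition state for a Pearson product moment correlation
// coefficient: r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
// All six quantities are plain sums, so partition states can be merged by addition.
final class PmccState {
    long n;
    double sx, sy, sxy, sxx, syy;

    void add(double x, double y) {
        n++;
        sx += x;
        sy += y;
        sxy += x * y;
        sxx += x * x;
        syy += y * y;
    }

    PmccState merge(PmccState other) {
        n += other.n;
        sx += other.sx;
        sy += other.sy;
        sxy += other.sxy;
        sxx += other.sxx;
        syy += other.syy;
        return this;
    }

    double pmcc() {
        double covariance = n * sxy - sx * sy;
        double varianceX = n * sxx - sx * sx;
        double varianceY = n * syy - sy * sy;
        // undefined (NaN or infinite) when either variable is constant
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    // Merges partition states into a fresh container and finishes with the coefficient.
    static Collector<PmccState, PmccState, Double> collector() {
        return Collector.of(PmccState::new, PmccState::merge, PmccState::merge, PmccState::pmcc);
    }

    public static void main(String[] args) {
        // two hypothetical partitions of (x, y) observations with y = 2x + 1
        double[][] xs = {{1, 2}, {3, 4}};
        double[][] ys = {{3, 5}, {7, 9}};
        double r = IntStream.range(0, xs.length)
                .parallel()
                .mapToObj(p -> {
                    PmccState state = new PmccState();
                    for (int i = 0; i < xs[p].length; i++) {
                        state.add(xs[p][i], ys[p][i]);
                    }
                    return state;
                })
                .collect(PmccState.collector());
        System.out.println(r); // prints 1.0
    }
}
```

These raw one-pass sums are numerically fragile when values are large relative to their spread; a production reducer would more likely use a stable update such as Welford's method.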
