https://github.com/richardstartin/splitmap
Parallel boolean circuit evaluation
bitmap bitset boolean-algebra boolean-circuits indexing roaringbitmap
- Host: GitHub
- URL: https://github.com/richardstartin/splitmap
- Owner: richardstartin
- License: apache-2.0
- Created: 2018-02-04T17:11:03.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-28T12:57:46.000Z (over 6 years ago)
- Last Synced: 2025-04-15T03:02:14.917Z (3 months ago)
- Topics: bitmap, bitset, boolean-algebra, boolean-circuits, indexing, roaringbitmap
- Language: Java
- Homepage:
- Size: 247 KB
- Stars: 20
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# splitmap
[Build Status](https://travis-ci.org/richardstartin/splitmap)
[Coverage Status](https://coveralls.io/github/richardstartin/splitmap?branch=master)

This library builds on top of [RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) to provide a parallel implementation of boolean circuits (multidimensional filters) and arbitrary aggregations over filters.
For instance, to compute a sum product on a dataset filtered such that both of two conditions hold:
```java
// uses: import static java.util.Map.entry;
PrefixIndex quantities = ...
PrefixIndex prices = ...
SplitMap februarySalesIndex = ...
SplitMap luxuryProductsIndex = ...
// first map: the named filters; second map: the value columns keyed by PRICE and QTY
QueryContext context = new QueryContext<>(
        Map.ofEntries(entry("luxuryProducts", luxuryProductsIndex), entry("febSales", februarySalesIndex)),
        Map.ofEntries(entry(PRICE, prices), entry(QTY, quantities)));

// intersect the two filters, then sum price * quantity over each matching partition
double februaryRevenueFromLuxuryProducts =
        Circuits.evaluateIfKeysIntersect(context, slice -> slice.get("febSales").and(slice.get("luxuryProducts")), "febSales", "luxuryProducts")
                .stream()
                .parallel()
                .mapToDouble(partition -> partition.reduceDouble(SumProduct.reducer(prices, quantities)))
                .sum();
```

Over millions of quantities and prices, this can be computed in under 200 microseconds on a modern processor, whereas an equivalent computation using plain parallel streams may take upwards of 20 ms.
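To make the semantics of the filtered sum product concrete, here is a minimal plain-Java sketch of the same calculation. It does not use splitmap or RoaringBitmap: `java.util.BitSet` stands in for `SplitMap`, `double[]` stands in for `PrefixIndex`, and the class and method names are hypothetical. It illustrates only what is being computed, not how the library partitions or parallelises the work.

```java
import java.util.BitSet;

// Hypothetical stand-in: BitSet replaces SplitMap, double[] replaces PrefixIndex.
final class FilteredSumProductSketch {

    // Sums prices[i] * quantities[i] over every row i present in both filters.
    static double sumProduct(BitSet febSales, BitSet luxuryProducts,
                             double[] prices, double[] quantities) {
        BitSet rows = (BitSet) febSales.clone(); // copy so the inputs stay untouched
        rows.and(luxuryProducts);                // intersection: both conditions hold
        double total = 0.0;
        for (int i = rows.nextSetBit(0); i >= 0; i = rows.nextSetBit(i + 1)) {
            total += prices[i] * quantities[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] prices = {10.0, 20.0, 30.0, 40.0};
        double[] quantities = {1.0, 2.0, 3.0, 4.0};
        BitSet febSales = new BitSet();
        febSales.set(0);
        febSales.set(1);
        febSales.set(3);
        BitSet luxuryProducts = new BitSet();
        luxuryProducts.set(1);
        luxuryProducts.set(2);
        luxuryProducts.set(3);
        // rows 1 and 3 pass both filters: 20 * 2 + 40 * 4
        System.out.println(sumProduct(febSales, luxuryProducts, prices, quantities)); // prints 200.0
    }
}
```

Only rows that survive the intersection contribute to the total, which is the role the `and` expression plays in the example above.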
It is easy to write arbitrary routines combining filtering, calculation and aggregation, for example statistical calculations evaluated under filter criteria:
```java
public double productMomentCorrelationCoefficient() {
    // calculate the correlation coefficient between prices observed on different exchanges
    PrefixIndex exchange1Prices = ...
    PrefixIndex exchange2Prices = ...
    SplitMap beforeClose = ...
    SplitMap afterOpen = ...
    // first map: the named filters; second map: the price columns keyed by exchange
    QueryContext context = new QueryContext<>(
            Map.ofEntries(entry(BEFORE_CLOSE, beforeClose), entry(AFTER_OPEN, afterOpen)),
            Map.ofEntries(entry(NASDAQ, exchange1Prices), entry(LSE, exchange2Prices)));
    // evaluate product moment correlation coefficient
    return Circuits.evaluate(context, slice -> slice.get(BEFORE_CLOSE).or(slice.get(AFTER_OPEN)),
                    BEFORE_CLOSE, AFTER_OPEN)
            .stream()
            .parallel()
            .map(partition -> partition.reduce(SimpleLinearRegression.reducer(exchange1Prices, exchange2Prices)))
            .collect(SimpleLinearRegression.pmcc());
}
```
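As a rough illustration of what the filtered correlation computes, the sketch below evaluates the product moment (Pearson) correlation coefficient in plain Java over the rows selected by the union of two filters. As an assumption for illustration only, `java.util.BitSet` stands in for `SplitMap` and `double[]` for `PrefixIndex`; the class and method names are hypothetical and nothing here reflects splitmap's internals.

```java
import java.util.BitSet;

// Hypothetical stand-in: BitSet replaces SplitMap, double[] replaces PrefixIndex.
final class FilteredCorrelationSketch {

    // Pearson correlation of x and y restricted to rows present in either filter.
    static double pmcc(BitSet beforeClose, BitSet afterOpen, double[] x, double[] y) {
        BitSet rows = (BitSet) beforeClose.clone(); // copy so the inputs stay untouched
        rows.or(afterOpen);                         // union: either condition holds
        long n = 0;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = rows.nextSetBit(0); i >= 0; i = rows.nextSetBit(i + 1)) {
            n++;
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        // r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
        double covariance = n * sxy - sx * sy;
        double varianceX = n * sxx - sx * sx;
        double varianceY = n * syy - sy * sy;
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        double[] nasdaq = {1.0, 2.0, 3.0, 4.0, 100.0};
        double[] lse = {2.0, 4.0, 6.0, 8.0, -5.0};
        BitSet beforeClose = new BitSet();
        beforeClose.set(0);
        beforeClose.set(1);
        BitSet afterOpen = new BitSet();
        afterOpen.set(2);
        afterOpen.set(3);
        // row 4 is excluded by both filters, so the selected rows satisfy lse = 2 * nasdaq
        System.out.println(pmcc(beforeClose, afterOpen, nasdaq, lse)); // prints 1.0
    }
}
```

The calculation needs only six running totals (the count and five sums), and totals from separate partitions can simply be added together. That property makes a statistic like this a natural fit for a per-partition reduce followed by a final combining step.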