https://github.com/compprov/compprov-core
Core module of the compprov framework - wraps operations into a tracked Calculation Provenance Graph (CPG) context
https://github.com/compprov/compprov-core
audit computational-provenance dag framework graph java open-source parallel-execution provenance
Last synced: about 13 hours ago
JSON representation
Core module of the compprov framework - wraps operations into a tracked Calculation Provenance Graph (CPG) context
- Host: GitHub
- URL: https://github.com/compprov/compprov-core
- Owner: compprov
- License: other
- Created: 2026-03-01T02:06:52.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-03-24T07:11:02.000Z (10 days ago)
- Last Synced: 2026-03-25T07:04:20.195Z (9 days ago)
- Topics: audit, computational-provenance, dag, framework, graph, java, open-source, parallel-execution, provenance
- Language: Java
- Homepage: https://github.com/compprov
- Size: 106 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# compprov-core
**compprov** (Computational Provenance) is a Java framework that automatically builds a
**Calculation Provenance Graph (CPG)** — a DAG that records every variable and every operation
in a computation as it runs. The result is a complete, machine-readable audit trail of how each
output was derived from its inputs.
---
## Contents
- [Core concepts](#core-concepts)
- [Getting started](#getting-started)
- [Usage example](#usage-example)
- [Snapshot: export, replay, and diff](#snapshot-export-replay-and-diff)
- [Extending with custom type wrappers](#extending-with-custom-type-wrappers)
- [Built-in types](#built-in-types)
- [Thread safety](#thread-safety)
- [Visualization](#visualization)
- [Examples](#examples)
- [License](#license)
---
## Core concepts
| Concept | Description |
|---|---|
| **CPG** | Directed Acyclic Graph where nodes are variables and edges are data-flow dependencies. Produced automatically during execution. |
| **Snapshot** | Immutable, point-in-time capture of all variables and operations recorded in a context. Can be serialized to JSON, replayed, or compared. |
| **Descriptor** | Name + optional metadata (`Meta`) attached to a variable or operation. Used in logs, diff reports, and audit trails. |
| **VariableWrapper** | Factory that converts a plain value into a provenance-tracked `WrappedVariable` and registers it in the active context. |
| **ComputationEnvironment** | Shared, thread-safe configuration: registered wrappers, clock, Jackson mapper, descriptor enforcement rules. |
| **ComputationContext** | Per-computation scope that accumulates the CPG. Not safe to snapshot while mutating. |
---
## Getting started
Add the dependency to your `pom.xml` (latest version: [](https://central.sonatype.com/artifact/io.compprov/compprov-core)):
```xml
io.compprov
compprov-core
VERSION
```
Requires Java 17+.
---
## Usage example
The entry point is `DefaultComputationEnvironment` (preconfigured with all built-in wrappers
and Jackson serializers) and `DefaultComputationContext` (typed convenience wrappers on top
of the base context).
```java
import io.compprov.core.*;
import io.compprov.core.meta.Descriptor;
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
// --- 1. Create the environment (thread-safe; reuse across computations) ---
var env = new DefaultComputationEnvironment();
// --- 2. Create a context for this computation run ---
var ctx = new DefaultComputationContext(
env,
new DataContext(Descriptor.descriptor("invoice-calculation")));
// --- 3. Wrap all inputs ---
// Every wrapped value gets a unique ID and is recorded in the CPG as an INPUT node.
var mc = ctx.wrapMathContext(new MathContext(10, RoundingMode.HALF_UP),
Descriptor.descriptor("mc"));
var price = ctx.wrapBigDecimal(new BigDecimal("100.00"),
Descriptor.descriptor("price"));
var taxRate = ctx.wrapBigDecimal(new BigDecimal("0.08"),
Descriptor.descriptor("tax-rate"));
// --- 4. Perform operations ---
// Each call records an operation node in the CPG and returns a wrapped result.
// Pass null as the last argument to let the framework auto-name the result.
var tax = price.multiply(taxRate, mc, Descriptor.descriptor("tax"));
var total = price.add(tax, mc, Descriptor.descriptor("total"));
// --- 5. Read the result like any other value ---
System.out.println(total.getValue()); // 108.0000000
// --- 6. Export the full Calculation Provenance Graph ---
Snapshot snapshot = ctx.snapshot();
System.out.println(env.toJson(snapshot));
System.out.println(env.toHumanReadableLog(snapshot));
```
### JSON output (abbreviated)
```json
{
"descriptor" : { "name" : "invoice-calculation", "meta" : { } },
"variables" : [
{ "track" : { "id" : "i_1", "descriptor" : { "name" : "mc" }, ... }, "value" : ... },
{ "track" : { "id" : "i_2", "descriptor" : { "name" : "price" }, ... }, "value" : "100.00" },
{ "track" : { "id" : "i_3", "descriptor" : { "name" : "tax-rate" }, ... }, "value" : "0.08" },
{ "track" : { "id" : "o_4", "descriptor" : { "name" : "tax" }, ... }, "value" : "8.000000000" },
{ "track" : { "id" : "o_5", "descriptor" : { "name" : "total" }, ... }, "value" : "108.0000000" }
],
"operations" : [
{ "track" : { "id" : "o_1", "descriptor" : { "name" : "multiply" }, ... },
"arguments" : { "a" : "i_2", "b" : "i_3", "mc" : "i_1" },
"resultId" : "o_4" },
{ "track" : { "id" : "o_2", "descriptor" : { "name" : "add" }, ... },
"arguments" : { "a" : "i_2", "b" : "o_4", "mc" : "i_1" },
"resultId" : "o_5" }
]
}
```
Variable IDs use the prefix `i_` for inputs and `o_` for outputs, followed by a sequential
numeric counter that is stable within a single context run.
---
## Snapshot: export, replay, and diff
### Serialize and deserialize
```java
String json = env.toJson(ctx.snapshot());
// Deserialize back to a Snapshot
Snapshot restored = env.fromJson(json);
```
### Replay a computation
`env.compute()` replays all recorded operations against the given snapshot,
producing a new context with freshly computed outputs:
```java
var replayed = env.compute(restored);
BigDecimal replayedTotal = (BigDecimal) replayed.getVariable("o_5").getValue();
```
### Change an input and propagate
Use `copyWith` to substitute one or more input values, then replay:
```java
Snapshot modified = env.copyWith(
restored,
Descriptor.descriptor("invoice-calculation-v2"),
Map.of("i_3", new ValueWithDescriptor(
Descriptor.descriptor("tax-rate"),
new BigDecimal("0.10")))); // 10% tax instead of 8%
var updated = env.compute(modified);
// updated.getVariable("o_5") now reflects the new total
```
---
## Extending with custom type wrappers
Adding support for a type not built into the framework requires three things:
1. A `Wrapped` class that defines the tracked operations for your type.
2. A `VariableWrapper` factory that instantiates it.
3. Registering the factory with the environment.
### Step 1 — Create the wrapped class
```java
import io.compprov.core.ComputationContext;
import io.compprov.core.meta.Descriptor;
import io.compprov.core.variable.AbstractWrappedVariable;
import io.compprov.core.variable.VariableTrack;
import java.util.*;
import java.util.function.Function;
public final class WrappedLong extends AbstractWrappedVariable {
// Define one Descriptor constant per operation.
private static final Descriptor OP_ADD = Descriptor.descriptor("add");
private static final Descriptor OP_MULTIPLY = Descriptor.descriptor("multiply");
// Map each Descriptor to a lambda that performs the actual computation.
private static final Map, Object>> FUNCTIONS;
static {
Map, Object>> m = new HashMap<>();
m.put(OP_ADD, args -> (Long) args.get(0) + (Long) args.get(1));
m.put(OP_MULTIPLY, args -> (Long) args.get(0) * (Long) args.get(1));
FUNCTIONS = Collections.unmodifiableMap(m);
}
public WrappedLong(ComputationContext context, VariableTrack track, Long value) {
super(context, track, value);
}
@Override
public Function, Object> getFunction(Descriptor operationDescriptor) {
return FUNCTIONS.get(operationDescriptor);
}
// --- Public API ---
// Each operation comes in two overloads: with and without a result Descriptor.
public WrappedLong add(WrappedLong augend, Descriptor resultDescriptor) {
return (WrappedLong) execute(OP_ADD, "a", this, "b", augend, resultDescriptor);
}
public WrappedLong add(WrappedLong augend) {
return add(augend, null);
}
public WrappedLong multiply(WrappedLong multiplicand, Descriptor resultDescriptor) {
return (WrappedLong) execute(OP_MULTIPLY, "a", this, "b", multiplicand, resultDescriptor);
}
public WrappedLong multiply(WrappedLong multiplicand) {
return multiply(multiplicand, null);
}
}
```
### Step 2 — Create the factory
```java
import io.compprov.core.ComputationContext;
import io.compprov.core.variable.VariableTrack;
import io.compprov.core.variable.VariableWrapper;
import io.compprov.core.variable.WrappedVariable;
public final class LongWrapperFactory implements VariableWrapper {
@Override
public WrappedVariable wrap(ComputationContext context, VariableTrack track, Long value) {
return new WrappedLong(context, track, value);
}
}
```
### Step 3 — Register and use
```java
var env = new DefaultComputationEnvironment();
env.registerWrapper(Long.class, new LongWrapperFactory());
var ctx = new DefaultComputationContext(env,
new DataContext(Descriptor.descriptor("my-computation")));
// Use the base wrap() method — DefaultComputationContext does not have a wrapLong() helper.
// Cast to your concrete type after wrapping.
var a = (WrappedLong) ctx.wrap(100L, Descriptor.descriptor("a"));
var b = (WrappedLong) ctx.wrap(42L, Descriptor.descriptor("b"));
var sum = a.add(b, Descriptor.descriptor("sum"));
```
If you use a custom type frequently, extend `DefaultComputationContext` to add a typed
`wrapLong()` convenience method, the same way `DefaultComputationContext` does for
`wrapBigDecimal`, `wrapBigInteger`, etc.
### Type deserialization
If you need to serialize and deserialize snapshots containing your custom type, register a
Jackson deserializer with the `ObjectMapper` inside your custom `ComputationEnvironment`.
See `DefaultComputationEnvironment` for examples using `ZonedDateTimeSerializer`,
`MathContextDeserializer`, and `VariableDeserializer`.
---
## Built-in types
`DefaultComputationEnvironment` registers the following wrappers out of the box:
| Java type | Wrapped class | Notes |
|---|---|---|
| `BigDecimal` | `WrappedBigDecimal` | Full arithmetic: add, subtract, multiply, divide, pow, sqrt, abs, negate, remainder, max, min, and more |
| `BigInteger` | `WrappedBigInteger` | Full arithmetic including modPow (ternary) |
| `Integer` | `WrappedInteger` | Parameter-only type; used as an argument to `pow`, `scaleByPowerOfTen`, etc. |
| `Long` | `WrappedLong` | Parameter-only type |
| `MathContext` | `WrappedMathContext` | Carries precision and rounding mode; passed to most `BigDecimal` / `BigInteger` ops |
---
## Thread safety
`ComputationEnvironment` and its wrappers map are fully thread-safe — a single instance can
be shared across threads and computations.
`ComputationContext` is thread-safe for all wrap and executeOperation calls. The `snapshot()`
method is **not** safe to call while other threads are still recording operations into the same
context.
---
## Visualization
To visualize your CPG data use [compprov-render](https://github.com/compprov/compprov-render) —
a set of HTML pages that run locally in your web browser, no server required.
Simply export a snapshot to JSON and open the page:
```java
String json = env.toJson(ctx.snapshot());
// save to a file, then open graph.html or plot.html in your browser
```
### Graph view — provenance graph
Renders the full CPG as an interactive node-edge graph.
Variables are shown as typed nodes (input / output), operations as diamond nodes with labeled argument edges.

### Plot view — multi-dataset comparison
Plots numeric variable values across one or more datasets side-by-side.
Supports points, line, and table views with configurable X-axis labels.

---
## Examples
The `io.compprov.examples` package contains three self-contained examples that each demonstrate
a different aspect of the framework.
| Example | Package | Domain | Key technique |
|---|---|---|---|
| [Net Asset Value (NAV)](#net-asset-value-nav) | `io.compprov.examples.nav` | Crypto-portfolio accounting | Custom domain type wrappers |
| [Gauge Block Calibration](#gauge-block-calibration) | `io.compprov.examples.gaugeblock` | Precision length metrology | Pure `BigDecimal` scalar formula chain |
| [Hydrological Model Evaluation](#hydrological-model-evaluation) | `io.compprov.examples.hydrology` | River discharge modelling | List-based tracked operations |
---
### Net Asset Value (NAV)
**`io.compprov.examples.nav`** · `NetAssetValueCalculator.calculate()`
Computes the total USD value of a multi-asset crypto portfolio (BTC, ETH, USDC positions held
across Binance, staking, and Morpho DeFi) by converting each position to USD at a spot rate
and summing the results.
The primary focus is showing **how to wrap custom domain types**. The domain model uses `Amount`
and `Rate` objects rather than raw `BigDecimal`, and the example integrates them with the
framework without modifying them — using the three-step pattern:
1. **`WrappedAmount` / `WrappedRate`** extend `AbstractWrappedVariable` and declare their
operations (`add`, `convert`, `addBulk`) as `Descriptor` constants mapped to computation lambdas.
2. **`AmountWrapper` / `RateWrapper`** implement `VariableWrapper` — the one-method factory
the framework calls to instantiate tracked variables.
3. **`NavComputationContext`** extends `DefaultComputationContext`, registers both wrappers with
the shared `ComputationEnvironment`, and exposes typed `wrap(Amount, ...)` / `wrap(Rate, ...)`
convenience overloads.
After the calculation the snapshot is serialized to JSON, then deserialized and replayed via
`NavComputationContext.environment.compute()` — verifying that the CPG is round-trip stable and the
replayed output matches the original result.
---
### Gauge Block Calibration
**`io.compprov.examples.gaugeblock`** · `GaugeBlockCalibration.calibrate()`
Reproduces the interferometric calibration of a 7 mm tungsten carbide gauge block (NRC 91A)
from the following paper, which uses this measurement as a demonstration of metrological
provenance management:
> Ryan M. White, *Provenance in the Context of Metrological Traceability*, Metrology 2025, 5(3), 52.
> DOI: [10.3390/metrology5030052](https://doi.org/10.3390/metrology5030052)
The computation chain has three stages, all tracked in the CPG:
1. **Refractive index** — the Birch–Downs modified Ciddor equation (8 tracked steps) converts
air temperature, pressure, relative humidity, CO₂ concentration, and saturation vapor
pressure into the refractive index *n* of the measurement medium.
2. **Interferometric length** — the HeNe laser vacuum wavelength (632.99 nm) divided by *n*
gives the air wavelength; the observed fringe order `m + f` gives the raw length
`L_raw = (m + f) × λ_air / 2`.
3. **Thermal correction** — the raw length is corrected to the ISO 1 reference temperature
(20 °C) using the tungsten carbide expansion coefficient α = 4.23 × 10⁻⁶ K⁻¹ from
the paper: `L_cal = L_raw / (1 + α × ΔT)`.
The deviation from the 7 mm nominal length is asserted to round to **+2 nm**, matching the
paper's reported result (expanded uncertainty U = 31 nm, k = 2).
This example uses only built-in `WrappedBigDecimal` arithmetic — no custom wrappers needed —
showing that the framework handles complex pure-scalar formula chains out of the box.
---
### Hydrological Model Evaluation
**`io.compprov.examples.hydrology`** · `MhmDischargeEvaluation.evaluateParameterSetP1()`
Evaluates the mesoscale Hydrologic Model (mHM) output against observed river discharge at the
Moselle River basin upstream of Perl (~11 500 km², Luxembourg/Germany), as described in:
> Villamar et al., *Archivist: a metadata management tool for facilitating FAIR research*,
> Scientific Data, 2025.
> DOI: [10.1038/s41597-025-04521-6](https://doi.org/10.1038/s41597-025-04521-6)
The metric is the **Kling-Gupta Efficiency (KGE)** (Gupta et al. 2009):
```
KGE = 1 − √[ (r−1)² + (α−1)² + (β−1)² ]
r = Pearson correlation = Σ(devObs · devSim) / √(Σ devObs² · Σ devSim²)
α = variability ratio = σ_sim / σ_obs
β = bias ratio = μ_sim / μ_obs
```
KGE = 1 is perfect; values below 0 indicate the model is worse than the observed mean as a
predictor. The paper reports that parameter set P₁ outperforms P₂ with scores mostly above 0.5.
The computation uses `ArrayList` with loops and `addBulk`, demonstrating the
pattern for **list-based tracked operations** where the number of time steps is dynamic.
The 8-step chain (means → deviations → squared deviations and cross products → sums → r → α
→ β → KGE) is fully recorded in the CPG, with every intermediate quantity named and traceable.
The synthetic dataset is engineered so that r = 1, β = 1, α = 0.9, giving **KGE = 0.9 exactly**,
verified by exact `BigDecimal` equality.
---
## License
Apache License 2.0 — see [LICENSE](LICENSE).