
storm-data-contracts
====================

[![Build Status](https://img.shields.io/travis/forter/storm-data-contracts.svg)](https://travis-ci.org/forter/storm-data-contracts/)

This project lets you write Storm Bolts in Java with strict data contracts:

Strongly Typed
--------------
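Bolt input and output are POJOs:
```java
import com.google.common.collect.Lists;
import java.util.Collection;

public class MyBolt implements IContractsBolt<MyBoltInput, Collection<MyBoltOutput>> {
    @Override
    public Collection<MyBoltOutput> execute(MyBoltInput input) {
        MyBoltOutput output = new MyBoltOutput();
        if (input.y.isPresent()) {
            // y is optional: use it as the prefix when present
            output.z = input.y.get() + input.x;
        } else {
            output.z = "default" + input.x;
        }
        return Lists.newArrayList(output);
    }

    // Emitted instead of the real output when the output contract is violated
    @Override
    public Collection<MyBoltOutput> createDefaultOutput() {
        return Lists.newArrayList();
    }
}
```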

Input and Output Data Contracts
-------------------------------
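Guava `Optional` and Hibernate Validator annotations are supported for strict data contracts:
```java
import com.google.common.base.Optional;
import javax.validation.constraints.Min;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Pattern;
import org.hibernate.validator.valuehandling.UnwrapValidatedValue;

public class MyBoltInput {
    @NotNull
    @Min(0)
    public Integer x;

    // The pattern is applied to the wrapped String, not to the Optional itself
    @UnwrapValidatedValue
    @Pattern(regexp="\\p{L}*")
    public Optional<String> y;
}

public class MyBoltOutput {
    @NotNull
    public String z;
}
```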
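
To make the constraints on the input contract concrete, here is a hand-written equivalent in plain Java. It is only a sketch: `InputContractSketch` and `isValid` are hypothetical names, `java.util.Optional` stands in for Guava's `Optional` to avoid dependencies, and treating an absent `y` as valid is an assumption based on Bean Validation constraints normally accepting missing values.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical, dependency-free illustration; the library itself relies on Hibernate Validator.
class InputContractSketch {
    // Same expression as @Pattern(regexp="\\p{L}*"): zero or more Unicode letters
    private static final Pattern LETTERS = Pattern.compile("\\p{L}*");

    // Mirrors @NotNull @Min(0) on x and @Pattern on the value wrapped by y.
    static boolean isValid(Integer x, Optional<String> y) {
        boolean xOk = x != null && x >= 0;
        // Assumption: an absent y is not checked against the pattern
        boolean yOk = !y.isPresent() || LETTERS.matcher(y.get()).matches();
        return xOk && yOk;
    }

    public static void main(String[] args) {
        System.out.println(isValid(1, Optional.of("prefix")));   // letters only
        System.out.println(isValid(1, Optional.of("prefix1")));  // contains a digit
        System.out.println(isValid(-1, Optional.<String>empty())); // negative x
    }
}
```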

Exceptions
----------
* All input contract violations are reported to Storm.
* All exceptions thrown by `execute()` are reported to Storm.
* All output contract violations are reported to Storm, and the default output is emitted instead.
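
The rules above amount to roughly the following control flow. This is a minimal sketch in plain Java, not the library's executor: `SketchBolt`, `ContractFlowSketch`, the `Predicate`-based contracts and the `errors` list (standing in for Storm's error reporting) are hypothetical. Emitting the default output after an input violation or an exception is an assumption; the list above only states it for output violations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a contracts bolt; not the library's interface.
interface SketchBolt<I, O> {
    O execute(I input);
    O createDefaultOutput();
}

class ContractFlowSketch {
    static final Predicate<Integer> INPUT_OK = x -> x != null && x >= 0;
    static final Predicate<String> OUTPUT_OK = z -> z != null;

    // Validates the input, executes the bolt, then validates the output.
    // Every failure is appended to `errors`, which stands in for reporting to Storm.
    static <I, O> O run(SketchBolt<I, O> bolt, I input, Predicate<I> inputContract,
                        Predicate<O> outputContract, List<String> errors) {
        if (!inputContract.test(input)) {
            errors.add("input contract violation");
            return bolt.createDefaultOutput(); // assumption: not stated in the README
        }
        O output;
        try {
            output = bolt.execute(input);
        } catch (RuntimeException e) {
            errors.add("execute() failed: " + e.getMessage());
            return bolt.createDefaultOutput(); // assumption: not stated in the README
        }
        if (!outputContract.test(output)) {
            errors.add("output contract violation");
            return bolt.createDefaultOutput(); // stated in the README
        }
        return output;
    }

    // Demo bolt: throws for 0, returns an invalid (null) output above 100.
    static SketchBolt<Integer, String> demoBolt() {
        return new SketchBolt<Integer, String>() {
            public String execute(Integer x) {
                if (x == 0) throw new IllegalStateException("boom");
                return x > 100 ? null : "default" + x;
            }
            public String createDefaultOutput() { return "fallback"; }
        };
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        System.out.println(run(demoBolt(), 2, INPUT_OK, OUTPUT_OK, errors));
        System.out.println(run(demoBolt(), 500, INPUT_OK, OUTPUT_OK, errors));
        System.out.println(errors);
    }
}
```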

Caching
-------
`BaseContractsBoltExecutor` supports caching: subclass it and override
`BaseContractsBoltExecutor#createCacheDAO`.
Annotate cached input contracts with `@Cached`, and annotate the fields used as cache keys with
`@CacheKey`:
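```java
@Cached
public class Input {

    @Max(10)
    @NotNull
    @CacheKey
    public Integer input1;

    @Max(10)
    @UnwrapValidatedValue
    public Optional<Integer> optionalInput2;
}

public class MyCacheDAO<TOutput> implements CacheDAO<TOutput> {

    // Key type assumed here: @CacheKey field names mapped to their values
    public Map<Map<String, Object>, TOutput> cache = new HashMap<>();

    @Override
    public Optional<TOutput> get(Map<String, Object> input) {
        if (cache.containsKey(input)) {
            return Optional.of(cache.get(input));
        }
        return Optional.absent();
    }

    @Override
    public void save(TOutput output, Map<String, Object> inputKey, long startTimeMillis) {
        cache.put(inputKey, output);
    }
}

public class MyCachedContractBoltExecutor<TInput, TOutput, TContractsBolt extends IContractsBolt<TInput, TOutput>>
        extends BaseContractsBoltExecutor<TInput, TOutput, TContractsBolt> {

    public MyCachedContractBoltExecutor(TContractsBolt contractsBolt) {
        super(contractsBolt);
    }

    @Override
    protected CacheDAO<TOutput> createCacheDAO(Map stormConf, TopologyContext context) {
        return new MyCacheDAO<>();
    }
}
```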
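
The `get`/`save` pair above suggests a read-through flow: look the key up, and only run the bolt on a miss. The following is a minimal, dependency-free sketch of that idea, not the executor's actual code. `ReadThroughCacheSketch`, `executeCached` and the `misses` counter are hypothetical, and `java.util.Optional` replaces Guava's `Optional`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of a read-through cache keyed by the @CacheKey fields.
class ReadThroughCacheSketch<O> {
    private final Map<Map<String, Object>, O> store = new HashMap<>();
    int misses = 0; // counts how often the wrapped computation actually ran

    Optional<O> get(Map<String, Object> key) {
        return Optional.ofNullable(store.get(key));
    }

    void save(O output, Map<String, Object> key) {
        store.put(key, output);
    }

    // Returns the cached output for `key`, or computes, stores and returns it.
    O executeCached(Map<String, Object> key, Function<Map<String, Object>, O> compute) {
        Optional<O> cached = get(key);
        if (cached.isPresent()) {
            return cached.get();
        }
        misses++;
        O output = compute.apply(key);
        save(output, key);
        return output;
    }

    public static void main(String[] args) {
        ReadThroughCacheSketch<String> cache = new ReadThroughCacheSketch<>();
        Map<String, Object> key = new HashMap<>();
        key.put("input1", 7); // only the @CacheKey field takes part in the key
        cache.executeCached(key, k -> "result-" + k.get("input1"));
        // An equal key (same field values) is served from the cache
        cache.executeCached(new HashMap<>(key), k -> "result-" + k.get("input1"));
        System.out.println("misses = " + cache.misses);
    }
}
```

Because the key contains only the `@CacheKey` fields, two inputs that differ in other fields (such as `optionalInput2` above) would share a cache entry in this sketch.
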
`@CacheKey` also supports transforming a field's value before it is used as a cache key. The transformation affects
only the key: on a cache miss, the bolt still receives the original input. For example:
```java
@Cached
public class Input {

    @Max(10)
    @NotNull
    @CacheKey(transformers = {LowerCaseTransformer.class})
    public String input1;
}

public class LowerCaseTransformer implements CacheKeyTransformer {

    public Object transform(Object key) {
        return ((String) key).toLowerCase();
    }
}
```
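
The effect of a transformer can be sketched without the library: the key is derived from the field value, while the value itself is left alone. In this sketch `CacheKeyTransformSketch`, `cacheKeyFor` and `LOWER_CASE` are hypothetical names, and applying the transformers in declaration order is an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical illustration of key transformation; not the library's implementation.
class CacheKeyTransformSketch {
    static final UnaryOperator<Object> LOWER_CASE =
            k -> ((String) k).toLowerCase(Locale.ROOT);

    // Applies each transformer in order to derive the cache key.
    // `value` is only read, so the bolt input stays unchanged.
    static Object cacheKeyFor(Object value, List<UnaryOperator<Object>> transformers) {
        Object key = value;
        for (UnaryOperator<Object> transformer : transformers) {
            key = transformer.apply(key);
        }
        return key;
    }

    public static void main(String[] args) {
        String input1 = "ABC";
        Object key = cacheKeyFor(input1, Collections.singletonList(LOWER_CASE));
        // "ABC" and "abc" now map to the same cache entry, but input1 is still "ABC"
        System.out.println(key + " / " + input1);
    }
}
```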

CSV driven unit tests
---------------------
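During unit tests, the CSV header maps each column to a field of `MyBoltInput` (`input.*`) or of the expected `MyBoltOutput` (`output.*`). Each row is one test case, and `__NULL__` marks a missing value.

*src/test/resources/MyBoltTest.csv*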

```
input.x,input.y,output.z
1,prefix,prefix1
2,__NULL__,default2
```

*src/test/java/MyBoltTest.java*

```java
public class MyBoltTest {

    private MyBolt bolt;

    @BeforeClass
    public void before() {
        bolt = new MyBolt();
        bolt.prepare(mock(Map.class), mock(TopologyContext.class));
    }

    @AfterClass
    public void after() {
        bolt.cleanup();
    }

    // Reads test cases from src/test/resources/MyBoltTest.csv
    @Test(dataProviderClass=TestDataProvider.class, dataProvider="csv")
    public void testExecute(MyBoltInput input, MyBoltOutput expectedOutput) {
        Collection<MyBoltOutput> outputs = bolt.execute(input);
        MyBoltOutput output = Iterables.getOnlyElement(outputs);
        assertReflectionEquals(expectedOutput, output);
    }

    @Test
    public void testDefaultOutput() {
        assertTrue(ContractValidator.instance().validate(bolt.createDefaultOutput()).isValid());
    }
}
```
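
How a header such as `input.x,input.y,output.z` could drive the injection is sketched below with plain reflection. This is an assumed mechanism, not the code of `TestDataProvider`: `CsvRowSketch`, `SketchInput` and `SketchOutput` are hypothetical, `y` is a plain nullable `String` rather than a Guava `Optional`, and the naive comma split ignores quoting.

```java
import java.lang.reflect.Field;

// Hypothetical contracts used only by this sketch.
class SketchInput { public Integer x; public String y; }
class SketchOutput { public String z; }

class CsvRowSketch {
    static final String NULL_TOKEN = "__NULL__";

    // Fills `input` and `expected` from one CSV row, routing each cell by its header prefix.
    static void bind(String header, String row, Object input, Object expected) {
        String[] names = header.split(",");
        String[] cells = row.split(",");
        for (int i = 0; i < names.length; i++) {
            String[] path = names[i].split("\\.", 2); // "input.x" -> ["input", "x"]
            Object target = path[0].equals("input") ? input : expected;
            try {
                Field field = target.getClass().getField(path[1]);
                field.set(target, convert(cells[i], field.getType()));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot bind column " + names[i], e);
            }
        }
    }

    // Converts a cell to the field type; only Integer and String are handled here.
    static Object convert(String cell, Class<?> type) {
        if (NULL_TOKEN.equals(cell)) return null;
        if (type == Integer.class) return Integer.valueOf(cell);
        return cell;
    }

    public static void main(String[] args) {
        SketchInput in = new SketchInput();
        SketchOutput out = new SketchOutput();
        bind("input.x,input.y,output.z", "2,__NULL__,default2", in, out);
        System.out.println(in.x + ", " + in.y + ", " + out.z);
    }
}
```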

Adding Bolt into a Topology
---------------------------
```java
TopologyBuilder builder = new TopologyBuilder();
builder.setBolt("myContractsBolt", new BaseContractsBoltExecutor(new MyContractsBolt()));
```

**input**

The bolt expects a pair tuple (such as `[id, data]`).
The second item of the pair must be one of the following:
* `MyBoltInput` - the expected input type, which the bolt validates directly.
* `ObjectNode` - a weakly typed object (a Jackson-parsed JSON object, similar to a `Map`). It is converted to `MyBoltInput` and validated.
* `Map` or `SomeOtherBoltInput` - converted into an `ObjectNode`, then into `MyBoltInput`, and validated.

This behavior can be modified by overriding the `BaseContractsBoltExecutor#transformInput()` method.
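
The dispatch described above can be pictured as follows. This is a simplified sketch under stated assumptions: `TypedInput` and `InputDispatchSketch` are hypothetical, and the fields of a `Map` are copied by hand to keep the example dependency-free, whereas the library goes through a Jackson `ObjectNode`. Validation would run on the returned object and is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed contract used only by this sketch.
class TypedInput {
    public Integer x;
    public String y;
}

class InputDispatchSketch {
    // Accepts either the typed contract or a weakly typed map and returns the typed contract.
    static TypedInput toTypedInput(Object data) {
        if (data instanceof TypedInput) {
            return (TypedInput) data; // already the expected type
        }
        if (data instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) data; // weakly typed: copy the known attributes
            TypedInput typed = new TypedInput();
            typed.x = (Integer) map.get("x");
            typed.y = (String) map.get("y");
            return typed;
        }
        String type = data == null ? "null" : data.getClass().getName();
        throw new IllegalArgumentException("Unsupported input type: " + type);
    }

    public static void main(String[] args) {
        Map<String, Object> weak = new HashMap<>();
        weak.put("x", 1);
        weak.put("y", "prefix");
        TypedInput typed = toTypedInput(weak);
        System.out.println(typed.x + ", " + typed.y);
    }
}
```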

**output**

The bolt emits a pair tuple (such as `[id, data]`).
The second item of the pair is a `MyBoltOutput`.

This behavior can be modified by overriding the `BaseContractsBoltExecutor#transformOutput()` method:
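```java
public class ToMapContractsBoltExecutor<TInput, TOutput, TContractsBolt extends IContractsBolt<TInput, TOutput>>
        extends BaseContractsBoltExecutor<TInput, TOutput, TContractsBolt> {

    public ToMapContractsBoltExecutor(TContractsBolt contractsBolt) {
        super(contractsBolt);
    }

    // Emit a Map instead of the typed output contract
    @Override
    protected Object transformOutput(Object output) {
        return ContractConverter.instance().convertContractToMap(output);
    }
}
```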

Enrichment Bolts
-----
Normally, a contract bolt "absorbs" every attribute that passes through it: bolts connected after it can only see the attributes declared in its output.
One way around this is an old-fashioned join, but that becomes very hard to maintain in a large topology.
A quicker solution is the `@EnrichmentBolt` annotation, which tells the bolt executor that the bolt works in "upsert" mode on the attributes map: it only adds attributes (or updates existing ones) and lets all other attributes pass through to the following bolts.
```java
@EnrichmentBolt
public class MyEnrichmentBolt extends BaseContractBolt {
    // This bolt lets attributes that are not in its input/output pass right through it
    ....
}
```
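
The difference between the two modes comes down to how the outgoing attributes map is built. The sketch below illustrates it with plain maps; `EnrichmentSketch`, `absorb` and `enrich` are hypothetical names, not part of the library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "absorb" versus "upsert" semantics.
class EnrichmentSketch {
    // Regular contract bolt: downstream bolts see only this bolt's output attributes.
    static Map<String, Object> absorb(Map<String, Object> incoming, Map<String, Object> output) {
        return new HashMap<>(output);
    }

    // Enrichment bolt: output attributes are added to (or overwrite) the incoming ones.
    static Map<String, Object> enrich(Map<String, Object> incoming, Map<String, Object> output) {
        Map<String, Object> merged = new HashMap<>(incoming);
        merged.putAll(output);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = new HashMap<>();
        incoming.put("x", 1);
        incoming.put("y", "prefix");
        Map<String, Object> output = new HashMap<>();
        output.put("z", "prefix1");

        System.out.println(absorb(incoming, output).keySet()); // only z survives
        System.out.println(enrich(incoming, output).keySet()); // x, y and z are all available
    }
}
```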

Maven
-----
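```xml
<dependencies>
    <dependency>
        <groupId>com.forter</groupId>
        <artifactId>storm-data-contracts</artifactId>
        <version>0.2</version>
        <scope>compile</scope>
    </dependency>

    <dependency>
        <groupId>javax.validation</groupId>
        <artifactId>validation-api</artifactId>
        <version>1.1.0.Final</version>
    </dependency>
    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-validator</artifactId>
        <version>5.1.2.Final</version>
    </dependency>

    <dependency>
        <groupId>com.forter</groupId>
        <artifactId>storm-data-contracts-testng</artifactId>
        <version>0.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.unitils</groupId>
        <artifactId>unitils-core</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>forter-public</id>
        <name>forter public</name>
        <url>http://oss.forter.com/repository</url>
        <releases>
            <checksumPolicy>fail</checksumPolicy>
        </releases>
        <snapshots>
            <checksumPolicy>fail</checksumPolicy>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
```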