https://github.com/forter/storm-data-contracts
Storm bolts for validating input and output data
- Host: GitHub
- URL: https://github.com/forter/storm-data-contracts
- Owner: forter
- License: apache-2.0
- Created: 2014-08-22T12:09:59.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2023-03-21T19:16:08.000Z (about 3 years ago)
- Last Synced: 2025-04-04T17:22:04.144Z (about 1 year ago)
- Language: Java
- Size: 231 KB
- Stars: 1
- Watchers: 55
- Forks: 5
- Open Issues: 15
Metadata Files:
- Readme: README.md
- License: LICENSE
README
storm-data-contracts
====================
[Build status](https://travis-ci.org/forter/storm-data-contracts/)
This project lets you write Storm Bolts in Java with strict data contracts:
Strongly Typed
--------------
Bolt input and output are POJOs
```java
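import com.google.common.collect.Lists;
import java.util.Collection;

public class MyBolt implements IContractsBolt<MyBoltInput, Collection<MyBoltOutput>> {

    @Override
    public Collection<MyBoltOutput> execute(MyBoltInput input) {
        MyBoltOutput output = new MyBoltOutput();
        if (input.y.isPresent()) {
            // y is optional: use it as the prefix when present
            output.z = input.y.get() + input.x;
        } else {
            output.z = "default" + input.x;
        }
        return Lists.newArrayList(output);
    }

    @Override
    public Collection<MyBoltOutput> createDefaultOutput() {
        // Emitted instead of the real output when the output contract is violated
        return Lists.newArrayList();
    }
}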
```
Input and Output Data Contracts
-------------------------------
Input and output classes can use Guava `Optional` and Hibernate Validator annotations to declare strict data contracts:
```java
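import com.google.common.base.Optional;
import javax.validation.constraints.Min;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Pattern;
import org.hibernate.validator.valuehandling.UnwrapValidatedValue;

public class MyBoltInput {
    @NotNull
    @Min(0)
    public Integer x;

    // The constraint applies to the wrapped String, not to the Optional itself
    @UnwrapValidatedValue
    @Pattern(regexp="\\p{L}*")
    public Optional<String> y;
}

public class MyBoltOutput {
    @NotNull
    public String z;
}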
```
Exceptions
----------
* All input contract violations are reported to Storm.
* All exceptions thrown by `execute()` are reported to Storm.
* All output contract violations are reported to Storm, and the default output is emitted instead.
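
The three rules above can be read as one control flow. The following is a minimal, library-free sketch of that flow, not the project's implementation: `ContractFlowSketch`, `run`, the `reported` list and the `Predicate`-based contracts are hypothetical stand-ins for the executor, Storm's error reporting and Hibernate Validator. It also assumes that nothing is emitted after an input violation or an `execute()` exception, which the README does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class ContractFlowSketch {
    /** Hypothetical stand-in for errors reported to Storm. */
    public static final List<String> reported = new ArrayList<>();

    public static <I, O> List<O> run(I input,
                                     Predicate<I> inputContract,
                                     Function<I, O> execute,
                                     Predicate<O> outputContract,
                                     O defaultOutput) {
        List<O> emitted = new ArrayList<>();
        if (!inputContract.test(input)) {
            reported.add("input contract violation");
            return emitted; // assumption: nothing is emitted in this case
        }
        O output;
        try {
            output = execute.apply(input);
        } catch (RuntimeException e) {
            reported.add("execute() exception: " + e.getMessage());
            return emitted; // assumption: nothing is emitted in this case
        }
        if (!outputContract.test(output)) {
            reported.add("output contract violation");
            emitted.add(defaultOutput); // the default output replaces the invalid one
            return emitted;
        }
        emitted.add(output);
        return emitted;
    }

    public static void main(String[] args) {
        Predicate<Integer> nonNegative = x -> x != null && x >= 0;
        Predicate<String> notNull = z -> z != null;
        System.out.println(run(1, nonNegative, x -> "default" + x, notNull, "fallback"));
        System.out.println(run(-1, nonNegative, x -> "default" + x, notNull, "fallback"));
        System.out.println(run(2, nonNegative, x -> (String) null, notNull, "fallback"));
        System.out.println(reported);
    }
}
```

In this sketch only the output-violation path emits a replacement value; the other two failure paths just record the error.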
Caching
-------
To add caching, extend BaseContractsBoltExecutor and override BaseContractsBoltExecutor#createCacheDAO.
Annotate cached input contracts with @Cached, and annotate the fields that serve as cache keys with @CacheKey.
```java
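@Cached
public class Input {
    @Max(10)
    @NotNull
    @CacheKey
    public Integer input1;

    @Max(10)
    @UnwrapValidatedValue
    public Optional<Integer> optionalInput2;
}

public class MyCacheDAO<TOutput> implements CacheDAO<TOutput> {
    // Simple in-memory cache, keyed by the cache-key map passed to get() and save()
    public Map<Map<String, Object>, TOutput> cache = new HashMap<>();

    @Override
    public Optional<TOutput> get(Map<String, Object> input) {
        if (cache.containsKey(input)) {
            return Optional.of(cache.get(input));
        }
        return Optional.absent();
    }

    @Override
    public void save(TOutput output, Map<String, Object> inputKey, long startTimeMillis) {
        cache.put(inputKey, output);
    }
}

public class MyCachedContractBoltExecutor<TInput, TOutput, TContractsBolt extends IContractsBolt<TInput, TOutput>>
        extends BaseContractsBoltExecutor<TInput, TOutput, TContractsBolt> {

    public MyCachedContractBoltExecutor(TContractsBolt contractsBolt) {
        super(contractsBolt);
    }

    @Override
    protected CacheDAO<TOutput> createCacheDAO(Map stormConf, TopologyContext context) {
        return new MyCacheDAO<>();
    }
}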
```
@CacheKey can also transform a field's value before it is used as a cache key. The transformation affects only the key:
on a cache miss the bolt still receives the original input. For example:
```java
@Cached
public class Input {
    @Max(10)
    @NotNull
    @CacheKey(transformers = {LowerCaseTransformer.class})
    public String input1;
}

public class LowerCaseTransformer implements CacheKeyTransformer {
    public Object transform(Object key) {
        return ((String) key).toLowerCase();
    }
}
```
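
To make the "key only" behavior concrete, here is a small self-contained sketch in plain Java. It does not use the project's classes: `CacheKeySketch`, `toCacheKey` and the `Function`-based bolt are hypothetical names chosen for illustration, and the real executor may organize the lookup differently.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class CacheKeySketch {
    private final Map<String, String> cache = new HashMap<>();
    private int misses = 0;

    /** Hypothetical stand-in for a transformer such as LowerCaseTransformer. */
    static String toCacheKey(String input) {
        return input.toLowerCase(Locale.ROOT);
    }

    /** Looks up the transformed key; on a miss, runs the bolt on the ORIGINAL input. */
    public String execute(String input, Function<String, String> bolt) {
        String key = toCacheKey(input);
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        misses++;
        String output = bolt.apply(input); // the bolt never sees the transformed key
        cache.put(key, output);
        return output;
    }

    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        CacheKeySketch executor = new CacheKeySketch();
        Function<String, String> bolt = in -> "seen:" + in;
        System.out.println(executor.execute("ABC", bolt));
        System.out.println(executor.execute("abc", bolt));
        System.out.println(executor.misses());
    }
}
```

Because both calls share the lowercased key, the second call is served from the cache and returns the output computed for the first input.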
CSV driven unit tests
---------------------
The CSV header row names the fields to populate: `input.*` columns are injected into MyBoltInput and `output.*` columns into the expected MyBoltOutput, one test case per row.
*src/test/resources/MyBoltTest.csv*
```
input.x,input.y,output.z
1,prefix,prefix1
2,__NULL__,default2
```
*src/test/java/MyBoltTest.java*
```java
public class MyBoltTest {
    private MyBolt bolt;

    @BeforeClass
    public void before() {
        bolt = new MyBolt();
        bolt.prepare(mock(Map.class), mock(TopologyContext.class));
    }

    @AfterClass
    public void after() {
        bolt.cleanup();
    }

    // reads from src/test/resources/MyBoltTest.csv
    @Test(dataProviderClass=TestDataProvider.class, dataProvider="csv")
    public void testExecute(MyBoltInput input, MyBoltOutput expectedOutput) {
        Collection<MyBoltOutput> outputs = bolt.execute(input);
        MyBoltOutput output = Iterables.getOnlyElement(outputs);
        assertReflectionEquals(expectedOutput, output);
    }

    @Test
    public void testDefaultOutput() {
        // The default output must itself satisfy the output contract
        assertTrue(ContractValidator.instance().validate(bolt.createDefaultOutput()).isValid());
    }
}
```
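
The header convention used by the CSV file can be illustrated without the test library. The sketch below is an assumption-laden illustration, not the behavior of `TestDataProvider`: `CsvRowSketch` and `fieldsFor` are hypothetical names, the parser handles only simple comma-separated cells, and it assumes `__NULL__` marks a missing value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvRowSketch {
    static final String NULL_MARKER = "__NULL__";

    /**
     * Returns the cells of one row whose header starts with "prefix.",
     * keyed by the field name that follows the prefix.
     */
    static Map<String, String> fieldsFor(String prefix, String headerLine, String rowLine) {
        String[] headers = headerLine.split(",");
        String[] cells = rowLine.split(",", -1); // keep trailing empty cells
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < headers.length && i < cells.length; i++) {
            String header = headers[i].trim();
            if (header.startsWith(prefix + ".")) {
                String value = cells[i].trim();
                // Assumption: the marker stands for "no value"
                fields.put(header.substring(prefix.length() + 1),
                        NULL_MARKER.equals(value) ? null : value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = "input.x,input.y,output.z";
        String row = "2,__NULL__,default2";
        System.out.println(fieldsFor("input", header, row));
        System.out.println(fieldsFor("output", header, row));
    }
}
```

A real data provider would additionally convert each string to the declared field type (for example `Integer` or `Optional<String>`), which this sketch leaves out.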
Adding Bolt into a Topology
---------------------------
```java
TopologyBuilder builder = new TopologyBuilder();
builder.setBolt("myContractsBolt", new BaseContractsBoltExecutor(new MyContractsBolt()));
```
**input**
The bolt expects a pair tuple (such as [id, data]).
The second item of the pair is expected to be one of the following:
* `MyBoltInput` - the expected input type, will be validated by the bolt.
* `ObjectNode` - a weakly typed object (Jackson parsed JSON object similar to Map). Converted to MyBoltInput and validated.
* `Map` or `SomeOtherBoltInput` - converted into an `ObjectNode` and then converted into MyBoltInput and validated.
This behavior can be modified by overriding the BaseContractsBoltExecutor#transformInput() method.
**output**
The bolt emits a pair tuple (such as [id, data]).
The second item of the pair is a `MyBoltOutput`.
This behavior can be modified by overriding the BaseContractsBoltExecutor#transformOutput() method:
```java
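public class ToMapContractsBoltExecutor<TInput, TOutput, TContractsBolt extends IContractsBolt<TInput, TOutput>>
        extends BaseContractsBoltExecutor<TInput, TOutput, TContractsBolt> {

    public ToMapContractsBoltExecutor(TContractsBolt contractsBolt) {
        super(contractsBolt);
    }

    @Override
    protected Object transformOutput(Object output) {
        // Emit a Map instead of the typed output object
        return ContractConverter.instance().convertContractToMap(output);
    }
}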
```
Enrichment Bolts
-----
Normally, a contract bolt "absorbs" every attribute that passes through it: bolts connected after it can see only the attributes declared in its output.
One way around this is an old-fashioned join, but that becomes very hard to maintain in a large topology.
A quicker solution is the `@EnrichmentBolt` annotation, which tells the ContractBoltExecutor that the bolt works in "upsert" mode on the attributes map: it only adds attributes (or updates those that already exist) and lets all other attributes pass through for later bolts to use.
```java
@EnrichmentBolt
public class MyEnrichmentBolt extends BaseContractBolt {
// This bolt will allow attributes not in its input/output pass right through it
....
}
```
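
The difference between the two modes comes down to how the outgoing attributes map is built. The following plain-Java sketch illustrates that idea only: `EnrichmentSketch`, `absorb` and `enrich` are hypothetical names, and the real executor works on tuples and contract objects rather than on bare maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnrichmentSketch {
    /** Default mode: only the bolt's own output attributes continue downstream. */
    static Map<String, Object> absorb(Map<String, Object> incoming, Map<String, Object> boltOutput) {
        return new LinkedHashMap<>(boltOutput);
    }

    /** "Upsert" mode: incoming attributes pass through; the bolt's output adds or overwrites. */
    static Map<String, Object> enrich(Map<String, Object> incoming, Map<String, Object> boltOutput) {
        Map<String, Object> merged = new LinkedHashMap<>(incoming);
        merged.putAll(boltOutput);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("id", 7);
        incoming.put("score", 1);

        Map<String, Object> boltOutput = new LinkedHashMap<>();
        boltOutput.put("score", 2);
        boltOutput.put("country", "IL");

        System.out.println(absorb(incoming, boltOutput));
        System.out.println(enrich(incoming, boltOutput));
    }
}
```

In the example, `id` survives only in the enriched map, while `score` is overwritten by the bolt's output in both modes.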
Maven
-----
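```xml
<dependency>
    <groupId>com.forter</groupId>
    <artifactId>storm-data-contracts</artifactId>
    <version>0.2</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>javax.validation</groupId>
    <artifactId>validation-api</artifactId>
    <version>1.1.0.Final</version>
</dependency>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-validator</artifactId>
    <version>5.1.2.Final</version>
</dependency>
<dependency>
    <groupId>com.forter</groupId>
    <artifactId>storm-data-contracts-testng</artifactId>
    <version>0.2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.unitils</groupId>
    <artifactId>unitils-core</artifactId>
    <scope>test</scope>
</dependency>

<repository>
    <id>forter-public</id>
    <name>forter public</name>
    <url>http://oss.forter.com/repository</url>
    <releases>
        <checksumPolicy>fail</checksumPolicy>
    </releases>
    <snapshots>
        <checksumPolicy>fail</checksumPolicy>
        <enabled>true</enabled>
    </snapshots>
</repository>
```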