Storm connector for HBase using AsyncHBase client
=================================================

This connector for Apache Storm uses the AsyncHBase client to
persist raw data and Trident states to Apache HBase.

Benefits
--------

The AsyncHBase client is a fully asynchronous and thread-safe client
for Apache HBase. Unlike the traditional HBase client (HTable), you
only need one instance of the client per HBase cluster you want to
interact with, even if you work with multiple tables. It avoids
unnecessary waiting threads and allows batches of requests to run in
parallel, even in synchronous mode.
The client provides client-side buffering, which may add some latency
to your topology; you may want to tweak the default 100ms flush
interval if you care about latency. This connector tries to provide
flexibility and high performance, and its non-blocking design should
reduce pressure on both Storm and HBase.
I suggest you read the javadoc for more detailed information:
http://javadoc.root.gg/storm-asynchbase
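
If you instantiate the AsyncHBase client (org.hbase.async.HBaseClient) yourself
rather than through this connector, the buffering can be tuned directly on the
client. This is a minimal sketch of the AsyncHBase API only, not of this connector:

```
// Sketch: tuning AsyncHBase client-side buffering on a client you manage
// yourself; the connector normally builds its client from the registered
// configuration map (see Usage below).
HBaseClient client = new HBaseClient("node1,node2,node3", "/hbase");
client.setFlushInterval((short) 100); // ms; lower = less latency, less batching
```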

Usage
-----

Usage examples can be found in the storm.asynchbase.example package.

* Client configuration
You have to register a configuration Map in the topology Config for
each client you want to use.

```
// Connection settings for one HBase cluster, registered in the topology
// Config under the name the bolts/states will refer to ("hbase-cluster").
Map<String, String> hBaseConfig = new HashMap<>();
hBaseConfig.put("zkQuorum", "node1,node2,node3");
hBaseConfig.put("zkPath", "/hbase");
conf.put("hbase-cluster", hBaseConfig);
```
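
For context, here is a rough sketch of how this map rides along with the rest of
the topology configuration. It assumes a pre-1.0 Storm release (classes under
backtype.storm) and the TopologyBuilder from the Bolts section below; the
topology name is arbitrary.

```
// Sketch: the registered cluster map is just another entry in the topology
// configuration submitted with the topology.
Config conf = new Config();
conf.put("hbase-cluster", hBaseConfig); // map built as shown above
StormSubmitter.submitTopology("asynchbase-example", conf, builder.createTopology());
```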

* Mapper
To map Storm tuples to HBase RPC requests you have to provide
mappers to the bolts, functions and states.
They can be configured using a method chaining syntax.
You can map a query parameter either to a tuple field or to a
fixed constant value, and you can provide serializers to
format input values to a specific type.

```
IAsyncHBaseMapper mapper = new AsyncHBaseMapper()
    .addFieldMapper(new AsyncHBaseFieldMapper()
        .setTable("test")
        .setRowKeyField("key")              // row key taken from the "key" tuple field
        .setColumnFamily("data")
        .setColumnQualifierField("value")   // qualifier taken from the "value" tuple field
        .setColumnQualifierSerializer(new IntegerSerializer())
        .setValue(payload)                  // constant value defined elsewhere
    );
```

* Bolts
AsyncHBaseBolt is used to execute requests for each incoming tuple; it uses
one or more FieldMappers to build the requests from the tuple's fields. All
requests built from a tuple are executed in parallel. By default this
bolt is asynchronous, so be sure to read the doc to fully understand what it
does.

```
builder.setBolt(
    "hbase-bolt",
    new AsyncHBaseBolt("hbase-cluster", mapper),
    5).noneGrouping("spout");
```

ExtractKeyValuesBolt is used to extract the cell values returned by a GetRequest
into tuples containing the cell properties as field values. You can choose
which properties you want to retrieve with the boolean flags passed to the
constructor. You can also provide deserializers to map properties to a
specific type.

```
builder.setBolt("extract-bolt",
new ExtractKeyValues(false, false, false, true, false) // return only the value
.setValueDeserializer(new IntSerializer())
```
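
For illustration, here is a minimal sketch of wiring the two bolts together. It
assumes "spout" emits the fields the mapper reads and that the mapper is
configured to issue GetRequests; groupings and parallelism hints are arbitrary.

```
// Sketch: chain the bolts so extracted cells come out of "extract-bolt".
builder.setBolt("hbase-bolt",
    new AsyncHBaseBolt("hbase-cluster", mapper), 5)
    .shuffleGrouping("spout");

builder.setBolt("extract-bolt",
    new ExtractKeyValuesBolt(false, false, false, true, false) // value only
        .setValueDeserializer(new IntSerializer()), 2)
    .shuffleGrouping("hbase-bolt");
```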

* Trident Operations
ExecuteHBaseRpcs is a Trident function that mimics the behaviour of
AsyncHBaseBolt in a Trident topology.

```
.each(
    new Fields("args"),
    new ExecuteHBaseRpcs("hbase-cluster", mapper, "get values").setAsync(false),
    new Fields("values"))
```

ExtractKeyValues is a Trident function that mimics the behaviour of
ExtractKeyValuesBolt in a Trident topology.

```
.each(
    new Fields("values"),
    new ExtractKeyValues(false, false, false, true, false)
        .setValueDeserializer(new IntSerializer()),
    new Fields("value"))
```
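
Put together, the two functions can be chained on a single stream. A rough
sketch, assuming a DRPC stream and a mapper configured for GetRequests (see the
GET mapper example in the Trident State section):

```
// Sketch: issue the GETs, then unpack the returned cells into a "value" field.
topology.newDRPCStream("get values", drpc)
    .each(new Fields("args"),
        new ExecuteHBaseRpcs("hbase-cluster", mapper, "get values").setAsync(false),
        new Fields("values"))
    .each(new Fields("values"),
        new ExtractKeyValues(false, false, false, true, false)
            .setValueDeserializer(new IntSerializer()),
        new Fields("value"));
```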

* Trident State
This is a Trident State implementation to persist a partition to HBase using the AsyncHBase client.
It should be used with the partitionPersist method.
You should only use this state if your update is idempotent with regard to batch replay. Use
AsyncHBaseStateUpdater / AsyncHBaseStateQuery and AsyncHBaseStateFactory to interact with it.
You have to provide mappers for the update and/or query functions.

```
AsyncHBaseState.Options streamRateOptions = new AsyncHBaseState.Options();
streamRateOptions.cluster = "hbase-cluster";
```
```
streamRateOptions.updateMapper = new AsyncHBaseTridentFieldMapper()
    .setTable("test")
    .setColumnFamily("data")
    .setColumnQualifier("stream rate")
    .setRowKey("global rate")
    .setValueField("rate")
    .setValueSerializer(new AsyncHBaseLongSerializer());
```
```
streamRateOptions.queryMapper = new AsyncHBaseTridentFieldMapper()
    .setRpcType(IAsyncHBaseTridentFieldMapper.Type.GET)
    .setTable("test")
    .setColumnFamily("data")
    .setColumnQualifier("stream rate")
    .setRowKey("global rate");
```
```
TridentState streamRate = stream
    .aggregate(new Fields(), new StreamRateAggregator(2), new Fields("rate"))
    .partitionPersist(
        new AsyncHBaseStateFactory(streamRateOptions),
        new Fields("rate"),
        new AsyncHBaseStateUpdater()
    );
```
```
topology.newDRPCStream("stream rate drpc", drpc)
    .stateQuery(
        streamRate,
        new AsyncHBaseStateQuery()
            .setValueDeserializer(new AsyncHBaseLongSerializer()),
        new Fields("rate"));
```

* Trident MapState
This is a Trident MapState implementation backed by HBase using the AsyncHBase client.
It can provide an LRU cache to speed up multiGets. It handles
NON-TRANSACTIONAL, TRANSACTIONAL and OPAQUE modes.

```
AsyncHBaseMapState.Options sumStateOptions = new AsyncHBaseMapState.Options();
sumStateOptions.cluster = "hbase-cluster";
sumStateOptions.table = "test";
sumStateOptions.columnFamily = "data";
sumStateOptions.columnQualifier = "total";

TridentState sumState = stream
    .groupBy(new Fields("key"))
    .persistentAggregate(
        AsyncHBaseMapState.transactional(sumStateOptions),
        new Fields("value"),
        new Sum(),
        new Fields("sum"))
    .parallelismHint(10);
```
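
Since this behaves like a standard Trident MapState, the aggregated values can
be read back (for example over DRPC) with Storm's built-in MapGet. A sketch,
assuming the grouping keys match the strings passed as DRPC arguments:

```
// Sketch: query the map state over DRPC with the built-in MapGet.
topology.newDRPCStream("sum drpc", drpc)
    .groupBy(new Fields("args"))
    .stateQuery(
        sumState,
        new Fields("args"),
        new MapGet(),
        new Fields("sum"));
```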

TODO
----

* Handle scan requests to get data from HBase to Storm