Use Kafka as a depot or publishing target in Rama modules
- Host: GitHub
- URL: https://github.com/redplanetlabs/rama-kafka
- Owner: redplanetlabs
- License: apache-2.0
- Created: 2023-08-20T03:33:13.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-04-27T00:02:03.000Z (10 months ago)
- Last Synced: 2024-05-09T21:27:55.172Z (9 months ago)
- Language: Java
- Size: 16.6 KB
- Stars: 21
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# rama-kafka
This library integrates Rama with external [Apache Kafka](https://kafka.apache.org/) clusters. It enables Kafka to be used as a source for Rama topologies and makes it easy to publish records to Kafka topics from topologies.
## Maven
`rama-kafka` is available in the following Nexus repository:
```xml
<repository>
  <id>nexus-releases</id>
  <url>https://nexus.redplanetlabs.com/repository/maven-public-releases</url>
</repository>
```
The latest release is:
```xml
<dependency>
  <groupId>com.rpl</groupId>
  <artifactId>rama-kafka</artifactId>
  <version>0.9.0</version>
</dependency>
```
## Usage
General information about integrating Rama with external systems can be found [on this page](https://redplanetlabs.com/docs/~/integrating.html).
Here's an example of using `rama-kafka` to consume from one Kafka topic and publish to another Kafka topic:
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.*;

import com.rpl.rama.*;
import com.rpl.rama.kafka.*; // KafkaExternalDepot, KafkaAppend
import com.rpl.rama.module.*;
import com.rpl.rama.ops.Ops;

public class KafkaIntegrationExampleModule implements RamaModule {
  @Override
  public void define(Setup setup, Topologies topologies) {
    // These are the same configs accepted by KafkaConsumer/KafkaProducer.
    Map<String, Object> kafkaConfig = new HashMap<>();
    kafkaConfig.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1.mycompany.com:9092,kafka2.mycompany.com:9092");
    kafkaConfig.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    kafkaConfig.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    kafkaConfig.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
    kafkaConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());

    // Associate the Kafka topic "myTopic" with the var "*kafka".
    setup.declareObject("*kafka", new KafkaExternalDepot(kafkaConfig, "myTopic"));

    StreamTopology s = topologies.stream("s");
    s.source("*kafka").out("*tuple")
     .each(Ops.EXPAND, "*tuple").out("*key", "*value")
     .each((String k, Long v) -> new ProducerRecord<>("anotherTopic", k, v * 8),
           "*key", "*value").out("*producerRecord")
     .eachAsync(new KafkaAppend(), "*kafka", "*producerRecord");
  }
}
```
A topic on a remote Kafka cluster is declared in a Rama module by creating a `KafkaExternalDepot` and passing it to a `declareObject` call. In this example the topic `"myTopic"` on a Kafka cluster is associated with the var `"*kafka"`. `KafkaExternalDepot` is parameterized with a map of configs, the same way you would create a [KafkaConsumer](https://javadoc.io/static/org.apache.kafka/kafka-clients/3.2.3/org/apache/kafka/clients/consumer/KafkaConsumer.html) or [KafkaProducer](https://javadoc.io/static/org.apache.kafka/kafka-clients/3.2.3/org/apache/kafka/clients/producer/KafkaProducer.html), and it accepts the same configs.
`KafkaExternalDepot` uses the `"bootstrap.servers"` config to identify a Kafka cluster. Internally it creates only one `KafkaConsumer` client per task thread per Kafka cluster, so if you consume multiple Kafka topics from the same cluster in the same module and each `KafkaExternalDepot` is declared with the same `"bootstrap.servers"` config, only one `KafkaConsumer` is created for that cluster per task thread.
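For instance, a module consuming two topics from the same cluster can declare one depot per topic while sharing a single config map. A minimal sketch (the topic names and vars are hypothetical; this goes inside `define` as in the example above):
```java
// Both depots share the same "bootstrap.servers" config, so each task
// thread ends up with a single KafkaConsumer client serving both topics.
setup.declareObject("*orders", new KafkaExternalDepot(kafkaConfig, "orders"));
setup.declareObject("*payments", new KafkaExternalDepot(kafkaConfig, "payments"));
```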
The configs `"enable.auto.commit"` and `"group.id"` cannot be specified, because Rama handles all offset management itself.
Serialization and deserialization to and from a Kafka topic are handled by Kafka itself; you specify the serializers and deserializers to use via the config map.
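For example, to work with string values instead of longs, you could swap the value serde classes in the config map. A sketch using Kafka's built-in string serde; any Kafka `Serializer`/`Deserializer` pair plugs in the same way:
```java
// Replace the long serde from the example above with Kafka's string serde.
kafkaConfig.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
kafkaConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
```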
When used as a source for a topology, a `KafkaExternalDepot` emits two-element lists containing the key and value of each record. As shown here, `Ops.EXPAND` is useful for extracting the key and value from each emitted list. There's no difference in functionality between using `KafkaExternalDepot` as a source versus a built-in depot (e.g. all "start from" options are supported).
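If you'd rather not use `Ops.EXPAND`, the emitted two-element list can also be picked apart with plain lambdas. A minimal sketch, assuming the key/value types from the example above (`List` here is `java.util.List`):
```java
// Each Kafka record arrives as [key, value]; index into the list directly.
s.source("*kafka").out("*tuple")
 .each((List tuple) -> (String) tuple.get(0), "*tuple").out("*key")
 .each((List tuple) -> (Long) tuple.get(1), "*tuple").out("*value");
```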
### Publishing to a Kafka topic
The `KafkaAppend` function publishes records to a Kafka topic from within a topology. It takes as input a `KafkaExternalDepot` and a [ProducerRecord](https://javadoc.io/static/org.apache.kafka/kafka-clients/3.2.3/org/apache/kafka/clients/producer/ProducerRecord.html). The `ProducerRecord` contains the topic and partition information for publishing, as well as the key and value of the record.
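The Kafka clients API provides several `ProducerRecord` constructors for controlling where and how the record lands; a few variants (the topic, partition, and values here are illustrative):
```java
// Topic, key, value: Kafka chooses the partition by hashing the key.
new ProducerRecord<>("anotherTopic", "someKey", 42L);

// Topic, partition, key, value: pin the record to partition 3.
new ProducerRecord<>("anotherTopic", 3, "someKey", 42L);

// Topic, partition, timestamp, key, value: also set the record timestamp.
new ProducerRecord<>("anotherTopic", 3, System.currentTimeMillis(), "someKey", 42L);
```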
`KafkaAppend` should be used with `eachAsync` and emits the [RecordMetadata](https://javadoc.io/static/org.apache.kafka/kafka-clients/3.2.3/org/apache/kafka/clients/producer/RecordMetadata.html) returned by Kafka.
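Because the metadata is emitted, it can be captured with `.out` like any other emit in a topology. A minimal sketch (the downstream offset extraction is hypothetical):
```java
// Capture Kafka's acknowledgment and pull the record's offset out of it.
.eachAsync(new KafkaAppend(), "*kafka", "*producerRecord").out("*metadata")
.each((RecordMetadata meta) -> meta.offset(), "*metadata").out("*offset");
```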
By default `KafkaAppend` in an `eachAsync` doesn't emit until Kafka has returned the `RecordMetadata`. This ties the success of the topology doing the append to the success of the record being published to Kafka, so if the Kafka append fails, the topology will retry. If you don't need this guarantee and prefer at-most-once appends, you can parameterize the `KafkaAppend` constructor to tell it to emit without waiting to hear back from Kafka. For example:
```java
.eachAsync(new KafkaAppend(false), "*kafka", "*producerRecord")
```
In this case the `eachAsync` call will emit `null`, and the topology will succeed without waiting for Kafka to acknowledge the append.