https://github.com/sky-uk/kafka-topic-loader
Reads the contents of provided Kafka topics
- Host: GitHub
- URL: https://github.com/sky-uk/kafka-topic-loader
- Owner: sky-uk
- License: bsd-3-clause
- Created: 2018-10-01T10:00:51.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2023-09-12T18:19:48.000Z (over 2 years ago)
- Last Synced: 2023-09-12T19:19:14.220Z (over 2 years ago)
- Language: Scala
- Homepage:
- Size: 348 KB
- Stars: 4
- Watchers: 62
- Forks: 1
- Open Issues: 16
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# kafka-topic-loader
[Build status](https://app.travis-ci.com/sky-uk/kafka-topic-loader)
[Maven artifact](https://mvnrepository.com/artifact/uk.sky/kafka-topic-loader)
[Sonatype snapshots](https://s01.oss.sonatype.org/content/repositories/snapshots/uk/sky/kafka-topic-loader_2.13/)
Reads the contents of the provided Kafka topics, either in their entirety or up to a consumer group's last committed offset, depending on which `LoadTopicStrategy` you provide.
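
To make the difference between the two loading modes concrete, here is a minimal sketch that models where the initial load of one partition would stop. Everything in it (`Strategy`, `EntireTopic`, `UpToCommitted`, `PartitionOffsets`, `stopOffset`) is a hypothetical name invented for illustration and is not the library's API. The behaviour when no offset has been committed is also an assumption.

```scala
// Illustrative model only: these names are hypothetical, not the library's API.
sealed trait Strategy
case object EntireTopic   extends Strategy // stop at the current end of the partition
case object UpToCommitted extends Strategy // stop at the group's last committed offset

final case class PartitionOffsets(logStart: Long, logEnd: Long, committed: Option[Long])

// Offset at which the initial load of a single partition stops.
// Assumption: with no committed offset there is nothing to load, so we stop at logStart.
def stopOffset(strategy: Strategy, offsets: PartitionOffsets): Long =
  strategy match {
    case EntireTopic   => offsets.logEnd
    case UpToCommitted => offsets.committed.getOrElse(offsets.logStart)
  }

val partition = PartitionOffsets(logStart = 0L, logEnd = 100L, committed = Some(42L))
val allOfIt   = stopOffset(EntireTopic, partition)   // end of the partition
val upToGroup = stopOffset(UpToCommitted, partition) // the group's committed offset
```
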
As of version `1.3.0`, data can be loaded from complete topics using either `load` or `loadAndRun`.
Since version `1.4.0` the library is cross-compiled for Scala versions `2.12` and `2.13`.
Since version `1.6.0` the library is cross-compiled for Scala versions `2.12`, `2.13` and `3`.
Since version `2.0.0` the library is no longer cross-compiled for Scala `2.12`, and the package has been renamed from `com.sky` to `uk.sky`.
Add the following to your `build.sbt`:
```scala
libraryDependencies += "uk.sky" %% "kafka-topic-loader" % "<version>"
```
```scala
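import cats.data.NonEmptyList
import org.apache.kafka.common.serialization.{Deserializer, StringDeserializer}
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.scaladsl.Sink
import uk.sky.kafka.topicloader.{LoadAll, TopicLoader}

implicit val as: ActorSystem = ActorSystem()
implicit val stringDeserializer: Deserializer[String] = new StringDeserializer

// Load the whole topic, handle each record, and complete once the load has finished
val stream = TopicLoader.load[String, String](NonEmptyList.one("topic-to-load"), LoadAll)
  .mapAsync(1)(_ => ??? /* store records in pekko.Actor for example */)
  .runWith(Sink.ignore)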
```
`loadAndRun` will load the topics, complete the `Future[Done]` from the materialised value and then carry on
running, emitting any new records that appear on the topics. An example use-case for this is a REST API that holds the
contents of a Kafka topic in memory. This kind of application doesn't need to commit offsets and can use the `Future[Done]` to determine readiness.
```scala
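import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicBoolean

import cats.data.NonEmptyList
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, Deserializer, StringDeserializer}
import org.apache.pekko.Done
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.kafka.scaladsl.Consumer
import org.apache.pekko.stream.ActorMaterializer
import org.apache.pekko.stream.scaladsl.Sink
import scala.concurrent.Future
import uk.sky.kafka.topicloader.TopicLoader

object Main extends App {

  implicit val system: ActorSystem    = ActorSystem()
  implicit val mat: ActorMaterializer = ActorMaterializer()

  import system.dispatcher

  implicit val keyDeserializer: Deserializer[String]        = new StringDeserializer
  implicit val valueDeserializer: Deserializer[Array[Byte]] = new ByteArrayDeserializer

  val state = new SimplifiedState

  // The first Future completes when the initial load is done; the stream keeps running afterwards
  val (initialLoadingFuture, controlF): (Future[Done], Future[Consumer.Control]) =
    TopicLoader
      .loadAndRun[String, Array[Byte]](NonEmptyList.one("topic-to-load"))
      .to(Sink.foreach(record => state.store.put(record.key, record.value)))
      .run()

  initialLoadingFuture.foreach(_ => state.isAppReady.set(true))
}

class SimplifiedState {

  /**
   * API requests may query this state
   */
  val store = new ConcurrentHashMap[String, Array[Byte]]()

  /**
   * A readiness endpoint could be created that queries this
   */
  val isAppReady = new AtomicBoolean()
}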
```
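
The readiness idea can be tried without Kafka. The sketch below uses only the Scala standard library: a `Promise` stands in for the `Future[Done]` that `loadAndRun` materialises, and `readinessStatus` is a hypothetical helper (not part of the library) that maps the flag to the HTTP status a readiness endpoint might return.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper: 200 once the initial load has finished, 503 before that.
def readinessStatus(isAppReady: AtomicBoolean): Int =
  if (isAppReady.get()) 200 else 503

val isAppReady = new AtomicBoolean(false)

// Stand-in for the Future[Done] materialised by loadAndRun.
val initialLoad: Promise[Unit] = Promise[Unit]()
val flagSet: Future[Unit]      = initialLoad.future.map(_ => isAppReady.set(true))

val before = readinessStatus(isAppReady) // initial load still in progress
initialLoad.success(())                  // simulate the initial load finishing
Await.result(flagSet, 1.second)          // wait for the callback that flips the flag
val after = readinessStatus(isAppReady)
```

Keeping the flag in an `AtomicBoolean` lets the stream thread and the HTTP thread share it without extra locking, which is presumably why the example above uses one.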
## Configuration
### Topic loader
The config in [`reference.conf`](src/main/resources/reference.conf) can be overridden by providing your own `application.conf`.
By default, Pekko's `ConsumerConfig` will inherit the consumer `client.id` of the application in which kafka-topic-loader is running. To give kafka-topic-loader its own client id, set it in your `application.conf`:
```hocon
topic-loader {
  client-id = "custom-client-id"
}
```
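
As a rough illustration of how such an override takes effect, the following sketch merges two in-memory configs with the Typesafe Config library, assuming it is on the classpath (Pekko depends on it). The `reference` value is only a stand-in for the library's `reference.conf`, and `example-setting` is a hypothetical key used to show the fallback; it is not a real topic-loader setting.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Stand-in for the library's reference.conf; `example-setting` is a hypothetical key.
val reference: Config = ConfigFactory.parseString(
  """topic-loader {
    |  example-setting = "library-default"
    |}""".stripMargin)

// Stand-in for your application.conf.
val application: Config = ConfigFactory.parseString(
  """topic-loader {
    |  client-id = "custom-client-id"
    |}""".stripMargin)

// Application settings win; anything not overridden falls back to the reference.
val merged: Config = application.withFallback(reference).resolve()

val clientId  = merged.getString("topic-loader.client-id")
val untouched = merged.getString("topic-loader.example-setting")
```
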
### Pekko-kafka
You should configure the `pekko.kafka.consumer.kafka-clients.group.id` to match that of your application, e.g.:
```hocon
pekko.kafka {
  consumer.kafka-clients {
    bootstrap.servers = ${?KAFKA_BROKERS}
    group.id = assembler-consumer-group
  }
  producer.kafka-clients {
    bootstrap.servers = ${?KAFKA_BROKERS}
  }
}
```
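
The `${?KAFKA_BROKERS}` syntax is a HOCON optional substitution: the assignment is skipped when the variable is not defined. The sketch below, again assuming the Typesafe Config library is on the classpath, pairs it with a default value to show the effect. The `localhost:9092` default is an assumption added for illustration and is not part of the configuration above.

```scala
import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseString(
  """pekko.kafka.consumer.kafka-clients {
    |  # Assumed local default, used when KAFKA_BROKERS is not set
    |  bootstrap.servers = "localhost:9092"
    |  # Optional substitution: replaces the default only if the variable exists
    |  bootstrap.servers = ${?KAFKA_BROKERS}
    |  group.id = assembler-consumer-group
    |}""".stripMargin).resolve()

val clients = config.getConfig("pekko.kafka.consumer.kafka-clients")
val brokers = clients.getString("bootstrap.servers")
val groupId = clients.getString("group.id")
```
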
## Source per partition
This is deprecated in favour of a new API for partitioned loading which is coming soon.
Data can also be loaded from specific partitions using `fromPartitions`. By loading from specific partitions the topic
loader can be used by multiple application instances with separate streams per set of partitions (see [Pekko Connectors kafka](https://pekko.apache.org/docs/pekko-connectors-kafka/current/consumer.html#source-per-partition) and below).
```scala
implicit val system: ActorSystem = ActorSystem()

val consumerSettings: ConsumerSettings[String, Long] = ???
val doBusinessLogic: ConsumerRecord[String, Long] => Future[Unit] = ???

val stream: Source[ConsumerMessage.CommittableMessage[String, Long], Consumer.Control] =
  Consumer
    .committablePartitionedSource(consumerSettings, Subscriptions.topics("topic-to-load"))
    .flatMapConcat {
      case (topicPartition, source) =>
        TopicLoader
          .fromPartitions(LoadAll, NonEmptyList.one(topicPartition), doBusinessLogic, new LongDeserializer())
          .flatMapConcat(_ => source)
    }
```