{"id":19366343,"url":"https://github.com/pwliwanow/fdb-pubsub","last_synced_at":"2025-10-09T09:16:36.127Z","repository":{"id":34294697,"uuid":"167617926","full_name":"pwliwanow/fdb-pubsub","owner":"pwliwanow","description":"Pub/Sub built on top of FoundationDB","archived":false,"fork":false,"pushed_at":"2024-08-13T02:25:42.000Z","size":73,"stargazers_count":14,"open_issues_count":25,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-02T16:11:18.419Z","etag":null,"topics":["akka-streams","foundationdb","java","publish-subscribe","pubsub","scala"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pwliwanow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-25T21:43:18.000Z","updated_at":"2024-01-09T19:02:39.000Z","dependencies_parsed_at":"2024-11-10T07:44:05.074Z","dependency_job_id":null,"html_url":"https://github.com/pwliwanow/fdb-pubsub","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pwliwanow%2Ffdb-pubsub","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pwliwanow%2Ffdb-pubsub/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pwliwanow%2Ffdb-pubsub/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pwliwanow%2Ffdb-pubsub/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pwliwanow","down
load_url":"https://codeload.github.com/pwliwanow/fdb-pubsub/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250451775,"owners_count":21432895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["akka-streams","foundationdb","java","publish-subscribe","pubsub","scala"],"created_at":"2024-11-10T07:44:00.385Z","updated_at":"2025-10-09T09:16:31.087Z","avatar_url":"https://github.com/pwliwanow.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FDB-PubSub\n\n[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.pwliwanow.fdb-pubsub/pubsub_2.12/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.github.pwliwanow.fdb-pubsub/pubsub_2.12)\n[![Build Status](https://travis-ci.org/pwliwanow/fdb-pubsub.svg?branch=master)](https://travis-ci.org/pwliwanow/fdb-pubsub)\n[![codecov](https://codecov.io/gh/pwliwanow/fdb-pubsub/branch/master/graph/badge.svg)](https://codecov.io/gh/pwliwanow/fdb-pubsub)\n[![Scala Steward badge](https://img.shields.io/badge/Scala_Steward-helping-brightgreen.svg?style=flat\u0026logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAAQCAMAAAARSr4IAAAAVFBMVEUAAACHjojlOy5NWlrKzcYRKjGFjIbp293YycuLa3pYY2LSqql4f3pCUFTgSjNodYRmcXUsPD/NTTbjRS+2jomhgnzNc223cGvZS0HaSD0XLjbaSjElhIr+AAAAAXRSTlMAQObYZgAAAHlJREFUCNdNyosOwyAIhWHAQS1Vt7a77/3fcxxdmv0xwmckutAR1nkm4ggbyEcg/wWmlGLDAA3oL50xi6fk5ffZ3E2E3QfZDCcCN2YtbEWZt+Drc6u6rlqv7Uk0LdKqqr5rk2UCRXOk0vmQKGfc94nOJyQjouF9H/wCc9gECEYfONoAAAAASUVORK5CYII=)](https://scala-steward.org)\n\nFDB-PubSub is a 
publish-subscribe layer for [FoundationDB](https://apple.github.io/foundationdb/index.html), built on top of [Akka Streams](https://doc.akka.io/docs/akka/2.5/stream/stream-introduction.html). It provides Java and Scala APIs and is inspired by Kafka.\n\n### Motivation\nGetting data from a database into a publish-subscribe system is [surprisingly hard](https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/). \nIt would be much simpler if the developer could publish the event within the business transaction, and that is exactly what FDB-PubSub does: it supports publishing events within a FoundationDB transaction.\n\n### Features:\n- Support for publishing events and committing offsets within a FoundationDB transaction\n- Easily scalable and fault-tolerant storage (thanks to FoundationDB)\n- Easy integration with Apache Cassandra, Apache Kafka, Elasticsearch and more (thanks to [Akka Streams](https://doc.akka.io/docs/akka/2.5/stream/stream-introduction.html) and [Alpakka](https://doc.akka.io/docs/alpakka/current/))\n- Exposed as a library, so if you already operate FoundationDB there is no new stateful component to maintain\n\n### Overview\nConcepts and assumptions are similar to Kafka. Here is a brief overview of FDB-PubSub:\n- User data can be published to _topics_\n- Topics are divided into _partitions_\n- The number of partitions is defined during topic creation\n- User data consists of a key (byte array) and a value (byte array)\n- Values with the same key will end up in the same partition (unless the user explicitly specifies a different partition)\n- Records are _ordered_ within a partition\n- Within a topic, each record is assigned a unique _offset_. 
The offset is of type [Versionstamp](https://apple.github.io/foundationdb/javadoc/com/apple/foundationdb/tuple/Versionstamp.html)\n- Each consumer belongs to a _consumer group_\n- Records within a partition will be processed in order\n- Records are load balanced across consumers in the same consumer group (e.g. given a topic with 10 partitions, if at first a single consumer `A` processes data from all partitions and then consumer `B` joins, consumer `B` will take over processing data from 5 partitions and consumer `A` will continue processing data from the other 5 partitions)\n- Each consumer keeps track of its position by storing the last processed offset for each partition (i.e. an offset is stored for every tuple `(topic, consumerGroup, partitionNumber)`)\n\n### Current limitations\n- It's not yet production ready. If you'd like to use it in your project, please get in touch; I will be happy to help.\n- To stream data from a partition, consumers acquire locks (those locks are later used to atomically commit an offset; outdated locks fail to commit an offset and the underlying partition stream is stopped). Currently, those locks are acquired rather aggressively by consumers that join: a consumer that holds a lock is not aware that another consumer wants to join, and a newly connected consumer simply acquires some of the locks that were held by others. \nAs a result, when processing data with at-least-once delivery semantics, messages are processed more times than would be necessary if locks were acquired gracefully. This will be addressed in future releases.\n- No performance tests have been performed so far. Currently, with default settings, having up to 10 consumers and 1000 partitions per topic should be perfectly fine.\n\n### Disclaimer\nIt's not well-tested in production. I'd recommend using Kafka or Pulsar instead. \n\n# Quickstart\nFDB-PubSub provides both Java and Scala APIs. 
The Java API lives in package `com.github.pwliwanow.fdb.pubsub.javadsl` and the Scala API in `com.github.pwliwanow.fdb.pubsub.scaladsl`. The `example` module contains small examples written in Java and Scala.\n\n### Dependency\nTo get started with FDB-PubSub, add the following dependency with SBT:\n```scala\nval fdbPubSubVersion = \"0.2.0\"\nlibraryDependencies += \"com.github.pwliwanow.fdb-pubsub\" %% \"pubsub\" % fdbPubSubVersion\n```\nor Maven:\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003ecom.github.pwliwanow.fdb-pubsub\u003c/groupId\u003e\n  \u003cartifactId\u003epubsub_2.12\u003c/artifactId\u003e\n  \u003cversion\u003e0.2.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### PubSubClient\nTo start, create a `PubSubClient`. It is an immutable class that can be freely shared within the application, and it allows the user to create a topic, get a producer and create a consumer. `PubSubClient` requires a `Subspace` to operate on and a `Database`:\n```scala\n// Scala\nimport akka.actor.ActorSystem\nimport com.apple.foundationdb.FDB\nimport com.apple.foundationdb.subspace.Subspace\nimport com.apple.foundationdb.tuple.Tuple\nimport com.github.pwliwanow.fdb.pubsub.scaladsl.PubSubClient\n\nval system = ActorSystem()\nimplicit val ec = system.dispatcher\nval database = FDB.selectAPIVersion(620).open(null, ec)\nval pubSubSubspace = new Subspace(Tuple.from(\"PubSubExample\"))\nval pubSubClient = PubSubClient(pubSubSubspace, database)\n```\n\n```java\n// Java\nimport akka.actor.ActorSystem;\nimport com.apple.foundationdb.Database;\nimport com.apple.foundationdb.FDB;\nimport com.apple.foundationdb.subspace.Subspace;\nimport com.apple.foundationdb.tuple.Tuple;\nimport com.github.pwliwanow.fdb.pubsub.javadsl.PubSubClient;\nimport scala.concurrent.ExecutionContextExecutor;\n\nActorSystem system = ActorSystem.create();\nExecutionContextExecutor ec = system.dispatcher();\nDatabase database = FDB.selectAPIVersion(600).open(null, ec);\nSubspace pubSubSubspace = new Subspace(Tuple.from(\"PubSubExample\"));\nPubSubClient pubSubClient = PubSubClient.create(pubSubSubspace, database);\n```\n\n### Creating a topic\n```scala\n// Scala\nval futureResult: Future[NotUsed] = pubSubClient.createTopic(\"testTopic\", numberOfPartitions = 10)\n```\n```java\n// Java\nint numberOfPartitions = 10;\nCompletableFuture\u003cNotUsed\u003e futureResult = 
pubSubClient.createTopic(\"testTopic\", numberOfPartitions, ec);\n```\n\n### Using a producer\nProducers from the Java and Scala APIs differ in how they compose transactions. The Java API takes an additional `TransactionContext` as a parameter and returns `CompletableFuture\u003cNotUsed\u003e`, while the Scala API returns `DBIO[NotUsed]` (which is a type from [foundationdb4s](https://github.com/pwliwanow/foundationdb4s)).\n\n```scala\n// Scala\nval producer = pubSubClient.producer\nval dbio = \n  producer.send(\n    \"testTopic\", \n    Tuple.from(\"ExampleKey\").pack(), \n    Tuple.from(\"ExampleValue\").pack())\nval futureResult: Future[NotUsed] = dbio.transact(database)\n```\n```java\n// Java\nProducer producer = pubSubClient.producer();\nCompletableFuture\u003cNotUsed\u003e futureResult = \n  database.runAsync(tx -\u003e \n    producer.send(\n      tx, \n      \"testTopic\", \n      Tuple.from(\"ExampleKey\").pack(), \n      Tuple.from(\"ExampleValue\").pack()));\n```\n\n### Creating a consumer\nConsumers are exposed as [substreams](https://doc.akka.io/docs/akka/2.5/stream/stream-substream.html), where each partition forms a separate stream \n(which is especially useful when committing offsets, as each stream may perform its commit action independently).\nTo create a consumer, a _topic_, _consumerGroup_, _ConsumerSettings_ and _Materializer_ need to be provided:\n```scala\n// Scala\n// the materializer needs the ActorSystem created earlier\nimplicit val mat = ActorMaterializer()(system)\nval defaultSettings = ConsumerSettings()\nval consumer = pubSubClient.consumer(\"testTopic\", \"testConsumerGroup\", defaultSettings)\n```\n```java\n// Java\n// the materializer needs the ActorSystem created earlier\nMaterializer mat = ActorMaterializer.create(system);\nConsumerSettings defaultSettings = ConsumerSettings.create();\nSubSource\u003cConsumerRecord\u003cKeyValue\u003e, NotUsed\u003e consumer = \n  pubSubClient.consumer(\"testTopic\", \"testConsumerGroup\", defaultSettings, mat);\n```\n\n### Committing offsets\nFDB-PubSub offers `committableFlow` and `committableSink` that should be used for committing offsets. 
It's guaranteed to be run exactly once for each `ConsumerRecord`. \nOptionally, the user can add custom transactional logic.\n```scala\n// Scala\ndef transactionToCompose(record: ConsumerRecord[KeyValue]): DBIO[Unit] = {\n  // some implementation\n}\nval runnableGraph = consumer.to(Consumer.committableSink(database, transactionToCompose))\n```\n```java\n// Java\nRunnableGraph\u003cNotUsed\u003e runnableGraph =\n  consumer.to(Consumer.committableSink(database, (tx, record) -\u003e performTransaction(tx, record), ec));\n\nCompletableFuture\u003cVoid\u003e performTransaction(TransactionContext tx, ConsumerRecord\u003cKeyValue\u003e record) {\n  // some implementation\n}\n```\n\n### Running the consumer\n```scala\n// Scala\nrunnableGraph.run()\n```\n```java\n// Java\nrunnableGraph.run(mat);\n```\n\n## Semantics\nDepending on the use case, different processing semantics may be useful.\n\n### Exactly once\nExactly-once processing is only possible within FoundationDB, by using `committableFlow` or `committableSink` as shown in the _Committing offsets_ section.\n\n### At least once\nTo process data at least once, additional processing should be done before the offset is committed. 
\nDepending on your use case, it may be a good idea to commit offsets in batches:\n```scala\n// Scala\ndef updateElasticsearchInBatch(records: Seq[ConsumerRecord[Entity]]): Future[Unit] = {\n  // implementation here\n}\n\nconsumer\n  .groupedWithin(1000, 5.seconds)\n  // at the end get only the last record to perform batch commit\n  .mapAsync(1)(records =\u003e updateElasticsearchInBatch(records).map(_ =\u003e records.last))\n  .via(Consumer.commitableFlow(database))\n```\n```java\n// Java\nconsumer\n  .groupedWithin(1000, Duration.of(5, ChronoUnit.SECONDS))\n  // at the end get only the last record to perform batch commit\n  .mapAsync(1, records -\u003e updateElasticsearchInBatch(records).thenApply(ignored -\u003e records.get(records.size() - 1)))\n  .via(Consumer.commitableFlow(database));\n\nCompletionStage\u003cVoid\u003e updateElasticsearchInBatch(List\u003cConsumerRecord\u003cEntity\u003e\u003e records) {\n  // implementation here\n}\n```\n\n### At most once\nTo process data at most once, additional processing should be done after the offset is committed:\n```scala\n// Scala\ndef sendNotCriticalNotification(record: ConsumerRecord[Entity]): Future[Unit] = {\n  // implementation here\n}\nval parallelism = 10\nconsumer\n  .via(Consumer.commitableFlow(database))\n  .mapAsync(parallelism)(sendNotCriticalNotification)\n  .to(Sink.ignore)\n  .run()\n```\n```java\n// Java\nint parallelism = 10;\nconsumer\n  .via(Consumer.commitableFlow(database))\n  .mapAsync(parallelism, this::sendNotCriticalNotification)\n  .to(Sink.ignore())\n  .run(mat);\n\nCompletionStage\u003cVoid\u003e sendNotCriticalNotification(ConsumerRecord\u003cEntity\u003e record) {\n  // implementation here\n}\n```\n\n### Cleaning data from topics\nTo clean data from topics, three basic methods are provided: `clear`, `getOffset` and `getPartitions`.\nThese methods can be combined as follows:\n```scala\n// Scala\nimport cats.implicits._\nval topic = \"products\"\nval consumerGroups = List(\"cg1\", 
\"cg2\")\nval dbio = for {\n  partitions \u003c- pubSubClient.getPartitions(topic).toDBIO\n  consumerGroupPartitions = partitions.flatMap(p =\u003e consumerGroups.map(c =\u003e (p, c)))\n  _ \u003c- consumerGroupPartitions.parTraverse { case (p, c) =\u003e\n    pubSubClient\n      .getOffset(topic, c, p)\n      .toDBIO\n      .flatMap(_.fold(DBIO.unit)(offset =\u003e pubSubClient.clear(topic, p, offset)))\n  }\n} yield ()\ndbio.transact(database)\n```\n```java\n// Java, for simplicity the example below is blocking\nString topic = \"products\";\nList\u003cString\u003e consumerGroups = Arrays.asList(\"cg1\", \"cg2\");\ndatabase.run((Transaction tr) -\u003e {\n    List\u003cInteger\u003e partitions = pubSubClient.getPartitions(tr, topic).join();\n    for (String c: consumerGroups)\n        for (int p: partitions) {\n            pubSubClient.getOffset(tr, topic, p, c).join().ifPresent(offset -\u003e {\n                pubSubClient.clear(tr, topic, p, offset).join();\n            });\n        }\n    // Database.run expects the lambda to return a value\n    return null;\n});\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpwliwanow%2Ffdb-pubsub","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpwliwanow%2Ffdb-pubsub","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpwliwanow%2Ffdb-pubsub/lists"}