{"id":15056840,"url":"https://github.com/thriving-dev/kafka-streams-cassandra-state-store","last_synced_at":"2025-04-10T04:35:05.084Z","repository":{"id":90536522,"uuid":"582791851","full_name":"thriving-dev/kafka-streams-cassandra-state-store","owner":"thriving-dev","description":"'Drop-in' Kafka Streams State Store implementation that persists data to Apache Cassandra / ScyllaDB","archived":false,"fork":false,"pushed_at":"2025-02-12T12:45:36.000Z","size":1340,"stargazers_count":24,"open_issues_count":18,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-12T13:56:07.773Z","etag":null,"topics":["apache-cassandra","cassandra","java","kafka-streams","stream-processing"],"latest_commit_sha":null,"homepage":"https://thriving.dev/blog/introducing-kafka-streams-cassandra-state-store","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thriving-dev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-27T22:03:14.000Z","updated_at":"2024-09-23T07:43:40.000Z","dependencies_parsed_at":"2023-12-25T23:32:38.078Z","dependency_job_id":"117a3199-a6dd-40ac-80d3-dd3ca0b69c62","html_url":"https://github.com/thriving-dev/kafka-streams-cassandra-state-store","commit_stats":{"total_commits":309,"total_committers":6,"mean_commits":51.5,"dds":0.6310679611650485,"last_synced_commit":"58ce7fcf8f8ee8472a303aab6b165c23c07311a8"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/thriving-dev%2Fkafka-streams-cassandra-state-store","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thriving-dev%2Fkafka-streams-cassandra-state-store/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thriving-dev%2Fkafka-streams-cassandra-state-store/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thriving-dev%2Fkafka-streams-cassandra-state-store/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thriving-dev","download_url":"https://codeload.github.com/thriving-dev/kafka-streams-cassandra-state-store/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239140109,"owners_count":19588338,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-cassandra","cassandra","java","kafka-streams","stream-processing"],"created_at":"2024-09-24T21:56:53.998Z","updated_at":"2025-02-16T14:31:13.493Z","avatar_url":"https://github.com/thriving-dev.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# kafka-streams-cassandra-state-store\n\n[![Use this template](https://img.shields.io/badge/from-java--library--template-brightgreen?logo=dropbox)](https://github.com/thriving-dev/java-library-template/generate)\n[![Java CI](https://github.com/thriving-dev/kafka-streams-cassandra-state-store/actions/workflows/1.pipeline.yml/badge.svg)](https://github.com/thriving-dev/kafka-streams-cassandra-state-store/actions/workflows/1.pipeline.yml)\n[![Maven 
Central](https://img.shields.io/maven-central/v/dev.thriving.oss/kafka-streams-cassandra-state-store.svg)](https://central.sonatype.com/artifact/dev.thriving.oss/kafka-streams-cassandra-state-store)\n[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)\n[![Javadoc](https://img.shields.io/badge/JavaDoc-Online-green)](https://thriving-dev.github.io/kafka-streams-cassandra-state-store/javadoc/)\n\n## Overview\nKafka Streams State Store implementation that persists data to Apache Cassandra.\nFor now, only KeyValueStore type is supported.\n\n!['Drop-in' Kafka Streams State Store implementation that persists data to Apache Cassandra / ScyllaDB](docs/assets/Introducing_kafka-streams-cassandra-state-store.webp)\n\nℹ️ [Kafka Streams](https://kafka.apache.org/documentation/streams/) is a Java client library for building stream-processing applications and microservices, where the input and output data are stored in Kafka clusters.   \nℹ️ [Apache Cassandra](https://cassandra.apache.org/) is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.\n\n### Blog post introducing the library + Demo YouTube\n\n* **Blog:** https://thriving.dev/blog/introducing-kafka-streams-cassandra-state-store\n* **Demo:** https://youtu.be/2Co9-8E-uJE\n\n\n## Stack\n\n### Implemented/compiled with\n* Java 17\n* kafka-streams 3.6\n* datastax java-driver-core 4.17.0\n\n### Supported client-libs\n* Kafka Streams 2.7.0+ (maybe even earlier versions, but wasn't tested further back)\n* Datastax java client (v4) `'com.datastax.oss:java-driver-core:4.17.0'`\n* ScyllaDB shard-aware datastax java client (v4) fork `'com.scylladb:java-driver-core:4.17.0.0'`\n\n### Supported databases\n* Apache Cassandra 3.11\n* Apache Cassandra 4.0\n* Apache Cassandra 4.1\n* ScyllaDB (tested from 
4.3+)\n\n#### Integration Tests\n* JUnit 5, AssertJ\n* [testcontainers](https://www.testcontainers.org/)\n\n## Get it!\n\n**kafka-streams-cassandra-state-store** is available on [Maven Central](https://central.sonatype.com/artifact/dev.thriving.oss/kafka-streams-cassandra-state-store/):\n\n#### Maven\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003edev.thriving.oss\u003c/groupId\u003e\n    \u003cartifactId\u003ekafka-streams-cassandra-state-store\u003c/artifactId\u003e\n    \u003cversion\u003e${version}\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nClasses of this library are in the package `dev.thriving.oss.kafka.streams.cassandra.state.store`.\n\n#### Gradle (Groovy DSL)\n```groovy\nimplementation 'dev.thriving.oss:kafka-streams-cassandra-state-store:${version}'\n```\n\n### Datastax Java Client\n\nTo avoid library collisions, the cassandra java driver is non-transitive.    \nTherefore you have to choose and add a _datastax driver based_ java client dependency to your project.\n\n* Datastax java client (v4) `'com.datastax.oss:java-driver-core:4.17.0'` (works for Cassandra 3.11, 4.0, 4.1)\n* ScyllaDB shard-aware datastax java client (v4) fork `'com.scylladb:java-driver-core:4.17.0.0'`\n\n## Usage\n### Quick start\n\n#### ‼️**Important:** notes upfront\n\n1. Disable logging =\u003e `withLoggingDisabled()`    \n   ...enabled by default, kafka streams is 'logging' the events making up the store's state against a _changelog topic_ to be able to restore state following a rebalance or application restart. Since cassandra is a permanent external store, state does not need to be _restored_ but is always available.\n1. Disable caching =\u003e `withCachingDisabled()`    \n   ...enabled by default, kafka streams is buffering writes - which is not what we want when working with cassandra state store\n1. 
Do not use [standby replicas](https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#streams-developer-guide-standby-replicas) =\u003e `num.standby.replicas=0`    \n   ...standby replicas are used to minimize the latency of task failover by keeping shadow copies of local state stores as a hot standby. The state store backed by cassandra does not need to be restored or re-balanced since all streams instances can directly access any partitions state.\n\n#### High-level DSL \u003c\u003e StoreSupplier\n\nWhen using the high-level DSL, i.e., `StreamsBuilder`, users create `StoreSupplier`s that can be further customized via `Materialized`.\n\nFor example, a topic read as `KTable` can be materialized into a cassandra k/v store with custom key/value serdes, with logging and caching disabled:\n\n```java\nStreamsBuilder builder = new StreamsBuilder();\nKTable\u003cLong,String\u003e table = builder.table(\n  \"topicName\",\n  Materialized.\u003cLong,String\u003eas(\n                 CassandraStores.builder(session, \"store-name\")\n                         .partitionedKeyValueStore()\n              )\n              .withKeySerde(Serdes.Long())\n              .withValueSerde(Serdes.String())\n              .withLoggingDisabled()\n              .withCachingDisabled());\n```\n\n#### Processor API \u003c\u003e StoreBuilder\n\nWhen using the Processor API, i.e., `Topology`, users create `StoreBuilder`s that can be attached to `Processor`s.\n\nFor example, you can create a cassandra stringKey value store with custom key/value serdes, logging and caching disabled like:\n\n```java\nStoreBuilder\u003cKeyValueStore\u003cString, Long\u003e\u003e sb = \n    Stores.keyValueStoreBuilder(\n        CassandraStores.builder(session, \"store-name\")\n                .partitionedKeyValueStore(),\n        Serdes.String(),\n        Serdes.Long())\n    .withLoggingDisabled()\n    .withCachingDisabled();\ntopology.addStateStore(sb);\n```\n\n### Examples\nExamples 
(incl. docker-compose setup) can be found in the [/examples](/examples) folder.\n\nInstructions on how to run and work with the example apps can be found at the individual example root folder's README file.\n\nTake a look at the notorious word-count example with Cassandra 4 -\u003e [/examples/word-count-cassandra4](/examples/word-count-cassandra4).\n\n#### Common Requirements for running the examples\n- Docker to run\n- [kcat](https://github.com/edenhill/kcat) for interacting with Kafka (consume/produce)\n\n### Store Types\nkafka-streams-cassandra-state-store comes with 4 different store types:\n- partitionedKeyValueStore\n- globalKeyValueStore\n- partitionedVersionedKeyValueStore\n- globalVersionedKeyValueStore\n\n#### partitionedKeyValueStore\nA persistent `KeyValueStore\u003cBytes, byte[]\u003e`.\nThe underlying cassandra table is **partitioned by** the store context **task partition**.\nTherefore, it behaves exactly like the regular state stores (RocksDB/InMemory/MemoryLRUCache).\nAll CRUD operations against this store always query by and return results for a single stream task.\n\n#### globalKeyValueStore\nA persistent `KeyValueStore\u003cBytes, byte[]\u003e`.\nThe underlying cassandra table uses the **record key as sole PRIMARY KEY**.\nTherefore, all CRUD operations against this store work from any streams task and therefore always are “global”.\nDue to the nature of cassandra tables having a single PK (no clustering key), this store supports only a limited number of operations.\n\nThis global store should not be confused with a Kafka Streams Global Store!\nIt has to be used as a non-global (regular!) streams KeyValue state store - though it allows to read entries from any streams context (streams task/thread).\n\n**Tip:** This store type can be useful when exposing state store access via an API. 
Each running instance of your app can serve all requests without the need to proxy the request to the right instance having the streams task assigned for the key in question.\n\n⚠️ For **querying** this **global CassandraKeyValueStore**, make sure to restrict the `WrappingStoreProvider` to a single (assigned) partition.\nThe KafkaStreams instance returns a `CompositeReadOnlyKeyValueStore` that holds the `WrappingStoreProvider`, wrapping all assigned tasks' stores. Without the correct `StoreQueryParameters` the same query is executed multiple times (for all assigned partitions) and combines multiple identical results.\n\n#### partitionedVersionedKeyValueStore\nA persistent `VersionedKeyValueStore\u003cBytes, byte[]\u003e`.\nThe underlying cassandra table is **partitioned by** the store context **task partition**.\nTherefore, it behaves exactly like the regular versioned state store (RocksDB).\nAll CRUD operations against this store always query by and return results for a single stream task.\n\n#### globalVersionedKeyValueStore\nA persistent `VersionedKeyValueStore\u003cBytes, byte[]\u003e`.\nThe underlying cassandra table uses the **record key + validTo as composite PRIMARY KEY** (validTo as the clustering key).\nTherefore, all CRUD operations against this store work from any streams task and therefore always are “global”.\n\n#### Interactive Queries\nThe `CassandraStateStore` interface provides static helper methods to get a correctly configured read-only store facade:\n\n💡Please read the blog post for more details: https://thriving.dev/blog/interactive-queries-with-kafka-streams-cassandra-state-store\n\n**globalKeyValueStore:**\n```java\n// get a read-only store to exec interactive queries ('global' type cassandra KeyValueStore)\nReadOnlyKeyValueStore\u003cString, Long\u003e store = CassandraStateStore.readOnlyGlobalKeyValueStore(streams, STORE_NAME);\n        \n// Get the value from the store\nLong value = store.get(key);\n```\nExample provided: 
[examples/global-store-restapi](examples/global-store-restapi)\n\n**partitionedKeyValueStore:**   \nGet an optimised special implementation of `ReadOnlyKeyValueStore` for 'partitioned' type CassandraKeyValueStore.\nThe returned object can be used to query the state directly from the underlying Cassandra table.\nNo 'RPC layer' is required since queries for all/individual partitions are executed from this instance, and query\nresults are merged where necessary.\n```java\n// get a read-only store to exec interactive queries ('partitioned' type cassandra KeyValueStore)\nReadOnlyKeyValueStore\u003cString, Long\u003e store = CassandraStateStore.readOnlyPartitionedKeyValueStore(\n        streams,                                                // streams\n        \"word-count\",                                           // storeName\n        session,                                                // session\n        \"kstreams_wordcount\",                                   // keyspace\n        true,                                                   // isCountAllEnabled\n        \"dml\",                                                  // dmlExecutionProfile\n        stringSerde,                                            // keySerde\n        longSerde,                                              // valueSerde\n        CassandraStateStore.DEFAULT_TABLE_NAME_FN,              // tableNameFn\n        new DefaultStreamPartitioner\u003c\u003e(stringSerde.serializer()) // partitioner (uses the key serde's serializer)\n);\n        \n// Get the value from the store\nLong value = store.get(key);\n```\n⚠️ The special implementation `CassandraPartitionedReadOnlyKeyValueStore` requires `application.server` config to be set (to be able to access metadata).\n\nExample provided: [examples/partitioned-store-restapi](examples/partitioned-store-restapi)\n\nMore examples can also be found in the [integration 
tests](kafka-streams-cassandra-state-store/src/intTest/java/dev/thriving/oss/kafka/streams/cassandra/state/store).\n\n**partitionedVersionedKeyValueStore/globalVersionedKeyValueStore:**   \nWith Kafka 3.5, interactive query interfaces are not yet available for versioned key-value stores. Plans exist to add this in the future.\nThe following KIPs have been identified (_asOfTimestamp_ 2023-08-25): KIP-960, KIP-968, KIP-969.\n\n#### Supported operations by store type (KeyValueStore)\n\n|                         | partitionedKeyValueStore | globalKeyValueStore |\n|-------------------------|--------------------------|---------------------|\n| get                     | ✅                        | ✅                   |\n| put                     | ✅                        | ✅                   |\n| putIfAbsent             | ✅                        | ✅                   |\n| putAll                  | ✅                        | ✅                   |\n| delete                  | ✅                        | ✅                   |\n| range                   | ✅                        | ❌                   |\n| reverseRange            | ✅                        | ❌                   |\n| all                     | ✅                        | ✅                   |\n| reverseAll              | ✅                        | ❌                   |\n| prefixScan              | ✅                        | ❌                   |\n| approximateNumEntries   | ✅*                       | ✅*                  |\n| query::RangeQuery       | ✅                        | ❌                   |\n| query::KeyQuery         | ✅                        | ✅                   |\n| query::WindowKeyQuery   | ❌                        | ❌                   |\n| query::WindowRangeQuery | ❌                        | ❌                   |\n\n*opt-in required\n\n#### Supported operations by store type (VersionedKeyValueStore)\n\n|                            | partitionedVersionedKeyValueStore | globalVersionedKeyValueStore 
|\n|----------------------------|-----------------------------------|------------------------------|\n| get(key)                   | ✅                                 | ✅                            |\n| get(key, asOfTimestamp)    | ✅                                 | ✅                            |\n| put(key, value, timestamp) | ✅                                 | ✅                            |\n| delete(key, timestamp)     | ✅                                 | ✅                            |\n\n### Builder\nThe `CassandraStores` class provides a method `public static CassandraStores builder(final CqlSession session, final String name)` that returns an instance of _CassandraStores_ which ultimately is used to build an instance of `KeyValueBytesStoreSupplier` to add to your topology.\n\nBasic usage example:\n```java\nCassandraStores.builder(session, \"word-grouped-count\")\n        .withKeyspace(\"\")\n        .partitionedKeyValueStore()\n```\n\nAdvanced usage example:\n```java\nCassandraStores.builder(session, \"word-grouped-count\")\n        .withKeyspace(\"poc\")\n        .withCountAllEnabled()\n        .withTableOptions(\"\"\"\n                compaction = { 'class' : 'LeveledCompactionStrategy' }\n                AND default_time_to_live = 86400\n                \"\"\")\n        .withTableNameFn(storeName -\u003e\n            String.format(\"%s_kstreams_store\", storeName.toLowerCase().replaceAll(\"[^a-z0-9_]\", \"_\")))\n        .partitionedKeyValueStore()\n```\n\nPlease also see [Quick start](#quick-start) for full kafka-streams example.\n\n#### Builder options\n\n##### `withKeyspace(String keyspace)`\nThe keyspace for the state store to operate in. By default, the provided `CqlSession` _session-keyspace_ is used.\n\n##### `withTableOptions(String tableOptions)`\nA CQL table has a number of options that can be set at creation.\n\nPlease omit `WITH ` prefix.\nMultiple options can be added using `AND`, e.g. 
`\"table_option1 AND table_option2\"`.\n\nRecommended compaction strategy is 'LeveledCompactionStrategy' which is applied by default.   \n-\u003e Do not forget to add when overwriting table options.\n\nPlease refer to table options of your cassandra cluster.\n- [Cassandra 4](https://cassandra.apache.org/doc/latest/cassandra/cql/ddl.html#create-table-options)\n- [ScyllaDB](https://docs.scylladb.com/stable/cql/ddl.html#table-options)\n\nPlease note this config will only apply upon initial table creation. ('ALTER TABLE' is not yet supported).\n\nDefault: `\"compaction = { 'class' : 'LeveledCompactionStrategy' }\"`\n\n##### `withTableNameFn(Function\u003cString, String\u003e tableNameFn)`\nCustomize how the state store cassandra table is named, based on the kstreams store name.\n\n⚠️ Please note _changing_ the store name _for a pre-existing store_ will result in a **new empty table** to be created.\n\nDefault: `${normalisedStoreName}_kstreams_store` - normalise := lowercase, replaces all [^a-z0-9_] with '_'   \n  e.g. (\"TEXT3.word-count2\") -\u003e \"text3_word_count2_kstreams_store\"\n\n##### `withCountAllEnabled()`\nEnable (opt-in) the CassandraKeyValueStore to use `SELECT COUNT(*)` when [ReadOnlyKeyValueStore#approximateNumEntries()](https://kafka.apache.org/34/javadoc/org/apache/kafka/streams/state/ReadOnlyKeyValueStore.html#approximateNumEntries()) is invoked.\n\n⚠️ Cassandra/CQL does not support getting approximate counts. Exact row count using `SELECT COUNT(*)` requires significant CPU and I/O resources and may be quite slow depending on store size... use with care!\n\nDisabled by default.\n\n##### `withCreateTableDisabled()`\nDisable (opt-out) automatic table creation during store initialization.   
\nEnabled by default.\n\n##### `withDdlExecutionProfile(String ddlExecutionProfile)`\nSet the execution profile to be used by the driver for all DDL (Data Definition Language) queries.\n\nℹ️ Note: Only applies if table creation (see `CassandraStores#withCreateTableDisabled()`) is enabled (default).   \nIf no profile is set - DDL queries are executed with consistency `ALL`.   \nWhen using a custom profile, it is recommended to also set consistency=ALL   \n(Reason: avoid issues with concurrent schema updates)\n\nReference: https://docs.datastax.com/en/developer/java-driver/4.15/manual/core/configuration/#execution-profiles\n\nMust be a non-blank String.   \nSet to `null` to disable (basic applies).\n\nDefault: `null`\n\n##### `withDmlExecutionProfile(String dmlExecutionProfile)`\nSet the execution profile to be used by the driver for all DML (Data Manipulation Language) queries.\n\nReference: https://docs.datastax.com/en/developer/java-driver/4.15/manual/core/configuration/#execution-profiles\n\nMust be a non-blank String.   \nSet to `null` to disable (basic applies).\n\nDefault: `null`\n\n\n## Fine Print\n\n### Known Limitations\nAdding additional infrastructure for data persistence external to Kafka comes with certain risks and constraints.\n\n#### Consistency\nKafka Streams supports _at-least-once_ and _exactly-once_ processing guarantees. At-least-once semantics is enabled by default.\n\nThe Kafka Streams _exactly-once_ processing guarantee relies on Kafka transactions. These transactions wrap the entirety of processing a message throughout your streams topology, including messages published to outbound topic(s), changelog topic(s), and consumer offsets topic(s).\n\nThis is possible through transactional interaction with a single distributed system (Apache Kafka). Bringing an external system (Cassandra) into play breaks this pattern. 
Once data is written to the database it can't be rolled back in the event of a subsequent error / failure to complete the current message processing.\n\n⚠️ =\u003e If you need strong consistency, have _exactly-once_ processing enabled (streams config: `processing.guarantee=\"exactly_once_v2\"`), and/or your processing logic is not fully idempotent then using **kafka-streams-cassandra-state-store** is discouraged! ⚠️\n\nℹ️ Please note this is also true when using kafka-streams with the native state stores (RocksDB/InMemory) with *at-least-once* processing.guarantee (default).\n\nFor more information on Kafka Streams processing guarantees, check the sources referenced below.\n\n##### References\n- https://medium.com/lydtech-consulting/kafka-streams-transactions-exactly-once-messaging-82194b50900a\n- https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#processing-guarantee\n- https://docs.confluent.io/platform/current/streams/concepts.html#processing-guarantees\n\n#### Incomplete Implementation of Interfaces `StateStore` \u0026 `ReadOnlyKeyValueStore`\n\nNot all methods have been implemented. 
Please check [store types method support table](#store-types) above for more details.\n\n\n### Cassandra Specifics\n\n#### Underlying CQL Schema\n\n##### partitionedKeyValueStore\nUsing defaults, for a state store named \"word-count\" following CQL Schema applies:\n```sql\nCREATE TABLE IF NOT EXISTS word_count_kstreams_store (\n    partition INT,\n    key BLOB,\n    value BLOB,\n    time TIMESTAMP,\n    PRIMARY KEY ((partition), key)\n) WITH compaction = { 'class' : 'LeveledCompactionStrategy' }\n```\n\n##### globalKeyValueStore\nUsing defaults, for a state store named \"clicks-global\" following CQL Schema applies:\n```sql\nCREATE TABLE IF NOT EXISTS clicks_global_kstreams_store (\n    key BLOB,\n    value BLOB,\n    time TIMESTAMP,\n    PRIMARY KEY (key)\n) WITH compaction = { 'class' : 'LeveledCompactionStrategy' }\n```\n\n##### partitionedVersionedKeyValueStore\nUsing defaults, for a state store named \"word-count\" following CQL Schema applies:\n```sql\nCREATE TABLE IF NOT EXISTS word_count_kstreams_store (\n    partition INT,\n    key BLOB,\n    validFrom TIMESTAMP,\n    validTo TIMESTAMP,\n    value BLOB,\n    time TIMESTAMP,\n    PRIMARY KEY ((partition), key, validTo)\n) WITH compaction = { 'class' : 'LeveledCompactionStrategy' }\n```\n\n##### globalVersionedKeyValueStore\nUsing defaults, for a state store named \"clicks-global\" following CQL Schema applies:\n```sql\nCREATE TABLE IF NOT EXISTS clicks_global_kstreams_store (\n    key BLOB,\n    validFrom TIMESTAMP,\n    validTo TIMESTAMP,\n    value BLOB,\n    time TIMESTAMP,\n    PRIMARY KEY ((key), validTo)\n) WITH compaction = { 'class' : 'LeveledCompactionStrategy' }\n```\n\n#### Feat: Cassandra table with default TTL\n\n💡 **Tip:** Cassandra has a table option `default_time_to_live` (default expiration time (“TTL”) in seconds for a table) which can be useful for certain use cases where data (state) can or should expire.\n\nPlease note writes to cassandra are made with system time. 
The table TTL will therefore apply based on the time of write (not stream time).\n\n#### Cassandra table partitioning (avoiding large partitions)\n\nKafka is persisting data in segments and is built for sequential r/w. As long as there's sufficient disk storage space available to brokers, a high number of messages for a single topic partition is not a problem.\n\nApache Cassandra on the other hand can get inefficient (up to severe failures such as load shedding, dropped messages, and crashed or downed nodes) when partition size grows too large.\nThe reason is that searching within a large partition is slow. Also, it puts a lot of pressure on (JVM) heap.\n\n⚠️ The community has offered a standard recommendation for Cassandra users to keep Partitions under 400MB, and preferably under 100MB.\n\nFor the current implementation, the cassandra table created for the 'default' key-value store (`partitionedKeyValueStore`) is partitioned by the kafka _partition_ (\"wide partition pattern\").\nPlease keep these issues in mind when working with relevant data volumes.    \nIf you only look up entries by key and do not need range queries ('range', 'prefixScan'; ref [Supported operations by store type](#supported-operations-by-store-type-keyvaluestore)), it's recommended to use `globalKeyValueStore` rather than `partitionedKeyValueStore`, since it is partitioned by the _event key_ (:= primary key).\n\nℹ️ References:\n- blog post on [Wide Partitions in Apache Cassandra 3.11](https://thelastpickle.com/blog/2019/01/11/wide-partitions-cassandra-3-11.html)    \n  Note: in case anyone has well-founded knowledge if/how this has changed with Cassandra 4, please share!\n- [stackoverflow question](https://stackoverflow.com/questions/68237371/wide-partition-pattern-in-cassandra)\n\n\n## Development\n\n### Requirements\n\n- Java 17\n- Docker (integration tests with testcontainers)\n\n### Build\n\nThis library is built with Gradle. 
Please note the build task also depends on the task `intTest`, which runs integration tests using testcontainers (build \u003c- check \u003c- intTest).\n\n```shell\n./gradlew clean build\n```\n\n### Integration test\n\nIntegration tests can be run separately via\n\n```shell\n./gradlew :kafka-streams-cassandra-state-store:intTest\n```\n\n\n## Roadmap\n\n- [x] MVP\n  - [x] CQL Schema\n  - [x] implementation\n- [x] restructure code\n  - [x] split implementation \u0026 examples\n  - [x] Abstract store, introduce Repo, KeySerdes (Byte \u003c\u003e ByteBuffer|String)\n  - [x] CassandraStores Builder, configurable\n    - [x] table name fn\n    - [x] keyspace\n    - [x] ~~table default ttl~~\n    - [x] ~~compaction strategy~~\n    - [x] ~~compression~~\n    - [x] fully customizable table options (support both Cassandra \u0026 ScyllaDB)\n- [x] examples\n  - [x] WordCount Cassandra 4\n  - [x] WordCount Cassandra 3 (v4 client lib)\n  - [x] WordCount ScyllaDB\n  - [x] WordCount Processor + all + range + prefixScan + approximateNumEntries\n  - [x] GlobalCassandraStore + KStream enrichment\n  - [x] Quarkus examples app as GraalVM native image (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/7)\n- [x] additional features\n  - [x] ~~Prefix scan with `stringKeyValueStore` (ScyllaDB only)~~ (removed with v0.3)\n  - [ ] ~~Prefix scan with `stringKeyValueStore` (Cassandra with SASIIndex? 
https://stackoverflow.com/questions/49247092/order-by-and-like-in-same-cassandra-query/49268543#49268543)~~\n  - [x] `ReadOnlyKeyValueStore.prefixScan` implementation using range (see InMemoryKeyValueStore implementation)\n  - [x] Implement `globalKeyValueStore`\n  - [x] Support KIP-889: Versioned State Stores (to be delivered with kafka 3.5.0) (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/21)\n- [x] OpenSource\n  - [x] choose + add license\n  - [x] add CHANGELOG.md\n  - [x] add CODE_OF_CONDUCT.md\n  - [x] add CONTRIBUTING.md\n  - [x] polishing\n  - [x] make repo public\n  - [x] Publish to maven central (?) https://h4pehl.medium.com/publish-your-gradle-artifacts-to-maven-central-f74a0af085b1\n    - [x] request namespace ownership\n    - [x] add JavaDocs\n    - [x] other -\u003e maven central compliant https://central.sonatype.org/publish/requirements/\n    - [x] gradle plugin to publish to maven central https://julien.ponge.org/blog/publishing-from-gradle-to-maven-central-with-github-actions/\n    - [x] publish snapshot version 0.1.0-SNAPSHOT\n    - [x] add gradle release plugin\n    - [x] tag + publish initial version 0.1.0\n- [ ] Ops\n  - [x] github actions to build (+test)\n  - [ ] ? add renovate\n    - (vs. 
dependabot?)\n      - https://github.com/renovatebot/github-action\n      - https://docs.renovatebot.com/java/\n  - [x] github actions to publish to maven central (snapshot, releases) https://julien.ponge.org/blog/publishing-from-gradle-to-maven-central-with-github-actions/\n  - [x] github actions for triggering 'gradle release' from repo with automatic semver\n- [x] Write Documentation\n  - [x] summary\n  - [x] compatibility cassandra 3.11, 4.x, ScyllaDB\n  - [x] cleanup README\n  - [x] install\n  - [x] quick start\n  - [x] link to examples\n  - [x] overview store types\n  - [x] usage, builder, config options\n  - [x] limitations\n  - [x] Cassandra Specifics\n    - [x] Underlying CQL Schema\n    - [x] Feat: Cassandra table with default TTL\n    - [ ] r/w consistency\n    - [ ] profiles for DDL/DML\n    - [ ] retry-policy https://docs.datastax.com/en/developer/java-driver/4.15/manual/core/retries/\n    - [ ] ? request throttling, e.g. rate-based to avoid overloading the db? https://docs.datastax.com/en/developer/java-driver/4.15/manual/core/throttling/\n  - [ ] (Caching options)\n  - [ ] Exception/error handling\n- [x] Security\n  - [x] test against 'CQL injection' via `withTableOptions(..)` \n        =\u003e tried to add `compaction = { 'class' : 'LeveledCompactionStrategy' };DROP TABLE xyz` which fails due to wrong syntax in Cassandra 3.11/4.1 \u0026 ScyllaDB 5.1  \n- [ ] bugs\n  - [x] cassandra concurrent schema updates (concurrent table auto-creation) lead to schema collisions (tables are created by each task-thread in parallel on first application start) (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/12)\n  - [ ] exception handling (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/13)\n- [ ] tests\n  - [ ] unit tests (?)\n  - [x] integration test using testcontainers\n    - [x] WordCountTest\n    - [x] WordCountInteractiveQueriesTest\n    - [x] WordCountGlobalStoreTest\n- [ ] Other\n  - [ ] migrate to 
gradle version catalogs https://docs.gradle.org/current/userguide/platforms.html, https://developer.android.com/build/migrate-to-catalogs\n- [ ] Advanced/Features/POCs Planned/Considered\n  - [x] correctness / completeness (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/14)\n    - [ ] ~wrap stores with MeteredKeyValueStore ?~ -\u003e done automatically via builders\n    - [ ] ~provide `timestampedKeyValueStore`~ -\u003e no use case\n    - [ ] ~? (TBC) logging / caching is always disabled (because it's not implemented to wrap store by CassandraStores...)~\n      - [ ] ~always disable logging + caching?~\n  - [ ] add additional store types\n    - [ ] WindowedStore functionality, example, ...\n    - [ ] ...?\n  - [x] Add builder config options\n    - [x] opt-out to avoid tables to be auto-created (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/9)\n    - [x] allow setting execution profiles to be used for queries, separate for DDL|DML (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/11)\n    - [x] opt-in to enable count using `SELECT COUNT(*)` for `approximateNumEntries` (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/10)\n  - [ ] (?) simple inMemory read cache -\u003e Caffeine? (separate lib?) (https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/18)\n  - [ ] Benchmark\n  - [ ] Explore buffered writes ('caching') -\u003e parallel writes to Cassandra to boost performance?\n  - [ ] add Metrics?\n    - [ ] (?) Metrics also for Caches?\n  - [ ] move (parts of) documentation to separate pages/wiki?\n  - [ ] explore using indexes (e.g. 
global secondary indexes) for partitioned kv store\n  - [ ] Custom ReadOnlyKeyValueStore for 'partitionedKeyValueStore' type optimised interactive queries\n  - [ ] Follow-up tasks on 'Versioned State Stores'\n    - [ ] Add interactive queries support once follow-up KIPs are delivered\n    - [ ] Benchmark\n    - [ ] Consider (in-memory) caching options to improve performance (ref [#18](https://github.com/thriving-dev/kafka-streams-cassandra-state-store/issues/18)) \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthriving-dev%2Fkafka-streams-cassandra-state-store","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthriving-dev%2Fkafka-streams-cassandra-state-store","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthriving-dev%2Fkafka-streams-cassandra-state-store/lists"}