Examples of Avro, Kafka, Schema Registry, Kafka Streams, Interactive Queries, KSQL, Kafka Connect in Scala


# kafka-scala-examples

Examples in Scala of
* [Avro](#avro)
* [Kafka](#kafka)
* [Schema Registry](#schema-registry)
* [Kafka Streams](#kafka-streams)
  - with [cats](
  - with [ZIO](, see also [zio-kafka-streams](
* Interactive Queries *TODO*
  - with REST/http4s
  - with GraphQL/Caliban
* [KSQL](#ksql)
* [Kafka Connect](#kafka-connect)
* [Extra](#extra)

Local environment

# start locally
# - zookeeper
# - kafka
# - kafka-rest
# - kafka-ui
# - schema-registry
# - schema-registry-ui
# - ksql-server
# - ksql-cli
# - kafka-connect
# - kafka-connect-ui
docker-compose up

# (mac|linux) view kafka ui
[open|xdg-open] http://localhost:8000

# (mac|linux) view schema-registry ui
[open|xdg-open] http://localhost:8001

# (mac|linux) view kafka-connect ui
[open|xdg-open] http://localhost:8002

# cleanup
docker-compose down -v
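
The `[open|xdg-open]` notation above means "use `open` on macOS and `xdg-open` on Linux". A minimal sketch of how that choice could be scripted is shown here; the helper names `opener` and `ui_url` are hypothetical and not part of this repository, and the ports are simply the ones listed above.

```shell
#!/usr/bin/env bash

# Pick the URL opener for the current OS: `open` on macOS, `xdg-open` otherwise.
opener() {
  case "$(uname -s)" in
    Darwin) echo "open" ;;
    *)      echo "xdg-open" ;;
  esac
}

# Map a UI name to its local URL (ports copied from the commands above).
ui_url() {
  case "$1" in
    kafka)           echo "http://localhost:8000" ;;
    schema-registry) echo "http://localhost:8001" ;;
    kafka-connect)   echo "http://localhost:8002" ;;
    *) echo "unknown ui: $1" >&2; return 1 ;;
  esac
}

# Usage (not executed here):
#   "$(opener)" "$(ui_url kafka)"
```

The functions only print values, so they can be sourced safely before the containers are up.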

If containers keep crashing, make sure Docker has enough memory and CPU available

# verify memory and cpu usage
docker ps -q | xargs docker stats --no-stream

# verify status
docker inspect <CONTAINER_NAME> | jq '.[].State'

## avro


[Avro]( serialization and deserialization examples of
* `SpecificRecord` code generation with [sbt-avro]( [[source](avro/src/main/scala/com/kafka/demo/original/AvroCodeGeneration.scala)|[test](avro/src/test/scala/com/kafka/demo/original/AvroCodeGenerationSpec.scala)]
* `GenericRecord` [[source](avro/src/main/scala/com/kafka/demo/original/AvroGenericRecord.scala)|[test](avro/src/test/scala/com/kafka/demo/original/AvroGenericRecordSpec.scala)]
* [avro4s]( [[source](avro/src/main/scala/com/kafka/demo/avro4s/Avro4sExample.scala)|[test](avro/src/test/scala/com/kafka/demo/avro4s/Avro4sExampleSpec.scala)]
* Java/Scala libraries compatibility [[test](avro/src/test/scala/com/kafka/demo/LibraryCompatibilitySpec.scala)]


# console
sbt avro/console

# generate avro classes
# avro/target/scala-2.12/classes/com/kafka/demo/User.class
sbt clean avro/compile

# test
sbt clean avro/test

### Resources

* [Data Serialization and Evolution](
* [Three Reasons Why Apache Avro Data Serialization is a Good Choice](
* [Schema evolution in Avro, Protocol Buffers and Thrift](
* [Avro Introduction for Big Data and Data Streaming Architectures](
* [Stream Avro Records into Kafka using Avro4s and Akka Streams Kafka](
* [Kafka, Spark and Avro](

## kafka


[Kafka]( API examples of
* `KafkaProducer` [[doc](|[source](kafka/src/main/scala/com/kafka/demo/original/Producer.scala)]
and `KafkaConsumer` [[doc](|[source](kafka/src/main/scala/com/kafka/demo/original/Consumer.scala)]
* [CakeSolutions](
`KafkaProducer` [[source](kafka/src/main/scala/com/kafka/demo/cakesolutions/Producer.scala)|[test](kafka/src/test/scala/com/kafka/demo/cakesolutions/KafkaSpec.scala)]
and `KafkaConsumer` [[source](kafka/src/main/scala/com/kafka/demo/cakesolutions/Consumer.scala)|[test](kafka/src/test/scala/com/kafka/demo/cakesolutions/KafkaSpec.scala)]


# access kafka
docker exec -it local-kafka bash

# create topic
# convention ..
# example [|]
kafka-topics --zookeeper zookeeper:2181 \
--create --if-not-exists --replication-factor 1 --partitions 1 --topic <TOPIC_NAME>

# delete topic
kafka-topics --zookeeper zookeeper:2181 \
--delete --topic <TOPIC_NAME>

# view topic
kafka-topics --zookeeper zookeeper:2181 --list
kafka-topics --zookeeper zookeeper:2181 --describe --topic <TOPIC_NAME>

# view topic offset
kafka-run-class kafka.tools.GetOffsetShell \
--broker-list kafka:9092 \
--time -1 \
--topic <TOPIC_NAME>

# list consumer groups
kafka-consumer-groups --bootstrap-server kafka:9092 --list

# view consumer group offset
kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--group <GROUP_NAME> \
--describe

# reset consumer group offset
kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--group <GROUP_NAME> \
--topic <TOPIC_NAME> \
--reset-offsets \
--to-earliest \
--execute

# console producer
kafka-console-producer --broker-list kafka:9092 --topic <TOPIC_NAME>
kafkacat -P -b 0 -t <TOPIC_NAME>

# console consumer
kafka-console-consumer --bootstrap-server kafka:9092 --topic <TOPIC_NAME> --from-beginning
kafkacat -C -b 0 -t <TOPIC_NAME>

# producer example
sbt "kafka/runMain com.kafka.demo.original.Producer"
sbt "kafka/runMain com.kafka.demo.cakesolutions.Producer"

# consumer example
sbt "kafka/runMain com.kafka.demo.original.Consumer"
sbt "kafka/runMain com.kafka.demo.cakesolutions.Consumer"

# test
sbt clean kafka/test
sbt "test:testOnly *KafkaSpec"
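
The "view topic offset" command above is typically run through `kafka.tools.GetOffsetShell`, which prints one `topic:partition:offset` line per partition. As a rough sketch, the per-partition values can be added up to get the total of the latest offsets for a topic. The function name `sum_offsets` and the topic name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash

# Sum the third field of `topic:partition:offset` lines read from stdin.
# With `--time -1` these are the latest offsets, so the result approximates
# the number of records ever written (it ignores records removed by retention).
sum_offsets() {
  awk -F: '{ total += $3 } END { printf "%d\n", total }'
}

# Usage (inside the kafka container, not executed here):
#   kafka-run-class kafka.tools.GetOffsetShell \
#     --broker-list kafka:9092 --time -1 --topic my-topic | sum_offsets
```

Parsing with `awk -F:` relies on topic names never containing a colon, which Kafka's naming rules guarantee.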

### Resources

* DevOps [Kafka](
* [Kafka topic naming conventions](
* [Should you put several event types in the same Kafka topic?](
* [How to choose the number of topics/partitions in a Kafka cluster?](
* [Kafka Partitioning](
* [The Log: What every software engineer should know about real-time data's unifying abstraction](
* [Apache Kafka: 8 things to check before going live](
* [Apache Kafka vs Apache Pulsar](

## schema-registry


* Confluent's Schema Registry [API](
and [examples](
* Console [examples](

# register schema
# convention -key or -value
http -v POST :8081/subjects/example.with-schema.simple-value/versions \
Accept:application/vnd.schemaregistry.v1+json \

# import schema from file
http -v POST :8081/subjects/example.with-schema.user-value/versions \
Accept:application/vnd.schemaregistry.v1+json \

# export schema to file
http :8081/subjects/example.with-schema.user-value/versions/latest \
| jq -r '.schema|fromjson' \
| tee avro/src/main/avro/user-latest.avsc

# list subjects
http -v :8081/subjects

# list subject's versions
http -v :8081/subjects/example.with-schema.simple-value/versions

# fetch by version
http -v :8081/subjects/example.with-schema.simple-value/versions/1

# fetch by id
http -v :8081/schemas/ids/1

# test compatibility
http -v POST :8081/compatibility/subjects/example.with-schema.simple-value/versions/latest \
Accept:application/vnd.schemaregistry.v1+json \

# delete version
http -v DELETE :8081/subjects/example.with-schema.simple-value/versions/1

# delete latest version
http -v DELETE :8081/subjects/example.with-schema.simple-value/versions/latest

# delete subject
http -v DELETE :8081/subjects/example.with-schema.simple-value

# stringify
jq tostring avro/src/main/avro/user.avsc
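
The requests above follow two conventions: under the registry's default naming strategy a subject is the topic name plus `-key` or `-value`, and the POST body carries the schema as a JSON string in a `schema` field. The sketch below shows one way to build both. The function names are hypothetical, and it assumes `jq` is installed, as it already is for the export and stringify commands above.

```shell
#!/usr/bin/env bash

# Subject name for a topic: <topic>-key or <topic>-value.
subject_for() {
  local topic="$1" kind="$2"
  case "$kind" in
    key|value) echo "${topic}-${kind}" ;;
    *) echo "kind must be 'key' or 'value'" >&2; return 1 ;;
  esac
}

# Wrap an .avsc file as the request body: {"schema":"<stringified schema>"}.
schema_payload() {
  jq -c '{schema: tostring}' "$1"
}

# Usage (not executed here; httpie sends piped stdin as the request body):
#   schema_payload avro/src/main/avro/user.avsc \
#     | http -v POST ":8081/subjects/$(subject_for example.with-schema.user value)/versions" \
#         Content-Type:application/vnd.schemaregistry.v1+json
```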


* [`BaseKafkaSchemaRegistrySpec`](schema-registry/src/test/scala/com/kafka/demo/BaseKafkaSchemaRegistrySpec.scala) to test Kafka with SchemaRegistry

* `SpecificRecord` with [sbt-avrohugger](

# generate SpecificRecord classes under "schema-registry/target/scala-2.12/src_managed/main/compiled_avro"
sbt clean schema-registry/avroScalaGenerateSpecific

# (optional) create schema
http -v POST :8081/subjects/example.with-schema.payment-key/versions \
Accept:application/vnd.schemaregistry.v1+json \
http -v POST :8081/subjects/example.with-schema.payment-value/versions \
Accept:application/vnd.schemaregistry.v1+json \

# access kafka
docker exec -it local-kafka bash

# (optional) create topic
kafka-topics --zookeeper zookeeper:2181 \
--create --if-not-exists --replication-factor 1 --partitions 1 --topic example.with-schema.payment

# console producer (binary)
kafka-console-producer --broker-list kafka:9092 --topic example.with-schema.payment

# console consumer (binary)
kafka-console-consumer --bootstrap-server kafka:9092 --topic example.with-schema.payment

# access schema-registry
docker exec -it local-schema-registry bash

# avro console producer
# example "MyKey",{"id":"MyId","amount":10}
kafka-avro-console-producer --broker-list kafka:29092 \
--topic example.with-schema.payment \
--property schema.registry.url=http://schema-registry:8081 \
--property parse.key=true \
--property key.separator=, \
--property key.schema='{"type":"string"}' \
--property value.schema='{"namespace":"io.confluent.examples.clients.basicavro","type":"record","name":"Payment","fields":[{"name":"id","type":"string"},{"name":"amount","type":"double"}]}'

# avro console consumer
kafka-avro-console-consumer --bootstrap-server kafka:29092 \
--topic example.with-schema.payment \
--property schema.registry.url=http://schema-registry:8081 \
--property \
--property print.key=true \
--property print.schema.ids=true \
--property key.separator=, \

# producer example
sbt "schema-registry/runMain com.kafka.demo.specific.Producer"

# consumer example
sbt "schema-registry/runMain com.kafka.demo.specific.Consumer"

# tests
sbt "schema-registry/test:testOnly *KafkaSchemaRegistrySpecificSpec"

* `GenericRecord` with [CakeSolutions](
[[Producer](schema-registry/src/main/scala/com/kafka/demo/generic/Producer.scala)|[Consumer](schema-registry/src/main/scala/com/kafka/demo/generic/Consumer.scala)] and schema evolution [test](schema-registry/src/test/scala/com/kafka/demo/KafkaSchemaRegistryGenericSpec.scala)

# producer example
sbt "schema-registry/runMain com.kafka.demo.generic.Producer"

# consumer example
sbt "schema-registry/runMain com.kafka.demo.generic.Consumer"

# tests
sbt "schema-registry/test:testOnly *KafkaSchemaRegistryGenericSpec"

### Resources

* [Serializing data efficiently with Apache Avro and dealing with a Schema Registry](
* [Kafka, Avro Serialization and the Schema Registry](
* [Kafka, Streams and Avro serialization](
* [Avro and the Schema Registry](
* [Producing and Consuming Avro Messages over Kafka in Scala](


* [schema-repo](
* Hortonworks [Registry](


* generic + schema evolution
* ovotech
* multi-schema
* formulation

## kafka-streams


[Kafka Streams]( API examples


* `ToUpperCaseApp` [[source](streams/src/main/scala/com/kafka/demo/streams/ToUpperCaseApp.scala)|[test](streams/src/test/scala/com/kafka/demo/streams/ToUpperCaseSpec.scala)]

# access kafka
docker exec -it local-kafka bash

# create topic
# example [|]
kafka-topics --zookeeper zookeeper:2181 \
--create --if-not-exists --replication-factor 1 --partitions 1 --topic <TOPIC_NAME>

# ToUpperCaseApp example (input topic required)
sbt "streams/runMain com.kafka.demo.streams.ToUpperCaseApp"

# produce
kafka-console-producer --broker-list kafka:9092 \
--topic <TOPIC_NAME>

# consume
kafka-console-consumer --bootstrap-server kafka:9092 \
--topic <TOPIC_NAME>

# test
sbt clean streams/test


Tested with [embedded-kafka]( and [embedded-kafka-schema-registry](

* `JsonToAvroApp` [[source](|[test](]

# access kafka
docker exec -it local-kafka bash

# create topic
# example [json.streams-json-to-avro-app.input|avro.streams-json-to-avro-app.output]
kafka-topics --zookeeper zookeeper:2181 \
--create --if-not-exists --replication-factor 1 --partitions 1 --topic <TOPIC_NAME>

# produce (default StringSerializer)
kafka-console-producer \
--broker-list kafka:9092 \
--property "parse.key=true" \
--property "key.separator=:" \
--property "key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer" \
--property "value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer" \

# consume (default StringDeserializer)
kafka-console-consumer \
--bootstrap-server kafka:9092 \
--from-beginning \
--property "print.key=true" \
--property "key.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer" \
--property "value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer" \

# access schema-registry
docker exec -it local-schema-registry bash

# consume avro
kafka-avro-console-consumer --bootstrap-server kafka:29092 \
--property schema.registry.url=http://schema-registry:8081 \
--property \
--property print.key=true \
--property print.schema.ids=true \
--property key.separator=, \
--from-beginning \

# JsonToAvroApp example (input topic required)
sbt "streams-json-avro/runMain com.kafka.demo.JsonToAvroApp"

# test
sbt clean streams-json-avro/test

# json

# log
[json.streams-json-to-avro-app.input]: mykey, JsonModel(42,foo)
[avro.streams-json-to-avro-app.output]: KeyAvroModel(mykey), ValueAvroModel(42,FOO)

#### Demo-3


* `CatsKafkaStreamsApp` [[source](]

# run app
sbt -jvm-debug 5005 "cats-kafka-streams/runMain com.kafka.demo.CatsKafkaStreamsApp"

#### Demo-4


* `ZioKafkaStreamsApp` [[source](]

# run app
sbt -jvm-debug 5005 "zio-kafka-streams/runMain com.kafka.demo.ZioKafkaStreamsApp"

### Resources

* [Introducing Kafka Streams: Stream Processing Made Simple](
* [Unifying Stream Processing and Interactive Queries in Apache Kafka](
* [Of Streams and Tables in Kafka and Stream Processing](
* [How to use Apache Kafka to transform a batch pipeline into a real-time one](
* [Functional Programming with Kafka Streams and Scala](
* [Enabling Exactly-Once in Kafka Streams](

## ksql


* [KSQL](
* [ksqlDB](
* [Udemy Course](

Set up Kafka
# access kafka
docker exec -it local-kafka bash

# create topic
kafka-topics --zookeeper zookeeper:2181 \
--create --if-not-exists --replication-factor 1 --partitions 1 --topic USER_PROFILE

# produce sample data
kafka-console-producer --broker-list kafka:9092 --topic USER_PROFILE << EOF
{"userid": 1000, "firstname": "Alison", "lastname": "Smith", "countrycode": "GB", "rating": 4.7}
EOF

# consume
kafka-console-consumer --bootstrap-server kafka:9092 --topic USER_PROFILE --from-beginning
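
To produce more than the single sample record above, a small helper can print records in the same JSON shape. This is only a sketch: `user_profile_json` is a hypothetical name, the values in the usage comment are made up, and the function does not escape quotes inside string values.

```shell
#!/usr/bin/env bash

# Print one USER_PROFILE record as a single JSON line.
# Arguments: userid firstname lastname countrycode rating
user_profile_json() {
  printf '{"userid": %d, "firstname": "%s", "lastname": "%s", "countrycode": "%s", "rating": %s}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Usage (inside the kafka container, not executed here):
#   user_profile_json 1001 Bob Jones US 4.2 \
#     | kafka-console-producer --broker-list kafka:9092 --topic USER_PROFILE
```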


* using the server
# access ksql-server
docker exec -it local-ksql-server bash

# start ksql cli
ksql http://ksql-server:8088

* using a local instance
# connect to local cli
docker exec -it local-ksql-cli ksql http://ksql-server:8088

* using a temporary instance
# connect to remote server
docker run --rm \
--network=kafka-scala-examples_local_kafka_network \
-it confluentinc/cp-ksql-cli http://ksql-server:8088

Execute SQL statements
# create stream
CREATE STREAM user_profile (\
userid INT, \
firstname VARCHAR, \
lastname VARCHAR, \
countrycode VARCHAR, \
rating DOUBLE \
) WITH (KAFKA_TOPIC = 'USER_PROFILE', VALUE_FORMAT = 'JSON');

# verify stream
list streams;
describe user_profile;

# query stream
SELECT userid, firstname, lastname, countrycode, rating FROM user_profile EMIT CHANGES;

Expect the consumer and the query to show the generated data
# generate data
docker run --rm \
-v $(pwd)/local/ksql:/datagen \
--network=kafka-scala-examples_local_kafka_network \
-it confluentinc/ksql-examples ksql-datagen \
bootstrap-server=kafka:29092 \
schemaRegistryUrl=http://schema-registry:8081 \
schema=datagen/user_profile.avro \
format=json \
key=userid \
maxInterval=5000 \
topic=USER_PROFILE

## kafka-connect

* [Kafka Connect](
* Confluent's Kafka Connect [API]( and [connectors](
* [Udemy Course](
* [Kafka Connect Fundamentals](

Set up PostgreSQL locally
# create shared network
docker-compose up

# start postgres
docker-compose -f docker-compose.postgres.yml up

# (mac|linux) view postgres ui
# [schema=public|database=postgres|username=postgres|password=postgres]
[open|xdg-open] http://localhost:8080

Set up connectors

* `kafka-connect-spooldir` [[confluent]( | [official](]
* `kafka-connect-jdbc` [[confluent](]

# list connector
http -v :8083/connectors

# init data to generate schema
cp local/connect/data/resources-0.txt.orig local/connect/data/resources-0.txt

# setup spooldir source connector
http -v --json POST :8083/connectors < local/connect/config/source-spooldir-connector.json

# ingest data
echo "{\"accountId\":\"123\",\"resourceType\":\"XXX\",\"value\":\"X1\"}" > local/connect/data/resources-1.txt

# setup jdbc sink connector
# topic = SCHEMA.DATABASE = "public.postgres"
http -v --json POST :8083/connectors < local/connect/config/sink-jdbc-connector.json

# verify data
docker exec -it local-postgres bash -c "psql -U postgres postgres"
select * from public.postgres;

# cleanup
docker-compose -f docker-compose.postgres.yml down -v
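
The connector requests above only succeed once the Kafka Connect REST API on port 8083 is accepting connections, which can take a while after `docker-compose up`. A generic retry helper is one way to wait for it. This is a sketch under that assumption; `retry` is a hypothetical helper, not something shipped with this repository.

```shell
#!/usr/bin/env bash

# retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempt limit is reached.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (not executed here): wait up to about a minute for Kafka Connect
#   retry 30 2 curl -fsS http://localhost:8083/connectors
```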

## extra

### Resources

* *Old [presentation](*
* [What is the actual role of Zookeeper in Kafka?](
* [How to use Apache Kafka to transform a batch pipeline into a real-time one](
* [Kafka Partitioning](
* [Should you put several event types in the same Kafka topic?](
* [How to choose the number of topics/partitions in a Kafka cluster?](
* [Docker Tips and Tricks with KSQL and Kafka](
* [Introduction to Topic Log Compaction in Apache Kafka](

### Tools

* [Kafka Streams Topology Visualizer]( (online)
* [kafkacat](
* [Kafka-Utils](
* [Insulator]( (GUI)
* [KLoadGen](
* [Kowl](
* [UI for Apache Kafka](
* [Cruise Control](
* [CMAK](

### Companies

* [Confluent](
* [Lenses](
* [Conduktor](