https://github.com/renardeinside/dbx-kafka-protobuf-example
Sample code for working with Kafka & Protobuf in Databricks
databricks kafka protobuf scala spark spark-streaming
- Host: GitHub
- URL: https://github.com/renardeinside/dbx-kafka-protobuf-example
- Owner: renardeinside
- Created: 2021-12-05T09:54:48.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-12-06T12:06:30.000Z (over 4 years ago)
- Last Synced: 2025-06-16T21:46:05.213Z (about 1 year ago)
- Topics: databricks, kafka, protobuf, scala, spark, spark-streaming
- Language: Scala
- Size: 20.5 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
Example Spark Streaming Job with Amazon MSK & Protobuf on Databricks
===============================================================
This repository contains an example job that emulates writing events to and reading events from Kafka, using Protobuf serialization/deserialization (SerDe) and Spark Streaming.
.. contents:: :local:
Quickstart
----------
* Clone the repository (or open it in IntelliJ IDEA).
* Generate Protobuf specs via:

  .. code-block:: bash

     sbt clean compile

* In IntelliJ IDEA, mark :code:`target/scala-2.12/src_managed/main` as a generated sources root. Important: un-mark the nested :code:`main/scalapb` directory as a generated sources root; otherwise you will run into issues when compiling the project with IntelliJ IDEA.
* Configure a Python environment and the Databricks CLI.
* Install and configure :code:`dbx`:

  .. code-block:: bash

     pip install dbx
     dbx configure --profile-name=

* Provide the required properties in the :code:`.env` file:

  .. code-block:: bash

     INSTANCE_PROFILE_NAME="your-instance-profile" # instance profile to access the MSK instance
     DATABRICKS_CONFIG_PROFILE="your-databricks-cli-profile-name"
     KAFKA_BOOTSTRAP_SERVERS_TO_SECRETS="" # Kafka Bootstrap Servers string

* Create the secret scope:

  .. code-block:: bash

     make create-scope

* Add the secrets:

  .. code-block:: bash

     make add-secrets

* Create a new instance pool named :code:`dbx-pool` in your Databricks environment.
* To deploy and launch the job in dev mode (no job is created or updated; an ephemeral job run is used instead):

  .. code-block:: bash

     make dev-launch-generator
     make dev-launch-processor

* To deploy the jobs so that they are reflected in the Jobs UI:

  .. code-block:: bash

     make jobs-deploy

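
The :code:`.env` properties listed in this quickstart can be checked before running any :code:`make` target. The following is a minimal sketch of such a check, not part of this repository: the function name :code:`missing_env_vars` is hypothetical, and the sketch assumes the :code:`.env` file uses plain shell-style :code:`NAME="value"` assignments as in the example above. It prints the name of every requested variable that is unset or empty after loading the file.

.. code-block:: bash

    # Hypothetical helper, not part of the repository.
    # Prints each required variable that is unset or empty in the given env file.
    # The function body runs in a subshell, so nothing leaks into the caller.
    missing_env_vars() (
      env_file="$1"
      shift
      # Ignore values inherited from the caller; only the file should count.
      for name in "$@"; do
        unset "$name"
      done
      # Pass a path containing a slash (e.g. ./.env) so the shell does not search PATH.
      . "$env_file"
      for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
      done
    )

    # Example usage:
    #   missing_env_vars ./.env INSTANCE_PROFILE_NAME DATABRICKS_CONFIG_PROFILE KAFKA_BOOTSTRAP_SERVERS_TO_SECRETS

With the placeholder :code:`.env` shown above, this would report :code:`KAFKA_BOOTSTRAP_SERVERS_TO_SECRETS`, because its value is still an empty string.
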
Local tests
-----------
The local test suite requires :code:`sbt` and :code:`Docker`, because it uses :code:`testcontainers` to run a Kafka environment for the unit tests.
A test example is available in :code:`src/test/scala/net/renarde/dbx/demos/app/UnifiedAppTest.scala`.
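
Since the tests depend on both tools being installed, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of this repository: the function name :code:`missing_tools` is hypothetical, and running the suite with :code:`sbt test` is an assumption based on standard sbt usage rather than something this README states.

.. code-block:: bash

    # Hypothetical helper, not part of the repository.
    # Prints each of the given commands that cannot be found on PATH.
    missing_tools() {
      for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
      done
    }

    # Example usage (assumes the suite is run with the standard `sbt test`):
    #   missing="$(missing_tools sbt docker)"
    #   [ -z "$missing" ] && docker info >/dev/null && sbt test

The :code:`docker info` call in the usage comment additionally verifies that the Docker daemon is reachable, which :code:`testcontainers` needs in order to start the Kafka container.
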