https://github.com/sky-uk/kafka-message-scheduler
Scheduler for low-frequency and long-term scheduling of delayed messages to Kafka topics.
- Host: GitHub
- URL: https://github.com/sky-uk/kafka-message-scheduler
- Owner: sky-uk
- License: bsd-3-clause
- Created: 2017-07-21T13:28:19.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-07-18T15:12:45.000Z (almost 2 years ago)
- Last Synced: 2024-07-18T18:08:44.693Z (almost 2 years ago)
- Topics: kafka, scheduler
- Language: Scala
- Size: 919 KB
- Stars: 32
- Watchers: 24
- Forks: 6
- Open Issues: 34
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Kafka Message Scheduler
This application is a scheduler for low-frequency and long-term scheduling of delayed messages to [Kafka](https://kafka.apache.org/) topics.
## Background
This component was initially designed for Sky's metadata ingestion pipeline. We wanted to manage content expiry (for scheduled airings or on-demand assets) in one single component, instead of implementing the expiry logic on all consumers.
Given that the pipeline is based on Kafka, it felt natural to use it as input, output and data store.
## How it works
The Kafka Message Scheduler (KMS for short) consumes messages from the configured source (schedule) topics. On these topics:
- message keys are "Schedule IDs": string values that are expected to be unique
- message values are Schedule messages, encoded in Avro binary format according to the [Schema](#schema)
A schedule is composed of:
- The topic you want to send the delayed message to
- The timestamp telling when you want that message to be delivered
- The actual message to be sent, both key and value
The KMS is responsible for sending the actual message to the specified topic at the specified time.
>**Note**
>
> If the timestamp of when to deliver the message is in the past, the schedule will be sent immediately.
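As a rough illustration of the schedule structure and the past-timestamp rule described above, here is a minimal sketch in plain Scala. The `Schedule` case class, its field names, and the `delayUntil` helper are hypothetical and are not taken from the project's code or its Avro schema.

```scala
import java.time.Instant
import scala.concurrent.duration._

object ScheduleSketch {
  // Hypothetical model of a schedule; field names are illustrative only.
  final case class Schedule(
      time: Instant,     // when the message should be delivered
      topic: String,     // destination topic for the delayed message
      key: Array[Byte],  // key of the message to deliver
      value: Array[Byte] // value of the message to deliver
  )

  // Time to wait before delivering. The result is clamped at zero, so a
  // schedule whose time is already in the past is due immediately.
  def delayUntil(schedule: Schedule, now: Instant): FiniteDuration = {
    val millis = java.time.Duration.between(now, schedule.time).toMillis
    math.max(0L, millis).millis
  }
}
```

Under these assumptions, a schedule set one minute in the future yields a delay of 60 seconds, while one set in the past yields a zero delay.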
The Schedule ID can be used to delete a scheduled message: send a delete message (a message with the same Schedule ID as its key and a null value) to the source topic.
### Startup logic
When the KMS starts up, it uses the [kafka-topic-loader](https://github.com/sky-uk/kafka-topic-loader) to consume all messages from the configured `schedule-topics` and populate the scheduling actors' state. Once loading has completed, all loaded schedules are scheduled and the application starts normal processing. Because the whole topic is read before anything is scheduled, schedules that have been fired and tombstoned, but not yet compacted, are not replayed during startup.
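The effect of replaying the topic before scheduling anything can be sketched as a fold over the records in log order. This is a simplified model in plain Scala, not the project's actor-based implementation; `Record` and `replay` are hypothetical names, and a value of `None` stands in for a null (tombstone) value.

```scala
object StartupReplaySketch {
  // One record read from a schedule topic: the key is the Schedule ID and
  // a value of None represents a null value (a delete/tombstone).
  final case class Record[A](scheduleId: String, value: Option[A])

  // Replays records in log order, keeping only the latest live value per ID.
  // A tombstone removes any earlier schedule with the same ID.
  def replay[A](records: Seq[Record[A]]): Map[String, A] =
    records.foldLeft(Map.empty[String, A]) { (state, record) =>
      record.value match {
        case Some(schedule) => state.updated(record.scheduleId, schedule)
        case None           => state - record.scheduleId
      }
    }
}
```

For a log containing a schedule for ID `a`, a schedule for ID `b`, and then a tombstone for `a`, the resulting state holds only `b`, so the already-fired schedule `a` is not scheduled again even though compaction has not yet removed it from the topic.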
## Schema
To generate the Avro schema from the Schedule case class, run `sbt schema`. The schema will be written to
`avro/target/schemas/schedule.avsc`.
## How to run it
### Start services
```bash
docker-compose pull && docker-compose up -d
```
### Send messages
With the services running, you can send a message to the defined scheduler topic (`scheduler` in the example
above). See the [Schema](#schema) section for details of generating the Avro schema to be used.
### Monitoring
Metrics are exposed and reported using OpenTelemetry. By default, [otel4s](https://typelevel.org/otel4s/index.html) is used for reporting, and the scraping endpoint for Prometheus is exposed on port `9401` (configurable through the `OTEL_EXPORTER_PROMETHEUS_PORT` environment variable).
Prometheus is included in the docker-compose setup and exposes a monitoring dashboard on port `9090`.
### Topic configuration
The `schedule-topics` must be configured to use [log compaction](https://kafka.apache.org/documentation/#compaction). This is for two reasons:
1. to allow the scheduler to delete the schedule after it has been written to its destination topic.
2. because the scheduler uses the `schedule-topics` to reconstruct its state - in case of a restart of the
KMS, this ensures that schedules are not lost.
#### Recommended configuration
The log compaction configuration of the `schedule-topics` should be fairly aggressive to keep restart times low. The recommended configuration is:
```
cleanup.policy: compact
delete.retention.ms: 3600000
min.compaction.lag.ms: 0
min.cleanable.dirty.ratio: "0.1"
segment.ms: 86400000
segment.bytes: 100000000
```
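If you apply these settings with Kafka's command-line tools, the pairs need to be passed in comma-separated `key=value` form. The sketch below, in plain Scala, renders the recommended values that way; `TopicConfigSketch` and `addConfigArg` are hypothetical helper names, and the unit comments are simple conversions of the values listed above.

```scala
object TopicConfigSketch {
  // The recommended settings from this section, as key/value pairs.
  val recommended: Seq[(String, String)] = Seq(
    "cleanup.policy"            -> "compact",
    "delete.retention.ms"       -> "3600000",  // 1 hour
    "min.compaction.lag.ms"     -> "0",
    "min.cleanable.dirty.ratio" -> "0.1",
    "segment.ms"                -> "86400000", // 1 day
    "segment.bytes"             -> "100000000" // 100 MB (decimal)
  )

  // Renders the pairs as a single comma-separated key=value string, the
  // form taken by the --add-config option of kafka-configs.sh.
  def addConfigArg(configs: Seq[(String, String)]): String =
    configs.map { case (key, value) => s"$key=$value" }.mkString(",")
}
```

The resulting string could then be passed to `kafka-configs.sh --alter --entity-type topics --entity-name <topic> --add-config <string>`, together with the `--bootstrap-server` address of your own cluster.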
## Limitations
Until [this issue](/../../issues/69) is addressed, the KMS does not fully support horizontal scaling. Multiple instances can be run, and Kafka will balance the partitions between them. However, schedules are likely to be duplicated, because when a rebalance happens the state for a rebalanced partition is not removed from the instance that originally held it. If you need to run multiple instances before that issue is addressed, avoid dynamic scaling and start with your desired number of instances.