https://github.com/etf1/kafka-mongo-watcher
A MongoDB collection watcher that pushes oplog events into Kafka
- Host: GitHub
- URL: https://github.com/etf1/kafka-mongo-watcher
- Owner: etf1
- Created: 2020-03-05T11:48:00.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2026-04-22T11:08:01.000Z (2 months ago)
- Last Synced: 2026-04-22T11:12:29.835Z (2 months ago)
- Topics: database, event-driven, go, golang, kafka, mongodb
- Language: Go
- Homepage:
- Size: 95.1 MB
- Stars: 16
- Watchers: 7
- Forks: 5
- Open Issues: 6
Metadata Files:
- Readme: README.md
# Kafka MongoDB Watcher
[GoDoc](https://godoc.org/github.com/etf1/kafka-mongo-watcher)
This project listens for MongoDB collection events (insert, update, delete, ...), also called "oplogs" (operation logs), and distributes them into a Kafka topic of your choice.
There is also a replay mode that lets you send all existing documents of a collection to a Kafka topic, for example to initialize the topic for the first time.
## Prerequisites
In addition to the binary, you will also need the following Kafka library:
* [librdkafka](https://github.com/edenhill/librdkafka) (refer to its Installation section)
## Installation
### Download binary
You can download the latest version of the binary built for your architecture here:
* Architecture **i386** [
[Darwin](https://github.com/etf1/kafka-mongo-watcher/releases/latest/download/kafka-mongo-watcher-darwin-386) /
[Linux](https://github.com/etf1/kafka-mongo-watcher/releases/latest/download/kafka-mongo-watcher-linux-386) /
[Windows](https://github.com/etf1/kafka-mongo-watcher/releases/latest/download/kafka-mongo-watcher-windows-386.exe)
]
* Architecture **amd64** [
[Darwin](https://github.com/etf1/kafka-mongo-watcher/releases/latest/download/kafka-mongo-watcher-darwin-amd64) /
[Linux](https://github.com/etf1/kafka-mongo-watcher/releases/latest/download/kafka-mongo-watcher-linux-amd64) /
[Windows](https://github.com/etf1/kafka-mongo-watcher/releases/latest/download/kafka-mongo-watcher-windows-amd64.exe)
]
### Using Docker
The watcher is also available as a [Docker image](https://hub.docker.com/r/etf1/kafka-mongo-watcher).
You can run it using the following example and pass configuration environment variables:
```bash
$ docker run \
-e 'REPLAY=true' \
etf1/kafka-mongo-watcher:latest
```
### From sources
Optionally, you can also download and build it from the sources. Retrieve the project sources in one of the following ways:
```bash
$ go get -u github.com/etf1/kafka-mongo-watcher
# or
$ git clone https://github.com/etf1/kafka-mongo-watcher.git
```
Then, build the binary:
```bash
$ GOOS=linux GOARCH=amd64 go build -ldflags '-s -w' -o kafka-mongo-watcher ./cmd/watcher/
```
## Usage
In order to run the watcher, type the following command with the desired arguments.
You can use flags (as in this example) or environment variables:
```bash
$ ./kafka-mongo-watcher -REPLAY=true
...
HTTP server started {"facility":"kafka-mongo-watcher","version":"wip","addr":":8001","file":"/usr/local/Cellar/go/1.14/libexec/src/runtime/asm_amd64.s","line":1373}
Connected to mongodb database {"facility":"kafka-mongo-watcher","version":"wip","uri":"mongodb://root:toor@127.0.0.1:27011,127.0.0.1:27012,127.0.0.1:27013/watcher?replicaSet=replicaset\u0026authSource=admin"}
Connected to kafka producer {"facility":"kafka-mongo-watcher","version":"wip","bootstrap-servers":"127.0.0.1:9092"}
...
```
## Available configuration variables
In a dev environment, you can copy `.env.dist` to `.env` and edit its content to customize the environment variables easily.
You can set or override configuration variables from the `.env` file, from environment variables, and from CLI arguments.
(If a variable is configured in multiple sources, the last one overrides the previous ones.)
Configuration variables with the prefix are looked up first, then those without it. For example, if you define `KAFKA_MONGO_WATCHER_MONGODB_URI=xxxx`, it will be used for the MongoDB URI even if `MONGODB_URI=yyyy` is set. This allows some overriding cases that are sometimes useful inside a Kubernetes cluster.
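The prefixed-then-unprefixed lookup can be pictured with a small sketch. This is only an illustration of the rule stated above, not the watcher's actual code: the `resolve` helper and the `env` map (standing in for the process environment) are hypothetical names.

```go
package main

import "fmt"

// resolve returns the value of a configuration variable, preferring the
// prefixed name (PREFIX_NAME) over the bare name (NAME).
// Hypothetical helper: env stands in for the process environment.
func resolve(prefix, name string, env map[string]string) (string, bool) {
	if v, ok := env[prefix+"_"+name]; ok {
		return v, true
	}
	v, ok := env[name]
	return v, ok
}

func main() {
	env := map[string]string{
		"KAFKA_MONGO_WATCHER_MONGODB_URI": "xxxx",
		"MONGODB_URI":                     "yyyy",
	}
	uri, _ := resolve("KAFKA_MONGO_WATCHER", "MONGODB_URI", env)
	fmt.Println(uri) // prints "xxxx": the prefixed variable wins
}
```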
#### KAFKA_MONGO_WATCHER_PREFIX
*Type*: string
*Description*: In case you want to specify a different prefix (not `KAFKA_MONGO_WATCHER`) for all configuration environment variables.
*Example value*: `KAFKA_MONGO_WATCHER_PREFIX=CUSTOM` in this case
#### CUSTOM_PIPELINE
*Type*: string
*Description*: In case you want to specify a filtering pipeline, you can specify it here. It works with both replay and watch modes.
*Example value*: `[ { "$match": { "fullDocument.is_active": true } }, { "$addFields": { "custom-field": "custom-value" } } ]`
#### REPLAY
*Type*: bool
*Description*: Set to true to send all of the collection's documents once (default: false)
**Hint**: In the aggregation pipeline, you can also use built-in variables such as `%currentTimestamp%`, which is replaced with the current timestamp value.
*Example value with variables*: `[ { "$match": { "date": { "$gt": { "$date": { "$numberLong": "%currentTimestamp%" } } } } } ]`
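
As a rough sketch of what such a substitution could look like: the placeholder is replaced in the raw pipeline string, and the result is checked to be valid JSON. The `renderPipeline` helper is hypothetical, and the use of milliseconds since the Unix epoch is an assumption (it matches what Extended JSON expects inside `$date`/`$numberLong`); the watcher may do this differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// renderPipeline replaces %currentTimestamp% with nowMillis and verifies
// that the result is a JSON array of stage objects.
// Hypothetical helper; the millisecond unit is an assumption.
func renderPipeline(raw string, nowMillis int64) (string, error) {
	rendered := strings.ReplaceAll(raw, "%currentTimestamp%", strconv.FormatInt(nowMillis, 10))
	var stages []map[string]any
	if err := json.Unmarshal([]byte(rendered), &stages); err != nil {
		return "", fmt.Errorf("invalid pipeline: %w", err)
	}
	return rendered, nil
}

func main() {
	raw := `[ { "$match": { "date": { "$gt": { "$date": { "$numberLong": "%currentTimestamp%" } } } } } ]`
	out, err := renderPipeline(raw, time.Now().UnixMilli())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```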
#### MONGODB_URI
*Type*: string
*Description*: The MongoDB connection string URI (default: mongodb://root:toor@127.0.0.1:27011,...)
#### MONGODB_COLLECTION_NAME
*Type*: string
*Description*: The MongoDB collection you want to watch (default: "items")
#### MONGODB_DATABASE_NAME
*Type*: string
*Description*: The MongoDB database name you want to connect to (default: "watcher")
#### MONGODB_SERVER_SELECTION_TIMEOUT
*Type*: duration
*Description*: The MongoDB server selection timeout duration (default: 2s)
#### MONGODB_OPTION_BATCH_SIZE
*Type*: integer
*Description*: In case you want to set a batch size on the MongoDB watch (default: 0 / no batch)
#### MONGODB_OPTION_FULL_DOCUMENT
*Type*: boolean
*Description*: In case you want to retrieve the full document when watching for oplogs (default: true)
#### MONGODB_OPTION_MAX_AWAIT_TIME
*Type*: duration
*Description*: In case you want to set a maximum time to wait for new oplogs (default: 0 / don't stop)
#### MONGODB_OPTION_RESUME_AFTER
*Type*: string
*Description*: In case you want to set a logical starting point for the change stream (example : `{"_data": }`)
#### MONGODB_OPTION_START_AT_DELAY
*Type*: duration
*Description*: In case you want to set a starting point in the past (now - delay) for the change stream
#### MONGODB_OPTION_START_AT_OPERATION_TIME_I
*Type*: uint32 *(increment value)*
#### MONGODB_OPTION_START_AT_OPERATION_TIME_T
*Type*: uint32 *(timestamp)*
*Description*: In case you want to set a timestamp for the change stream to only return changes that occurred at or after the given timestamp (default: nil)
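
To relate these options: a MongoDB timestamp is a pair of seconds since the Unix epoch (`T`) and an increment within that second (`I`), and a "now - delay" starting point can be expressed as such a pair. The sketch below is an assumption about how the two could be connected, with hypothetical names (`opTime`, `startAtFromDelay`); it is not taken from the watcher's code.

```go
package main

import (
	"fmt"
	"time"
)

// opTime mirrors the two parts of a MongoDB timestamp:
// T is seconds since the Unix epoch, I is an increment within that second.
type opTime struct {
	T uint32
	I uint32
}

// startAtFromDelay returns the operation time located delay before nowUnix.
// I is left at 0 so that every change in that second qualifies (assumption).
func startAtFromDelay(nowUnix int64, delay time.Duration) opTime {
	return opTime{T: uint32(nowUnix - int64(delay/time.Second)), I: 0}
}

func main() {
	ts := startAtFromDelay(time.Now().Unix(), 10*time.Minute)
	fmt.Printf("MONGODB_OPTION_START_AT_OPERATION_TIME_T=%d\n", ts.T)
	fmt.Printf("MONGODB_OPTION_START_AT_OPERATION_TIME_I=%d\n", ts.I)
}
```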
#### MONGODB_OPTION_WATCH_MAX_RETRIES
*Type*: integer
*Description*: The max number of retries when trying to watch a collection (default: 3, set to 0 to disable retry)
#### MONGODB_OPTION_WATCH_RETRY_DELAY
*Type*: duration
*Description*: Sleeping delay between two watch attempts (default: 500ms)
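
The interplay of the two retry options can be sketched as a simple loop. This is a minimal illustration under the assumption that the first attempt is not counted as a retry; `watchWithRetry` and `errWatch` are hypothetical names, not the watcher's API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errWatch is a placeholder error used by this sketch.
var errWatch = errors.New("watch failed")

// watchWithRetry calls watch once and, on failure, retries up to maxRetries
// more times, sleeping delay between attempts. With maxRetries set to 0,
// a single attempt is made. It returns the number of attempts performed.
func watchWithRetry(maxRetries int, delay time.Duration, watch func() error) (attempts int, err error) {
	for {
		attempts++
		if err = watch(); err == nil {
			return attempts, nil
		}
		if attempts > maxRetries {
			return attempts, err
		}
		time.Sleep(delay)
	}
}

func main() {
	calls := 0
	attempts, err := watchWithRetry(3, 500*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errWatch // fail the first two attempts
		}
		return nil
	})
	fmt.Println(attempts, err) // prints "3 <nil>"
}
```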
#### KAFKA_BOOTSTRAP_SERVERS
*Type*: string
*Description*: Kafka bootstrap servers list (default: "127.0.0.1:9092")
#### KAFKA_TOPIC
*Type*: string
*Description*: Kafka topic to write into (default: "kafka-mongo-watcher")
#### KAFKA_PRODUCE_CHANNEL_SIZE
*Type*: integer
*Description*: The maximum size of the internal producer channel (default: 10000)
A large value here can increase the heap memory used by the application, because all payloads waiting to be sent to Kafka are held in the channel.
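
The memory trade-off follows from how a buffered Go channel works: every queued payload stays in memory until a consumer reads it. The sketch below only illustrates that mechanism with a hypothetical `tryEnqueue` helper; what the watcher does when its channel is full (block, drop, or something else) is not stated here.

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send and reports whether the payload
// fit into the channel's buffer. Hypothetical helper for illustration.
func tryEnqueue(ch chan<- []byte, payload []byte) bool {
	select {
	case ch <- payload:
		return true
	default:
		return false
	}
}

func main() {
	// A capacity of 2 stands in for KAFKA_PRODUCE_CHANNEL_SIZE.
	ch := make(chan []byte, 2)
	for i := 0; i < 3; i++ {
		ok := tryEnqueue(ch, []byte("payload"))
		fmt.Println(i, ok, len(ch))
	}
}
```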
#### KAFKA_MESSAGE_MAX_BYTES
*Type*: integer
*Description*: The maximum message size in bytes at the producer level (default: 1024*1024)
#### LOG_CLI_VERBOSE
*Type*: boolean
*Description*: Used to enable/disable log verbosity (default: true)
#### LOG_LEVEL
*Type*: string
*Description*: The minimum level from which logs are displayed (default: "info")
#### GRAYLOG_ENDPOINT
*Type*: string
*Description*: In case you want to push logs into a Graylog server, just fill this entry with the endpoint
#### HTTP_IDLE_TIMEOUT
*Type*: duration
*Description*: An idle timeout for the HTTP technical server (default: 90s)
#### HTTP_READ_HEADER_TIMEOUT
*Type*: duration
*Description*: A read header timeout for the HTTP technical server (default: 1s)
#### HTTP_WRITE_TIMEOUT
*Type*: duration
*Description*: A write timeout for HTTP technical server (default: 10s)
#### HTTP_TECH_ADDR
*Type*: string
*Description*: A specified address for HTTP technical server to listen (default: ":8001")
#### PRINT_CONFIG
*Type*: boolean
*Description*: Used to enable/disable the configuration print at startup (default: true)
#### PPROF_ENABLED
*Type*: boolean
*Description*: In case you want to enable Go pprof debugging (default: true). No impact when not used
#### OPEN_TELEMETRY_COLLECTOR_ENDPOINT
*Type*: string
*Description*: In case you want to enable OpenTelemetry tracing, fill this with the `host:port` of your collector endpoint
#### OPEN_TELEMETRY_SAMPLE_RATIO
*Type*: float64
*Description*: A fraction between 0 and 1 to enable sampling OpenTelemetry traces
## Enable the debug UI
[Debug UI video](https://youtu.be/6hyCkqHYFQ8)
You can enable the debug UI, which will be available at [http://127.0.0.1:8001/](http://127.0.0.1:8001/).
You just have to set `HTTP_DEBUG_ENABLED=true`.
It allows you to track real-time activity on the documents of the watched collection.
## Prometheus metrics
The watcher also exposes metrics about the Go process and the watcher application.
These metrics can be scraped by Prometheus from the following endpoint of the technical HTTP server: http://127.0.0.1:8001/metrics
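
For a quick manual check without Prometheus, the endpoint can be fetched like any HTTP resource. The sketch below assumes the watcher is running locally with the default `HTTP_TECH_ADDR` and that the endpoint serves the usual Prometheus text format; `countSamples` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// countSamples counts the sample lines of a Prometheus text-format body,
// skipping blank lines and "# HELP" / "# TYPE" comment lines.
func countSamples(body string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		n++
	}
	return n
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://127.0.0.1:8001/metrics")
	if err != nil {
		fmt.Println("watcher not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println("samples:", countSamples(string(body)))
}
```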
## Run tests
Unit tests can be run with the following command:
```bash
$ go test -v -mod vendor ./...
```
And integration tests can be run with:
```bash
$ make test-integration
```
This starts the required MongoDB and Kafka containers and runs the test suite.