https://github.com/kochava/firehose
A Kafka topic transfer agent
- Host: GitHub
- URL: https://github.com/kochava/firehose
- Owner: Kochava
- License: apache-2.0
- Created: 2016-10-19T23:39:35.000Z (about 8 years ago)
- Default Branch: main
- Last Pushed: 2021-08-04T15:31:51.000Z (over 3 years ago)
- Last Synced: 2024-06-20T00:26:39.872Z (6 months ago)
- Topics: tf
- Language: Go
- Homepage:
- Size: 82 KB
- Stars: 1
- Watchers: 10
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE.md
README
Firehose [![Build Status](https://travis-ci.org/Kochava/firehose.svg?branch=master)](https://travis-ci.org/Kochava/firehose) [![Coverage Status](https://coveralls.io/repos/github/Kochava/firehose/badge.svg?branch=master)](https://coveralls.io/github/Kochava/firehose?branch=master)
======

Firehose is a Kafka transfer agent that transfers a topic in real time from one set of brokers to another. This is useful when you want the two clusters to remain independent rather than relying on Kafka's built-in replication. NOTE: This is mostly meant for seeding one cluster from another and not for any process needing data-loss guarantees; for that, check out [uReplicator](https://github.com/uber/uReplicator).
### Requirements
* Docker (for local testing)

## Install
Clone this repository (with its submodules) and build:
```
$ git clone --recursive https://github.com/Kochava/firehose.git
$ cd firehose
$ go build ./cmd/firehose
```

#### Update Dependencies
```
$ git submodule update --recursive --remote
```

## Usage
```
NAME:
Kochava Kafka Transfer Agent - An agent which consumes a topic from one set of brokers and publishes to another set of brokers

USAGE:
firehose [global options] command [command options] [arguments...]

VERSION:
0.3.4

GLOBAL OPTIONS:
--src-zookeepers value Comma delimited list of zookeeper nodes to connect to for the source brokers [$FIREHOSE_SRC_ZOOKEEPERS]
--dst-zookeepers value Comma delimited list of zookeeper nodes to connect to for the destination brokers [$FIREHOSE_DST_ZOOKEEPERS]
--topic value Topic to transfer (default: "firehose") [$FIREHOSE_TOPIC]
--cg-name-suffix value Suffix to use for the consumer group name in the format _ (default: "firehose") [$FIREHOSE_CG_NAME_SUFFIX]
--buffer-size value The number of messages to hold in memory at once (default: 10000) [$FIREHOSE_BUFFER_SIZE]
--max-errors value The maximum number of errors to allow the kafka to experience before quitting (default: 10) [$FIREHOSE_MAX_ERROR]
--max-retry value The maximum number of times to retry sending a message to the destination cluster (default: 5) [$FIREHOSE_MAX_RETRY]
--batch-size value The number of messages to batch together when sending to the destination cluster (default: 500) [$FIREHOSE_BATCH_SIZE]
--flush-interval value The interval (in ms) to flush messages that haven't been sent in a batch yet (default: 10000) [$FIREHOSE_FLUSH_INTERVAL]
--reset-offset Resets the offset in the consumer group to real-time before starting [$FIREHOSE_RESET_OFFSET]
--influx-address value Influx address (default: "http://localhost:8086") [$FIREHOSE_INFLUX_ADDRESS]
--influx-user value Influx user name (default: "firehose") [$FIREHOSE_INFLUX_USER]
--influx-pass value Influx password (default: "firehose") [$FIREHOSE_INFLUX_PASS]
--influx-db value Influx database name (default: "firehose") [$FIREHOSE_INFLUX_DB]
--consumer-concurrency value Number of consumer threads to run (default: 4) [$FIREHOSE_CONSUMER_CONCURRENCY]
--producer-concurrency value Number of producer threads to run (default: 4) [$FIREHOSE_PRODUCER_CONCURRENCY]
--log-file value Main log file to save to (default: "/var/log/firehose/firehose.log") [$FIREHOSE_LOG_FILE]
--stdout-logging Override logging settings to log to STDOUT [$FIREHOSE_STDOUT_LOGGING]
--help, -h show help
--version, -v print the version
```

### Getting Started
Once you have the repo cloned, the following is all that's needed to start testing locally. First grab your Docker machine's IP address, then update the `KAFKA_ADVERTISED_HOST_NAME` environment variable in `Docker/test-compose.yml` to use it.
```
$ go build ./cmd/firehose
$ docker-compose -f Docker/test-compose.yml up -d
$ ./firehose --src-zookeepers=":2181" --dst-zookeepers=":2182" --stdout-logging
```

If you want to scale up the Kafka clusters, you can do so:
```
$ docker-compose -f Docker/test-compose.yml scale kafka_src=3
```

Once up and running, you can access Grafana at http://localhost:3000 with the default username and password. Next, add a new data source, choosing `InfluxDB` and access method `proxy`. For the location use `http://172.18.0.2:8086`, with the default Influx database `firehose` and user:pass `firehose:firehose`. Once that's set up, you can import the simple dashboard prepared under `dist/dashboard.json`.
#### Cloud Building
You can leverage Google Cloud Building in order to build the binary and generate a docker image.
```
$ PROJECT_ID=your-project-id BUILD_VERSION=v1 ./cloudbuild.sh
```

## Contributing
### Grab the source and make a branch
1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Make your changes
4. Add some tests
5. Commit your changes (`git commit -am 'Add some feature'`)
6. Push to the branch (`git push origin my-new-feature`)
7. Create a new Pull Request

### TODO
With the new refactor, a lot of work went into performance; as a result, the historical-transfer aspect was essentially scrapped in this release. It will be added back in the next release.
#### GitHub
The GitHub version of this repo is a mirror of master from our internal repo. This means that feature branches are not available here.

## Default Branch
As of October 1, 2020, github.com uses the branch name ‘main’ when creating the initial default branch for all new repositories. To minimize customizations in our GitHub usage and to support consistent naming conventions, we have decided to rename the ‘master’ branch to ‘main’ in all of Kochava’s GitHub repos.
For local copies of the repo, the following steps will update to the new default branch:
```
git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a
```