https://github.com/funkygao/dbus
yet another databus that transfer/transform pipeline data between plugins
https://github.com/funkygao/dbus
binlog databus dataflow distributed kafka messaging mysql pipeline replication
Last synced: 3 months ago
JSON representation
yet another databus that transfer/transform pipeline data between plugins
- Host: GitHub
- URL: https://github.com/funkygao/dbus
- Owner: funkygao
- License: apache-2.0
- Created: 2017-01-25T02:39:27.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-06-29T09:04:26.000Z (almost 9 years ago)
- Last Synced: 2024-06-20T03:36:23.072Z (almost 2 years ago)
- Topics: binlog, databus, dataflow, distributed, kafka, messaging, mysql, pipeline, replication
- Language: Go
- Homepage:
- Size: 939 KB
- Stars: 26
- Watchers: 8
- Forks: 7
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# dbus
$$\ $$\
$$ | $$ |
$$$$$$$ | $$$$$$$\ $$\ $$\ $$$$$$$\
$$ __$$ | $$ __$$\ $$ | $$ | $$ _____|
$$ / $$ | $$ | $$ | $$ | $$ | \$$$$$$\
$$ | $$ | $$ | $$ | $$ | $$ | \____$$\
\$$$$$$$ | $$$$$$$ | \$$$$$$ | $$$$$$$ |
\_______| \_______/ \______/ \_______/
### What is dbus?
dbus = distributed data bus
It is yet another lightweight versatile databus system that transfer/transform pipeline data between plugins.
dbus works by building a DAG of structured data out of the different plugins: from data input, via filter(optional), to the output.
Similar projects
- logstash
- flume
- nifi
- camel
- beats
- kettle
- zapier
- google cloud dataflow
- canal
- storm
- yahoo pipes (dead)
### Status
dbus is not yet a 1.0.
We're writing more tests, fixing bugs, working on TODOs.
### Use Case
- mysql binlog dispatcher
- multiple DC kafka mirror
### Features
dbus supports powerful and scalable directed graphs of data routing, transformation and
system mediation logic.
- Designed for extension
- plugin architecture
- build your own plugins and more
- enables rapid development and effective testing
- Data Provenance
- track dataflow from beginning to end
- visualized dataflow
- rich metrics feed into tsdb
- online manual mediation of the dataflow
- RESTful API
- monitoring with alert
- Distributed Deployment
- shard/balance/auto rebalance
- linear scale
- Delivery Guarantee
- loss tolerant
- high throuput vs low latency
- back pressure
- Robustness
- race condition detected
- edge cases fully covered
- network jitter tested
- dependent components failure tested
- Systemic Quality
- hot reload
- dryrun throughput 1.9M packets/s
- Cluster Support
- modelling borrowed from helix+kafka controller
- currently only leader/standby with sharding, without replica
- easy to write a distributed plugin
### Getting Started
#### 1. Installing
To start using dbus, install Go and run `go get`:
```sh
$ go get -u github.com/funkygao/dbus
```
#### 2. Create config file
Please find sample config files in etc/ directory.
#### 3. Run the server
```sh
$ $GOPATH/dbusd -conf $myfile
```
### Dependencies
dbus uses zookeeper for sharding/balance/election.
### Plugins
More plugins are listed under [dbus-plugin](https://github.com/dbus-plugin).
#### Input
- MysqlbinlogInput
- KafkaInput
- MockInput
- StreamInput
#### Filter
- MysqlbinlogFilter
- MockFilter
#### Output
- KafkaOutput
- ESOutput
- MockOutput
- StreamOutput
### Configuration
- KafkaOutput async mode with batch=1024/500ms, ack=WaitForAll
- KafkaOutput retry=3, retry.backoff=350ms
- Mysql binlog positioner commit every 1s, channal buffer 100
### FAQ
#### mysql binlog
- is it totally data loss tolerant?
if the binlog exceeds 1MB, it will be discarded(lost)
#### why not canal?
- no Delivery Guarantee
- no Data Provenance
- no integration with kafka
- only hot standby deployment mode, we need sharding load
- dbus is a dataflow engine, while canal only support mysql binlog pipeline
#### compared with logstash
- logstash has better ecosystem
- dbus is cluster aware, provides delivery guarantee, data provenance
#### can there be more than 1 leaders at the same time?
Yes.
For example, 3 participants with 1 being the leader. Then 1 is network partitioned and
zk session expires, [2, 3] found this event and re-elect 2 as new leader.
Before 1 regain new zk session, [1] and [2] are leaders both.
If [1] and [2] both found resources changes, they will both rebalance the cluster.
dbus uses epoch to solve this issue.
#### what if
- zookeeper crash
dbus continues to work, but Ack will not be able to persist