https://github.com/vivekkothari/data-river

Replicates data from mysql to any datastore of your choice (relies on maxwell)

Host: GitHub
URL: https://github.com/vivekkothari/data-river
Owner: vivekkothari
License: apache-2.0
Created: 2016-08-12T08:45:36.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2023-11-22T20:58:13.000Z (over 2 years ago)
Last Synced: 2025-07-06T07:48:14.166Z (12 months ago)
Topics: data-river, dropwizard-application, maxwell, mysql, replicate-data
Language: Java
Homepage:
Size: 57.6 KB
Stars: 1
Watchers: 1
Forks: 6
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# data-river [![Build Status](https://travis-ci.org/vivekkothari/data-river.svg?branch=master)](https://travis-ci.org/vivekkothari/data-river) [![Coverage Status](https://coveralls.io/repos/github/vivekkothari/data-river/badge.svg?branch=master)](https://coveralls.io/github/vivekkothari/data-river?branch=master)

Replicates data from MySQL to any datastore of your choice (relies on Maxwell).

### Maven Dependency

* Use the following maven dependency:

```
<dependency>
    <groupId>com.github.vivekkothari</groupId>
    <artifactId>elastic-search-persister</artifactId>
    <version>1.2.1</version>
</dependency>
```

Imagine you have 2 tables in MySQL, `Table1` and `Table2`. You first need to
configure [maxwell](http://maxwells-daemon.io). Once maxwell is properly configured, let's say you
want to persist changes to these tables in [Elasticsearch](https://www.elastic.co). You would
create 2 rivers, `table1_river` and `table2_river`. Then, for each `riverType`, provide
implementations of `IFilter`, `IEnricher` and `IBackFiller`, and build a
[`TransformerFactory`](https://github.com/vivekkothari/data-river/blob/master/data-river-core/src/main/java/com/github/vivekkothari/river/service/TransformerFactory.java).

Each incoming Kafka message goes through the following 3 steps:

1. Filtering: [`IFilter`](https://github.com/vivekkothari/data-river/blob/master/data-river-core/src/main/java/com/github/vivekkothari/river/service/IFilter.java)
   governs whether the incoming message should be processed or not.
2. Enrichment: [`IEnricher`](https://github.com/vivekkothari/data-river/blob/master/data-river-core/src/main/java/com/github/vivekkothari/river/service/IEnricher.java)
   provides a way to enrich the incoming message. (think of joining the row with some other row)
3. Persistence: [`IPersister`](https://github.com/vivekkothari/data-river/blob/master/data-river-core/src/main/java/com/github/vivekkothari/river/service/IPersister.java)
   persists the message in your desired data store.
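
The three steps above can be sketched roughly as follows. This is only an illustration of the filter, enrich, persist flow: `SketchFilter`, `SketchEnricher`, `SketchPersister` and `RiverPipelineSketch` are hypothetical stand-ins invented for this example, and the real `IFilter`, `IEnricher` and `IPersister` signatures may well differ (see the linked sources). Messages are modelled as plain maps with a `type` field, as in Maxwell's JSON output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the library's interfaces.
// The real IFilter, IEnricher and IPersister signatures may differ.
interface SketchFilter {
    boolean accept(Map<String, Object> message);
}

interface SketchEnricher {
    Map<String, Object> enrich(Map<String, Object> message);
}

interface SketchPersister {
    void persist(Map<String, Object> message);
}

final class RiverPipelineSketch {
    private final SketchFilter filter;
    private final SketchEnricher enricher;
    private final SketchPersister persister;

    RiverPipelineSketch(SketchFilter filter, SketchEnricher enricher, SketchPersister persister) {
        this.filter = filter;
        this.enricher = enricher;
        this.persister = persister;
    }

    /** Runs one message through filter, enrich, persist. Returns true if it was persisted. */
    boolean process(Map<String, Object> message) {
        if (!filter.accept(message)) {
            return false; // step 1: the filter rejected the message
        }
        Map<String, Object> enriched = enricher.enrich(message); // step 2
        persister.persist(enriched); // step 3
        return true;
    }

    /** Builds a minimal change event with only the fields this sketch uses. */
    static Map<String, Object> event(String type, String table) {
        Map<String, Object> message = new HashMap<>();
        message.put("type", type);
        message.put("table", table);
        return message;
    }

    /** Demo: keep only inserts, add a marker field, collect results in memory. */
    static List<Map<String, Object>> demo() {
        List<Map<String, Object>> store = new ArrayList<>();
        RiverPipelineSketch pipeline = new RiverPipelineSketch(
                message -> "insert".equals(message.get("type")),
                message -> {
                    Map<String, Object> copy = new HashMap<>(message);
                    copy.put("enriched", true);
                    return copy;
                },
                store::add);
        pipeline.process(event("insert", "Table1")); // persisted
        pipeline.process(event("delete", "Table1")); // filtered out
        return store;
    }
}
```

Keeping each step behind its own small interface means a river can swap its filter or target store without touching the other two steps.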

There is also an admin task on the admin port of your Dropwizard application that can be used to
backfill data. Example:

```
http://localhost:8080/admin/backfill?startDate=2016-01-01T00:00:00&endDate=2016-01-10T00:00:00&riverType=river1
```
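
If you trigger backfills from code, a small helper along these lines can build the request URL. It is a sketch under assumptions: `BackfillUrlSketch` is a hypothetical name, the timestamp format is inferred only from the example URL above, and the restriction on `riverType` characters is a choice made for this example rather than a documented rule. The README does not state which HTTP method the task expects; Dropwizard admin tasks are normally invoked with `POST`.

```java
import java.net.URI;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class BackfillUrlSketch {
    // Mirrors the timestamps in the example URL, e.g. 2016-01-01T00:00:00.
    // The format the task really accepts is an assumption.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss");

    private BackfillUrlSketch() {
    }

    /**
     * Builds the backfill URL for a river.
     *
     * @param adminBase base URL of the admin endpoint, e.g. http://localhost:8080/admin
     */
    static URI backfillUri(String adminBase, LocalDateTime start, LocalDateTime end,
                           String riverType) {
        if (!start.isBefore(end)) {
            throw new IllegalArgumentException("startDate must be before endDate");
        }
        // Restricting the name avoids the need for URL encoding in this sketch.
        if (!riverType.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unexpected riverType: " + riverType);
        }
        String query = "startDate=" + FORMAT.format(start)
                + "&endDate=" + FORMAT.format(end)
                + "&riverType=" + riverType;
        return URI.create(adminBase + "/backfill?" + query);
    }
}
```

Calling `BackfillUrlSketch.backfillUri("http://localhost:8080/admin", LocalDateTime.of(2016, 1, 1, 0, 0), LocalDateTime.of(2016, 1, 10, 0, 0), "river1")` reproduces the example URL.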

Add `com.github.vivekkothari.persister.ESRiverConfiguration` to your `Configuration` class and
configure it appropriately (Elasticsearch hosts, bulk index configs, etc.). Then call the following
method in the `run` method of your Dropwizard application:

```
configuration.getEsRiverConfiguration().build(environment);
```

The `IPersister` can be extended to add a new data store in which the messages are
stored.
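
As a rough idea of what a persister for a new data store might look like, here is an in-memory example. Everything in it is an assumption made for illustration: `DataStorePersister` is a hypothetical stand-in for `IPersister` (whose real signature may differ), events are modelled as maps carrying Maxwell's `type` and `data` fields, and the row is assumed to have an `id` primary-key column.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in; the real IPersister signature may differ.
interface DataStorePersister {
    void persist(Map<String, Object> event);
}

final class InMemoryPersisterSketch implements DataStorePersister {
    // Rows keyed by primary key; a real implementation would write to a datastore.
    private final Map<Object, Map<String, Object>> rows = new ConcurrentHashMap<>();

    @Override
    public void persist(Map<String, Object> event) {
        @SuppressWarnings("unchecked")
        Map<String, Object> data = (Map<String, Object>) event.get("data");
        if (data == null || data.get("id") == null) {
            throw new IllegalArgumentException("event has no data.id"); // assumed key column
        }
        Object id = data.get("id");
        if ("delete".equals(event.get("type"))) {
            rows.remove(id); // deletes remove the stored row
        } else {
            rows.put(id, data); // inserts and updates are treated as upserts
        }
    }

    Map<String, Object> get(Object id) {
        return rows.get(id);
    }

    int size() {
        return rows.size();
    }

    /** Builds a minimal event with only the fields this sketch reads. */
    static Map<String, Object> event(String type, Object id, String name) {
        Map<String, Object> data = new HashMap<>();
        data.put("id", id);
        data.put("name", name);
        Map<String, Object> event = new HashMap<>();
        event.put("type", type);
        event.put("data", data);
        return event;
    }
}
```

Treating inserts and updates as the same upsert keeps the persister idempotent, which helps when a backfill replays rows that were already replicated.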
