https://github.com/topfreegames/go-etl

Go ETL using Ratchet

Go ETL
=======

Go ETL using pipelines

## Start

`make start`

## Configure

To configure the service, edit `./config/config.yaml` to load a new pipeline.

To add a custom ETL, create a new plugin in `./plugins` and add it to `config.yaml`.
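
The README does not spell out how the three stages of a job are wired together. As a rough mental model only, here is a minimal Go sketch of an extract → transform → load chain in which each stage runs in its own goroutine and hands records to the next over a channel. The functions `extract`, `transform`, `load` and `runPipeline` are hypothetical stand-ins, not go-etl's `models.DataProcessor` interface or Ratchet's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for a job's three stages; these are NOT go-etl's
// models.DataProcessor or Ratchet types. Each stage runs in its own
// goroutine and passes records downstream over a channel.

// extract emits the given records (a real job might issue an HTTP request).
func extract(records ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out) // closing signals "no more data" to the next stage
		for _, r := range records {
			out <- r
		}
	}()
	return out
}

// transform upper-cases every record it receives.
func transform(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for r := range in {
			out <- strings.ToUpper(r)
		}
	}()
	return out
}

// load drains the final channel and collects the records
// (a real job might write them to a database instead).
func load(in <-chan string) []string {
	var loaded []string
	for r := range in {
		loaded = append(loaded, r)
	}
	return loaded
}

// runPipeline wires extract -> transform -> load together.
func runPipeline(records []string) []string {
	return load(transform(extract(records...)))
}

func main() {
	fmt.Println(runPipeline([]string{"a", "b"})) // [A B]
}
```

Because each stage only sees channels, any one of them can be swapped out without touching the others, which is the same separation the `Extract`/`Transform`/`Load` methods of a plugin express.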

## Examples

### Add ETL code to config.yaml

1) Add the following to `config/config.yaml`:
```yaml
workers:
  - schedule:
      hour: 20 # UTC time
      minute: 0
    job:
      name: http-requestor
      code: |
        package main

        import (
            "github.com/topfreegames/go-etl/models"
            "github.com/topfreegames/go-etl/processors"
        )

        type etl string

        func (e etl) Extract() models.DataProcessor {
            return processors.NewHTTPRequestor("GET", "http://localhost:8080")
        }

        func (e etl) Transform() models.DataProcessor {
            return &processors.Logger{}
        }

        func (e etl) Load() models.DataProcessor {
            return &processors.Null{}
        }

        // ETL is the exported symbol of this plugin
        var ETL etl
```

2) Start:
```bash
make start
```

### Create a new ETL plugin

1) Create a new plugin in `./plugins`, for example:
```golang
// ./plugins/http-requestor/main.go

package main

import (
	"github.com/topfreegames/go-etl/models"
	"github.com/topfreegames/go-etl/processors"
)

type etl string

// Extract fetches the input with an HTTP GET request to a local server.
func (e etl) Extract() models.DataProcessor {
	return processors.NewHTTPRequestor("GET", "http://localhost:8080")
}

// Transform passes the data through processors.Logger, which logs it.
func (e etl) Transform() models.DataProcessor {
	return &processors.Logger{}
}

// Load hands the data to processors.Null, so nothing is stored.
func (e etl) Load() models.DataProcessor {
	return &processors.Null{}
}

// ETL is the exported symbol of this plugin
var ETL etl
```

2) Build the plugin binary:

```bash
make plugins
```

3) Add it to `config/config.yaml`:
```yaml
workers:
  - period: 1h
    job:
      name: http-requestor
```

4) Start:
```bash
make start
```

## Next steps

- [ ] Better logging
- [ ] Shared state (possibly Redis) to allow replication without executing the same job twice
- [X] Do not crash the application on a bad script (not found, or code that doesn't compile)
- [ ] Unit tests
- [ ] Integration tests
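
The checked item about not crashing on a missing or broken script implies that plugin loading reports errors instead of panicking. The README does not show how compiled plugins are loaded; assuming Go's standard `plugin` package is used (an assumption), a minimal sketch opens the `.so` file and looks up the exported `ETL` symbol, returning an error at each failure point. The helper `loadETLSymbol` and the path `plugins/http-requestor.so` are hypothetical; check the Makefile for where `make plugins` actually writes its output. Note that the `plugin` package only works on Linux, FreeBSD and macOS builds with cgo enabled; elsewhere `plugin.Open` returns an error.

```go
package main

import (
	"fmt"
	"os"
	"plugin"
)

// loadETLSymbol (hypothetical) opens a compiled Go plugin and looks up its
// exported "ETL" variable. Errors are returned rather than panicking, so a
// missing or broken plugin does not take down the host process.
func loadETLSymbol(path string) (plugin.Symbol, error) {
	if _, err := os.Stat(path); err != nil {
		return nil, fmt.Errorf("plugin %q not found: %w", path, err)
	}
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("opening plugin %q: %w", path, err)
	}
	// For a package-level variable, Lookup returns a pointer to it (here *etl).
	sym, err := p.Lookup("ETL")
	if err != nil {
		return nil, fmt.Errorf("plugin %q exports no ETL symbol: %w", path, err)
	}
	return sym, nil
}

func main() {
	// Hypothetical output path for the http-requestor plugin.
	sym, err := loadETLSymbol("plugins/http-requestor.so")
	if err != nil {
		fmt.Println("skipping job:", err)
		return
	}
	// A real host would assert sym to an interface whose Extract, Transform
	// and Load methods return models.DataProcessor.
	fmt.Printf("loaded ETL symbol of type %T\n", sym)
}
```

Skipping the job and logging the error keeps the other workers running when one plugin is missing or fails to load.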