https://github.com/xerial/chroniker


# Chroniker

_Chroniker_ is a framework for simplifying your batch job pipelines in Scala.

## Examples

```scala
import xerial.chroniker._

import sampledb._

// SELECT count(*) FROM nasdaq
def dataCount = nasdaq.size

// SELECT time, close FROM nasdaq WHERE symbol = 'AAPL'
def appleStock = nasdaq.filter(_.symbol is "AAPL").select(_.time, _.close)

// You can use a raw SQL statement as well:
def appleStockSQL = sql"SELECT time, close FROM nasdaq WHERE symbol = 'AAPL'"

// SELECT time, close FROM nasdaq WHERE symbol = 'AAPL' LIMIT 10
appleStock.limit(10).print

// time-column based filtering
appleStock.between("2015-05-01", "2015-06-01")

// Build one query per company symbol
for (company <- Seq("YHOO", "GOOG", "MSFT")) yield {
  nasdaq.filter(_.symbol is company).selectAll
}

```
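
The comments in the example pair each method chain with the SQL it stands for. The following is a minimal, self-contained sketch of how such a chain could be rendered to SQL. It is not Chroniker's implementation: `Query`, `where`, and `toSQL` are hypothetical names introduced only for illustration, and the string-based predicates stand in for the typed `_.symbol is ...` syntax shown above.

```scala
// Hypothetical mini query builder, for illustration only (not Chroniker's API).
final case class Query(
    table: String,
    columns: Seq[String] = Seq("*"),
    predicates: Seq[String] = Seq.empty,
    rowLimit: Option[Int] = None
) {
  // Each method returns a new immutable Query, so chains can be reused and extended.
  def select(cols: String*): Query = copy(columns = cols)

  // NOTE: values are not escaped here; a real implementation must guard against SQL injection.
  def where(column: String, value: String): Query =
    copy(predicates = predicates :+ s"$column = '$value'")

  def limit(n: Int): Query = copy(rowLimit = Some(n))

  def toSQL: String = {
    val whereClause =
      if (predicates.isEmpty) "" else predicates.mkString(" WHERE ", " AND ", "")
    val limitClause = rowLimit.fold("")(n => s" LIMIT $n")
    s"SELECT ${columns.mkString(", ")} FROM $table$whereClause$limitClause"
  }
}

object QuerySketch {
  def main(args: Array[String]): Unit = {
    val googleStock = Query("nasdaq").where("symbol", "GOOG").select("time", "close")
    // prints: SELECT time, close FROM nasdaq WHERE symbol = 'GOOG' LIMIT 10
    println(googleStock.limit(10).toSQL)
  }
}
```

Because every step returns a new value, a base query such as `googleStock` can be defined once and refined later (for example with `limit`) without affecting other uses.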

## Milestones

- Build SQL + local analysis workflows
- Submit queries to Presto / Treasure Data
- Run scheduled queries
- Retry upon failures
- Cache intermediate results
- Resume workflow
- Partial workflow executions
- Sampling display
- Interactive mode
- Split a large query into small ones
- Differential computation for time-series data

- Windowing for stream queries

- Object-oriented workflow

- Input Source: fluentd/embulk
- Output Source:

- Workflow Executor
- Local-only mode
- Register SQL part to Treasure Data
- Run complex analysis on local cache
- UNIX command executor
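
Two of the milestones above, "Split a large query into small ones" and "Retry upon failures", can be sketched with the standard library alone. The code below is one possible approach under stated assumptions, not Chroniker's design: `WorkflowSketch`, `splitRange`, and `retry` are hypothetical names, the time range is treated as half-open (`[start, end)`), and the retry has no backoff.

```scala
import java.time.LocalDate
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helpers, for illustration only (not part of Chroniker).
object WorkflowSketch {

  // Split the half-open range [start, end) into consecutive windows of at most `days` days.
  def splitRange(start: LocalDate, end: LocalDate, days: Int): List[(LocalDate, LocalDate)] = {
    require(days > 0, "days must be positive")
    Iterator
      .iterate(start)(_.plusDays(days.toLong))
      .takeWhile(_.isBefore(end))
      .map { from =>
        val to = from.plusDays(days.toLong)
        (from, if (to.isAfter(end)) end else to) // clamp the last window to `end`
      }
      .toList
  }

  // Run `task` up to `maxAttempts` times and return the first success or the last failure.
  @tailrec
  def retry[A](maxAttempts: Int)(task: => A): Try[A] =
    Try(task) match {
      case s @ Success(_)                => s
      case Failure(_) if maxAttempts > 1 => retry(maxAttempts - 1)(task)
      case f                             => f
    }

  def main(args: Array[String]): Unit = {
    val windows = splitRange(LocalDate.parse("2015-05-01"), LocalDate.parse("2015-06-01"), 7)
    // Each window would become one smaller query; here we only print the bounds.
    windows.foreach { case (from, to) =>
      val result = retry(3)(s"query for [$from, $to)")
      println(result)
    }
  }
}
```

Splitting by time window keeps each unit of work small, so a failed window can be retried on its own without rerunning the whole range.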