https://github.com/xerial/chroniker
Simplify your batch job pipelines with Scala
- Host: GitHub
- URL: https://github.com/xerial/chroniker
- Owner: xerial
- Created: 2015-08-12T07:31:21.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2018-03-02T17:50:51.000Z (almost 8 years ago)
- Last Synced: 2025-04-13T00:58:48.419Z (9 months ago)
- Language: Scala
- Homepage:
- Size: 39.1 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Chroniker
_Chroniker_ is a framework for simplifying your batch job pipelines in Scala.
## Examples
```scala
import xerial.chroniker._
import sampledb._
// SELECT count(*) FROM nasdaq
def dataCount = nasdaq.size
// SELECT time, close FROM nasdaq WHERE symbol = 'APPL'
def appleStock = nasdaq.filter(_.symbol is "APPL").select(_.time, _.close)
// You can use a raw SQL statement as well:
def appleStockSQL = sql"SELECT time, close FROM nasdaq where symbol = 'APPL'"
// SELECT time, close FROM nasdaq WHERE symbol = 'APPL' LIMIT 10
appleStock.limit(10).print
// time-column based filtering
appleStock.between("2015-05-01", "2015-06-01")
// Run the same query for multiple symbols
for (company <- Seq("YHOO", "GOOG", "MSFT")) yield {
  nasdaq.filter(_.symbol is company).selectAll
}
```
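To illustrate the idea behind the DSL, here is a minimal, self-contained sketch of how a collection-like query could be rendered into SQL. The `Query` case class and its method names are assumptions made for illustration; they are not Chroniker's actual internals.

```scala
// Illustrative sketch only: a tiny immutable query builder that renders
// to a SQL string. Chroniker's real implementation may differ entirely.
object QuerySketch {
  final case class Query(
      table: String,
      where: Option[String] = None,
      cols: Seq[String] = Seq("*"),
      limitN: Option[Int] = None
  ) {
    // Each operation returns a new Query, so queries compose functionally
    def filter(cond: String): Query = copy(where = Some(cond))
    def select(cs: String*): Query  = copy(cols = cs)
    def limit(n: Int): Query        = copy(limitN = Some(n))

    // Render the accumulated state into a SQL statement
    def toSQL: String = {
      val base      = s"SELECT ${cols.mkString(", ")} FROM $table"
      val withWhere = where.fold(base)(w => s"$base WHERE $w")
      limitN.fold(withWhere)(n => s"$withWhere LIMIT $n")
    }
  }

  val nasdaq = Query("nasdaq")
}
```

With this sketch, `nasdaq.filter("symbol = 'APPL'").select("time", "close").limit(10).toSQL` yields the same SQL shown in the comments above.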
## Milestones
- Build SQL + local analysis workflows
- Submit queries to Presto / Treasure Data
- Run scheduled queries
- Retry upon failures
- Cache intermediate results
- Resume workflow
- Partial workflow executions
- Sampling display
- Interactive mode
- Split a large query into small ones
- Differential computation for time-series data
- Windowing for stream queries
- Object-oriented workflow
- Input Source: fluentd/embulk
- Output Source:
- Workflow Executor
- Local-only mode
- Register SQL part to Treasure Data
- Run complex analysis on local cache
- UNIX command executor
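As a rough sketch of what the "Retry upon failures" milestone could look like, here is a generic, self-contained retry helper in Scala. This is an assumption for illustration, not Chroniker's actual scheduler or API.

```scala
import scala.util.{Failure, Success, Try}

// Hedged sketch: retry a task up to maxAttempts times, returning the
// first Success or the last Failure. Chroniker's real retry logic may
// add scheduling, backoff, and persistence on top of something like this.
object RetrySketch {
  @annotation.tailrec
  def retry[A](maxAttempts: Int)(task: () => A): Try[A] =
    Try(task()) match {
      case s @ Success(_)                => s
      case Failure(_) if maxAttempts > 1 => retry(maxAttempts - 1)(task)
      case f @ Failure(_)                => f
    }
}
```

A task that fails twice and then succeeds would return `Success` when run with `RetrySketch.retry(3)`, and `Failure` with a lower attempt budget.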