https://github.com/xerial/chroniker
Simplify your batch job pipelines with Scala
- Host: GitHub
- URL: https://github.com/xerial/chroniker
- Owner: xerial
- Created: 2015-08-12T07:31:21.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2018-03-02T17:50:51.000Z (almost 8 years ago)
- Last Synced: 2025-04-13T00:58:48.419Z (9 months ago)
- Language: Scala
- Homepage:
- Size: 39.1 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Chroniker
_Chroniker_ is a framework for simplifying your batch job pipelines in Scala.
## Examples
```scala
import xerial.chroniker._
import sampledb._
// SELECT count(*) FROM nasdaq
def dataCount = nasdaq.size
// SELECT time, close FROM nasdaq WHERE symbol = 'APPL'
def appleStock = nasdaq.filter(_.symbol is "APPL").select(_.time, _.close)
// You can use a raw SQL statement as well:
def appleStockSQL = sql"SELECT time, close FROM nasdaq where symbol = 'APPL'"
// SELECT time, close FROM nasdaq WHERE symbol = 'APPL' LIMIT 10
appleStock.limit(10).print
// time-column based filtering
appleStock.between("2015-05-01", "2015-06-01")
// Run the same query for multiple symbols
for (company <- Seq("YHOO", "GOOG", "MSFT")) yield {
  nasdaq.filter(_.symbol is company).selectAll
}
```
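To illustrate the idea behind the DSL, here is a minimal, self-contained sketch of how a collection-like query could be rendered into SQL. The `Query` case class and its method names are assumptions made for illustration; they are not Chroniker's actual internals.

```scala
// Illustrative sketch only: a tiny immutable query builder that renders
// to a SQL string. Chroniker's real implementation may differ entirely.
object QuerySketch {
  final case class Query(
      table: String,
      where: Option[String] = None,
      cols: Seq[String] = Seq("*"),
      limitN: Option[Int] = None
  ) {
    // Each operation returns a new Query, so queries compose functionally
    def filter(cond: String): Query = copy(where = Some(cond))
    def select(cs: String*): Query  = copy(cols = cs)
    def limit(n: Int): Query        = copy(limitN = Some(n))

    // Render the accumulated state into a SQL statement
    def toSQL: String = {
      val base      = s"SELECT ${cols.mkString(", ")} FROM $table"
      val withWhere = where.fold(base)(w => s"$base WHERE $w")
      limitN.fold(withWhere)(n => s"$withWhere LIMIT $n")
    }
  }

  val nasdaq = Query("nasdaq")
}
```

With this sketch, `nasdaq.filter("symbol = 'APPL'").select("time", "close").limit(10).toSQL` yields the same SQL shown in the comments above.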
## Milestones
- Build SQL + local analysis workflows
- Submit queries to Presto / Treasure Data
- Run scheduled queries
- Retry upon failures
- Cache intermediate results
- Resume workflow
- Partial workflow executions
- Sampling display
- Interactive mode
- Split a large query into small ones
- Differential computation for time-series data
- Windowing for stream queries
- Object-oriented workflow
- Input Source: fluentd/embulk
- Output Source:
- Workflow Executor
- Local-only mode
- Register SQL part to Treasure Data
- Run complex analysis on local cache
- UNIX command executor
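As a rough sketch of what the "Retry upon failures" milestone could look like, here is a generic, self-contained retry helper in Scala. This is an assumption for illustration, not Chroniker's actual scheduler or API.

```scala
import scala.util.{Failure, Success, Try}

// Hedged sketch: retry a task up to maxAttempts times, returning the
// first Success or the last Failure. Chroniker's real retry logic may
// add scheduling, backoff, and persistence on top of something like this.
object RetrySketch {
  @annotation.tailrec
  def retry[A](maxAttempts: Int)(task: () => A): Try[A] =
    Try(task()) match {
      case s @ Success(_)                => s
      case Failure(_) if maxAttempts > 1 => retry(maxAttempts - 1)(task)
      case f @ Failure(_)                => f
    }
}
```

A task that fails twice and then succeeds would return `Success` when run with `RetrySketch.retry(3)`, and `Failure` with a lower attempt budget.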