https://github.com/xerial/silk
Simplify SQL Workflows with Scala
- Host: GitHub
- URL: https://github.com/xerial/silk
- Owner: xerial
- License: apache-2.0
- Created: 2012-01-06T13:53:15.000Z (about 14 years ago)
- Default Branch: master
- Last Pushed: 2020-03-13T22:40:45.000Z (almost 6 years ago)
- Last Synced: 2025-08-16T09:51:07.397Z (5 months ago)
- Language: CSS
- Homepage: http://xerial.org/silk
- Size: 14.6 MB
- Stars: 38
- Watchers: 10
- Forks: 7
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Silk: A framework for managing SQL data flows.
http://xerial.org/silk
## Examples
```scala
import xerial.silk.core._
import sampledb._
// SELECT count(*) FROM nasdaq
def dataCount = nasdaq.size
// SELECT time, close FROM nasdaq WHERE symbol = 'APPL'
def appleStock = nasdaq.filter(_.symbol is "APPL").select(_.time, _.close)
// You can use a raw SQL statement as well:
def appleStockSQL = sql"SELECT time, close FROM nasdaq where symbol = 'APPL'"
// SELECT time, close FROM nasdaq WHERE symbol = 'APPL' LIMIT 10
appleStock.limit(10).print
// time-column based filtering
appleStock.between("2015-05-01", "2015-06-01")
// Build one query per company symbol
for(company <- Seq("YHOO", "GOOG", "MSFT")) yield {
  nasdaq.filter(_.symbol is company).selectAll
}
```
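To make the mapping between the method chains and the SQL in the comments above more concrete, here is a minimal, self-contained sketch of how a chain of `filter`, `select`, and `limit` calls could be rendered to SQL text. It is not Silk's implementation: `Query`, `filterEq`, `toSql`, and `QueryDemo` are hypothetical names invented for this illustration, and the real library uses typed column accessors such as `_.symbol` rather than strings.

```scala
// Hypothetical toy model, NOT Silk's API: an immutable query description
// that accumulates predicates, a projection, and a limit, then renders SQL.
final case class Query(
    table: String,
    columns: Seq[String] = Seq("*"),
    predicates: Seq[String] = Seq.empty,
    rowLimit: Option[Int] = None
) {
  // Add an equality predicate; single quotes in the value are doubled.
  def filterEq(column: String, value: String): Query = {
    val escaped = value.replace("'", "''")
    copy(predicates = predicates :+ (column + " = '" + escaped + "'"))
  }

  // Replace the projection with the given column names.
  def select(cols: String*): Query = copy(columns = cols.toSeq)

  // Cap the number of returned rows.
  def limit(n: Int): Query = copy(rowLimit = Some(n))

  // Render the accumulated description as a single SQL string.
  def toSql: String = {
    val where =
      if (predicates.isEmpty) ""
      else predicates.mkString(" WHERE ", " AND ", "")
    val lim = rowLimit.map(n => s" LIMIT $n").getOrElse("")
    s"SELECT ${columns.mkString(", ")} FROM $table$where$lim"
  }
}

object QueryDemo {
  val nasdaq: Query = Query("nasdaq")
  val appleStock: Query = nasdaq.filterEq("symbol", "APPL").select("time", "close")

  def main(args: Array[String]): Unit = {
    // prints: SELECT time, close FROM nasdaq WHERE symbol = 'APPL' LIMIT 10
    println(appleStock.limit(10).toSql)
  }
}
```

Because each call returns a new `Query`, a definition like `appleStock` can be reused and refined (for example with `limit`) without changing the original, which is the style the examples above rely on.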
## Milestones
- Build SQL + local analysis workflows
- Submit queries to Presto / Treasure Data
- Run scheduled queries
- Retry upon failures
- Cache intermediate results
- Resume workflow
- Partial workflow executions
- Sampling display
- Interactive mode
- Split a large query into small ones
- Differential computation for time-series data
- Windowing for stream queries
- Object-oriented workflow
- Input Source: fluentd/embulk
- Output Source:
- Workflow Executor
- Local-only mode
- Register SQL part to Treasure Data
- Run complex analysis on local cache
- UNIX command executor
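
As an illustration of the "Retry upon failures" milestone, the sketch below shows one simple way a failing task could be re-run a bounded number of times. It uses only the Scala standard library; `Retry`, `retry`, and `maxAttempts` are hypothetical names for this example and do not describe how Silk itself schedules or retries queries. No delay or backoff is applied.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Try}

// Hypothetical helper, not part of Silk.
object Retry {
  // Run `task` up to `maxAttempts` times and return the first success,
  // or the last failure if every attempt throws.
  @tailrec
  def retry[A](maxAttempts: Int)(task: () => A): Try[A] =
    Try(task()) match {
      case Failure(_) if maxAttempts > 1 => retry(maxAttempts - 1)(task)
      case result                        => result
    }
}

object RetryDemo {
  def main(args: Array[String]): Unit = {
    var calls = 0
    // Simulated query submission that fails twice before succeeding.
    val result = Retry.retry(3) { () =>
      calls += 1
      if (calls < 3) throw new RuntimeException("transient failure")
      else "ok"
    }
    println(s"$result after $calls calls")
  }
}
```

Returning a `Try` leaves it to the caller to decide whether a final failure should abort the workflow or be handled some other way.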