https://github.com/recipegrace/biglibrary
- Host: GitHub
- URL: https://github.com/recipegrace/biglibrary
- Owner: recipegrace
- Created: 2015-09-25T19:59:18.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2019-10-20T03:58:02.000Z (over 5 years ago)
- Last Synced: 2024-11-15T12:11:58.640Z (3 months ago)
- Topics: electric, spark
- Language: Scala
- Size: 3.21 MB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# BigLibrary
[![Build Status](https://travis-ci.org/recipegrace/BigLibrary.svg?branch=master)](https://travis-ci.org/recipegrace/BigLibrary)
WORA (Write Once Run Anywhere) framework
BigLibrary is designed as a wrapper for big data programs (currently implemented for Spark).
It lets programmers 1) customize execution for local and cluster modes, 2) reuse helper functions instead of writing boilerplate code, and 3) produce code that is guaranteed to be deployable.
The project realizes the idea of executable pipelines that are unaware of the data they process.
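
One way to picture point 1) is to keep the job body fixed and choose only the launch configuration per environment. The following is a minimal sketch under that assumption; `RunMode`, `Local`, `Cluster`, and `masterUrl` are hypothetical names for illustration and are not part of BigLibrary's API. Only the master URL formats (`local[N]`, `local[*]`, `yarn`, `spark://host:port`) follow Spark's documented conventions.

```scala
// Hypothetical sketch, not BigLibrary's API: the job logic stays the same,
// and only the run mode chosen at launch time differs.
sealed trait RunMode
final case class Local(threads: Int) extends RunMode
final case class Cluster(master: String) extends RunMode // e.g. "yarn" or "spark://host:7077"

object RunMode {
  // Translate a run mode into a Spark master URL string.
  def masterUrl(mode: RunMode): String = mode match {
    case Local(threads) if threads > 0 => s"local[$threads]"
    case Local(_)                      => "local[*]" // use all available cores
    case Cluster(master)               => master
  }
}
```

In a real Spark program, a string like this would typically be passed to `SparkSession.builder().master(...)`, so the same job can run on a laptop or a cluster without touching its transformation code.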

BigLibrary models a big data program as a pair: 1) the actual job and 2) a test job. The actual job can be executed on a cluster using
ScriptDB. A few examples implemented with BigLibrary are given below.

WordCount
```scala
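object WordCount extends SequenceFileJob[InputAndOutput] {
  override def execute(argument: InputAndOutput)(implicit ec: ElectricSession) = {
    val session = ec.getSparkSession
    import session.implicits._ // encoders needed by flatMap, groupByKey and count

    // Read the input as lines of text
    val file = ec.text(argument.input)
    // Lower-case, strip punctuation, split on whitespace, and drop empty tokens
    // (blank or indented lines would otherwise yield "" as a word)
    val words = file
      .flatMap(_.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+"))
      .filter(_.nonEmpty)
    // Count occurrences of each word and write tab-separated "word<TAB>count" lines
    words
      .groupByKey(f => f)
      .count()
      .write
      .option("delimiter", "\t")
      .csv(argument.output)
  }
}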
```
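
Because the job is just a chain of transformations, its logic can also be checked without Spark. The following is a minimal sketch that mirrors the tokenizing and counting steps of `WordCount` on plain Scala collections; `WordCountLogic`, `countWords`, and `formatCounts` are hypothetical helpers for illustration and are not part of BigLibrary. It drops empty tokens, which `split("\\s+")` produces for blank lines.

```scala
// Hypothetical Spark-free mirror of WordCount's logic (not part of BigLibrary).
object WordCountLogic {
  // Lower-case each line, strip non-alphanumeric characters, split on
  // whitespace, drop empty tokens, and count occurrences of each word.
  def countWords(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Render counts as "word<TAB>count" lines, sorted by word.
  def formatCounts(counts: Map[String, Long]): Seq[String] =
    counts.toSeq.sortBy(_._1).map { case (word, n) => s"$word\t$n" }
}
```

For example, `countWords(Seq("hello world", "Zero world", "Some world"))` gives `world` a count of 3 and `hello`, `zero`, and `some` a count of 1 each, which agrees with the assertions in `WordCountTest`.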

WordCountTest

```scala
class WordCountTest extends ElectricJobTest {
  test("wordcount test with spark") {
    // Write a small sample input to a file and keep its path
    val input = createFile {
      """
hello world
Zero world
Some world
""".stripMargin
    }
    val output = createTempPath()
    // Run the WordCount job on the sample input
    launch(WordCount, InputAndOutput(input, output))
    // Read back the lines of the output files whose names match "part"
    val lines = readFilesInDirectory(output, "part")
    lines should contain("hello\t1")
    lines should contain("world\t3")
  }
}
```

# Developers
[OSS + Travis + sbt](https://github.com/recipegrace/BigLibrary/blob/master/doc/Oss-publish-travis.md)