
https://github.com/recipegrace/biglibrary



# BigLibrary

[![Build Status](https://travis-ci.org/recipegrace/BigLibrary.svg?branch=master)](https://travis-ci.org/recipegrace/BigLibrary)

WORA (Write Once, Run Anywhere) framework


BigLibrary is designed as a wrapper for big data programs (currently implemented for Spark).
The library 1) lets programmers customize execution for local and cluster modes, 2) provides functions for boilerplate code, and 3) guarantees deployable code.
The project realizes the idea of executable pipelines that are agnostic of the data they process.
BigLibrary models a big data program as a pair: 1) the actual job and 2) a test job. The actual job can be executed on a cluster using ScriptDB. A few examples implemented with BigLibrary are given below.
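The job/test pairing can be illustrated without Spark. The sketch below is hypothetical and is not BigLibrary's API: `JobPaths`, `PathJob`, `UpperCaseJob`, and `UpperCaseJobTest` are illustrative names, and local files stand in for cluster storage. The point it shows is that the job only receives locations, never the data, so the test can supply its own small input and launch the very same job. It assumes Scala 2.13 or later (for `scala.jdk.CollectionConverters`).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._

// Hypothetical argument type: the job receives locations, never the data itself.
final case class JobPaths(input: String, output: String)

// Hypothetical job abstraction; not BigLibrary's actual trait.
trait PathJob[A] {
  def execute(argument: A): Unit
}

// The "actual job": upper-cases every line of the input file and writes the result.
object UpperCaseJob extends PathJob[JobPaths] {
  override def execute(argument: JobPaths): Unit = {
    val lines = Files.readAllLines(Paths.get(argument.input), StandardCharsets.UTF_8).asScala
    val result = lines.map(_.toUpperCase)
    Files.write(Paths.get(argument.output), result.asJava, StandardCharsets.UTF_8)
  }
}

// The "test job": creates its own small input, launches the same job, reads the output back.
object UpperCaseJobTest {
  def run(): List[String] = {
    val input = Files.createTempFile("input", ".txt")
    Files.write(input, List("hello world").asJava, StandardCharsets.UTF_8)
    val output = Files.createTempFile("output", ".txt")
    UpperCaseJob.execute(JobPaths(input.toString, output.toString))
    Files.readAllLines(output, StandardCharsets.UTF_8).asScala.toList
  }
}
```

Calling `UpperCaseJobTest.run()` returns `List("HELLO WORLD")`; swapping the temporary paths for cluster paths would not require touching `UpperCaseJob`.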

WordCount

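```scala
object WordCount extends SequenceFileJob[InputAndOutput] {
  override def execute(argument: InputAndOutput)(implicit ec: ElectricSession) = {
    val session = ec.getSparkSession
    // Brings Spark's implicit encoders (e.g. for String) into scope.
    import session.implicits._

    // Read the input as lines of text.
    val file = ec.text(argument.input)
    // Lower-case, strip non-alphanumerics, split on whitespace, and drop empty tokens.
    val words = file
      .flatMap(_.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+"))
      .filter(_.nonEmpty)

    // Count occurrences of each word and write tab-separated "word<TAB>count" lines.
    words
      .groupByKey(f => f)
      .count()
      .write
      .option("delimiter", "\t")
      .csv(argument.output)
  }
}
```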
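To see what the job computes without a cluster, the same tokenize-and-count logic can be sketched with plain Scala collections. This is an illustration only, not part of BigLibrary: `LocalWordCount` is a hypothetical name, and it includes a filter that drops the empty tokens `split("\\s+")` produces for blank or indented lines.

```scala
// Hypothetical, Spark-free sketch of the WordCount logic on an in-memory Seq.
object LocalWordCount {
  // Tokenize like the job: lower-case, strip non-alphanumerics, split on whitespace.
  def count(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+").toSeq)
      .filter(_.nonEmpty) // drop empty tokens from blank or indented lines
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Render counts as "word<TAB>count", sorted by word for stable output.
  def format(counts: Map[String, Long]): Seq[String] =
    counts.toSeq.sortBy(_._1).map { case (word, n) => s"$word\t$n" }
}
```

With the three lines used in `WordCountTest`, `format` yields `hello\t1`, `some\t1`, `world\t3`, and `zero\t1`, which includes the two lines the test checks for.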
WordCountTest

```scala
class WordCountTest extends ElectricJobTest {
  test("wordcount test with spark") {
    // Write a small input file for the job to read.
    val input = createFile {
      """hello world
        |Zero world
        |Some world
        |""".stripMargin
    }
    val output = createTempPath()
    // Launch the WordCount job against the test data.
    launch(WordCount, InputAndOutput(input, output))
    // Collect lines from the part-* files Spark wrote to the output directory.
    val lines = readFilesInDirectory(output, "part")
    lines should contain("hello\t1")
    lines should contain("world\t3")
  }
}
```
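Spark writes a job's results as a directory of `part-*` files, typically alongside a `_SUCCESS` marker, which is presumably why the test reads files with the `"part"` prefix. The sketch below is a hypothetical stand-in for a helper like `readFilesInDirectory`, written against `java.nio.file`; `PartFiles.readLines` is an illustrative name, and its behavior is an assumption, not BigLibrary's implementation. It assumes Scala 2.13 or later.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object PartFiles {
  // Hypothetical helper: collects every line from regular files in `dir`
  // whose names start with `prefix`, skipping markers such as _SUCCESS.
  def readLines(dir: Path, prefix: String): List[String] = {
    val stream = Files.list(dir)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p) && p.getFileName.toString.startsWith(prefix))
        .toList
        .sortBy(_.getFileName.toString) // deterministic order across part files
        .flatMap(p => Files.readAllLines(p, StandardCharsets.UTF_8).asScala)
    } finally stream.close() // Files.list holds an open directory handle
  }
}
```

Sorting by file name only makes the result reproducible; the test itself does not depend on line order, since it checks membership with `contain`.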

# Developers

[Publishing to OSS with Travis and sbt](https://github.com/recipegrace/BigLibrary/blob/master/doc/Oss-publish-travis.md)