https://github.com/recipegrace/biglibrary

Host: GitHub
URL: https://github.com/recipegrace/biglibrary
Owner: recipegrace
Created: 2015-09-25T19:59:18.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2019-10-20T03:58:02.000Z (over 5 years ago)
Last Synced: 2024-11-15T12:11:58.640Z (3 months ago)
Topics: electric, spark
Language: Scala
Size: 3.21 MB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# BigLibrary

[![Build Status](https://travis-ci.org/recipegrace/BigLibrary.svg?branch=master)](https://travis-ci.org/recipegrace/BigLibrary)

WORA (Write Once Run Anywhere) framework

BigLibrary is designed as a wrapper for big-data programs (currently implemented for Spark).

The library 1) lets programmers customize execution for local and cluster modes, 2) provides functions for boilerplate code, and 3) guarantees deployable code.

This project realizes the idea of executable pipelines that are unaware of the data.
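To illustrate the first point, here is a minimal sketch of how a job can stay unchanged while a launcher picks local or cluster execution. Every name in it (`Mode`, `SketchSession`, `SketchJob`, `DescribeJob`, `SketchLauncher`) is hypothetical and is not part of BigLibrary's API; only the Spark master values `local[*]` and `yarn` are standard.

```scala
// Illustrative only: every name here is hypothetical, not BigLibrary's API.
sealed trait Mode
case object LocalMode extends Mode
case object ClusterMode extends Mode

// Stand-in for a session object that hides mode-specific settings from the job.
final case class SketchSession(mode: Mode) {
  // "local[*]" and "yarn" are standard Spark master values.
  def master: String = mode match {
    case LocalMode   => "local[*]"
    case ClusterMode => "yarn"
  }
}

// A job reads settings from the implicit session instead of branching on the mode.
trait SketchJob[A] {
  def execute(argument: A)(implicit session: SketchSession): String
}

// A toy job that only reports where it would run.
object DescribeJob extends SketchJob[String] {
  override def execute(argument: String)(implicit session: SketchSession): String =
    s"processing $argument on ${session.master}"
}

object SketchLauncher {
  // The launcher, not the job, decides where the job runs.
  def launch[A](job: SketchJob[A], argument: A, mode: Mode): String = {
    implicit val session: SketchSession = SketchSession(mode)
    job.execute(argument)
  }
}
```

Under these assumptions, a test can call `SketchLauncher.launch(DescribeJob, "input.txt", LocalMode)` while a deployment calls the same job with `ClusterMode`, and the job code is identical in both cases.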

BigLibrary realizes a big-data program as a pair: 1) the actual job and 2) the test job. The actual job can be executed in a cluster using the ScriptDB. A few examples implemented using BigLibrary are given below.

WordCount

```scala
object WordCount extends SequenceFileJob[InputAndOutput] {
  override def execute(argument: InputAndOutput)(implicit ec: ElectricSession) = {
    val session = ec.getSparkSession
    import session.implicits._

    // Read the input path as lines of text.
    val file = ec.text(argument.input)

    // Lowercase each line, drop non-alphanumeric characters, and split on whitespace.
    val words = file.flatMap(_.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+"))

    // Count occurrences per word and write them as tab-delimited output.
    words
      .groupByKey(f => f)
      .count()
      .write
      .option("delimiter", "\t")
      .csv(argument.output)
  }
}
```
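The tokenize-and-count logic of the job can be tried without Spark. The following is a minimal sketch using plain Scala collections; `WordCountSketch`, `tokenize`, and `countLines` are hypothetical names introduced here for illustration and are not part of BigLibrary.

```scala
// Illustrative only: a Spark-free version of the same transformation.
object WordCountSketch {
  // Same normalization as the job: lowercase, drop non-alphanumerics, split on whitespace.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+").toSeq

  // Count words and render each pair as "word<TAB>count", sorted for stable output.
  def countLines(lines: Seq[String]): Seq[String] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => s"$word\t${occurrences.size}" }
      .toSeq
      .sorted
}
```

For example, `WordCountSketch.countLines(Seq("hello world", "Zero world", "Some world"))` contains `"hello\t1"` and `"world\t3"`. Note that a line with leading whitespace yields an empty first token, because `split` keeps the empty string that precedes a leading match.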

WordCountTest 

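```scala
class WordCountTest extends ElectricJobTest {
  test("wordcount test with spark") {
    // Write a small input file for the job.
    val input = createFile {
      """
        hello world
        Zero world
        Some world
      """.stripMargin
    }
    val output = createTempPath()

    // Run the actual job against the temporary input and output paths.
    launch(WordCount, InputAndOutput(input, output))

    // Collect the lines of the "part" output files and check the tab-delimited counts.
    val lines = readFilesInDirectory(output, "part")
    lines should contain("hello\t1")
    lines should contain("world\t3")
  }
}
```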

# Developers 

[OSS + Travis + sbt](https://github.com/recipegrace/BigLibrary/blob/master/doc/Oss-publish-travis.md)