https://github.com/emmalanguage/emma

A quotation-based Scala DSL for scalable data analysis.

dsl emma flink quotations scala scala-dsl scalable-data-analysis spark

Last synced: 11 months ago

Host: GitHub
URL: https://github.com/emmalanguage/emma
Owner: emmalanguage
License: apache-2.0
Created: 2014-03-03T17:30:30.000Z (over 12 years ago)
Default Branch: master
Last Pushed: 2022-07-07T21:07:08.000Z (about 4 years ago)
Last Synced: 2025-08-04T02:53:20.418Z (12 months ago)
Topics: dsl, emma, flink, quotations, scala, scala-dsl, scalable-data-analysis, spark
Language: Scala
Homepage: http://emma-language.org
Size: 9.16 MB
Stars: 63
Watchers: 15
Forks: 19
Open Issues: 38
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-java - Emma

README

# Emma

*A quotation-based Scala DSL for scalable data analysis.*

[![Build Status](https://travis-ci.org/emmalanguage/emma.svg?branch=master)](https://travis-ci.org/emmalanguage/emma)

## Goals

Our goal is to improve developer productivity by hiding parallelism aspects behind a high-level,
declarative API which maximises reuse of native Scala syntax and constructs.

Emma supports state-of-the-art dataflow engines such as
[Apache Flink](https://flink.apache.org/) and
[Apache Spark](https://spark.apache.org/) as backend co-processors.

## Features

DSLs for scalable data analysis are typically embedded through types.
In contrast, Emma is *based on quotations* (similar to [Quill](http://getquill.io/)).
This approach has two benefits.

First, it allows the DSL to reuse Scala-native, declarative constructs.
Quoted Scala syntax such as
[`for`-comprehensions](http://docs.scala-lang.org/tutorials/FAQ/yield.html),
[case classes](http://docs.scala-lang.org/tutorials/tour/case-classes.html), and
[pattern matching](http://docs.scala-lang.org/tutorials/tour/pattern-matching.html)
is thereby lifted to an intermediate representation called *Emma Core*.
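To make the first benefit concrete, here is a minimal sketch in plain Scala of the kind of syntax being reused. `Bag`, `Edge`, and `ForComprehensionSketch` are hypothetical names for illustration. `Bag` is a local stand-in and not Emma's `DataBag`, and the sketch performs no quoting or lifting. It only shows that a `for`-comprehension with a case-class pattern is ordinary Scala that the compiler rewrites into `flatMap`, `withFilter`, and `map` calls.

```scala
// Hypothetical stand-in for a distributed collection; NOT Emma's DataBag.
final case class Bag[A](elems: Vector[A]) {
  def map[B](f: A => B): Bag[B] = Bag(elems.map(f))
  def flatMap[B](f: A => Bag[B]): Bag[B] = Bag(elems.flatMap(a => f(a).elems))
  def withFilter(p: A => Boolean): Bag[A] = Bag(elems.filter(p))
}

// A plain case class used as the element type.
final case class Edge(src: Int, dst: Int)

object ForComprehensionSketch {
  val edges: Bag[Edge] = Bag(Vector(Edge(1, 2), Edge(2, 3), Edge(3, 1)))

  // Two-hop paths, written with native Scala syntax:
  // a for-comprehension, case-class patterns, and a guard.
  val twoHops: Bag[(Int, Int)] = for {
    Edge(a, b) <- edges
    Edge(c, d) <- edges
    if b == c
  } yield (a, d)

  // Roughly the method calls the compiler generates for the block above.
  val desugared: Bag[(Int, Int)] =
    edges.flatMap { e1 =>
      edges
        .withFilter(e2 => e1.dst == e2.src)
        .map(e2 => (e1.src, e2.dst))
    }
}
```

Both values hold the same pairs. The declarative form is the one a user would write, which is the sense in which the DSL can reuse Scala's own constructs rather than introducing new ones.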

Second, it allows Emma Core terms to be *analyzed and optimized* holistically.
Subterms of type `DataBag[A]` are thereby transformed and off-loaded to a parallel dataflow engine such as Apache Flink or Apache Spark.
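The idea behind the second benefit can be sketched with a toy term representation. Everything below (`Term`, `Source`, `MapT`, `FilterT`, `ToyOptimizer`) is hypothetical and is not Emma Core. It only illustrates that once a program is available as a data structure, it can be inspected and rewritten as a whole before anything runs. Fusing adjacent maps is used here as one example of such a rewrite, and the sketch makes no claim that Emma performs this particular one.

```scala
// Toy intermediate representation over bags of Int; NOT Emma Core.
sealed trait Term
final case class Source(data: Vector[Int]) extends Term
final case class MapT(in: Term, f: Int => Int) extends Term
final case class FilterT(in: Term, p: Int => Boolean) extends Term

object ToyOptimizer {
  // Rewrite the term bottom-up, fusing adjacent maps into a single map.
  def fuse(t: Term): Term = t match {
    case MapT(in, g) =>
      fuse(in) match {
        case MapT(inner, f) => MapT(inner, f andThen g)
        case other          => MapT(other, g)
      }
    case FilterT(in, p) => FilterT(fuse(in), p)
    case s: Source      => s
  }

  // Number of nodes in the term, used to observe the rewrite.
  def size(t: Term): Int = t match {
    case Source(_)      => 1
    case MapT(in, _)    => 1 + size(in)
    case FilterT(in, _) => 1 + size(in)
  }

  // Local interpreter; a real system would hand the term to an engine instead.
  def eval(t: Term): Vector[Int] = t match {
    case Source(d)      => d
    case MapT(in, f)    => eval(in).map(f)
    case FilterT(in, p) => eval(in).filter(p)
  }

  // filter evens, then add 1, then multiply by 10
  val program: Term =
    MapT(MapT(FilterT(Source(Vector(1, 2, 3, 4)), _ % 2 == 0), _ + 1), _ * 10)

  val optimized: Term = fuse(program)
}
```

The rewritten term has one node fewer and evaluates to the same result, which is the property any such optimization has to preserve.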

## Examples

The [emma-examples](emma-examples/emma-examples-library/src/main/scala/org/emmalanguage/examples) module contains examples from various fields.

- Graph Analysis

  - [Connected Components](emma-examples/src/main/scala/org/emmalanguage/examples/graphs/ConnectedComponents.scala)

  - [Triangle Enumeration](emma-examples/src/main/scala/org/emmalanguage/examples/graphs/EnumerateTriangles.scala)

  - [Transitive Closure](emma-lib/src/main/scala/org/emmalanguage/lib/graphs/transitiveClosure.scala)

- Supervised Learning

  - [Naive Bayes Classification](emma-lib/src/main/scala/org/emmalanguage/lib/ml/classification/naiveBayes.scala)

- Unsupervised Learning

  - [k-Means Clustering](emma-lib/src/main/scala/org/emmalanguage/lib/ml/clustering/kMeans.scala)

- Text Processing

  - [Word Count](emma-examples/src/main/scala/org/emmalanguage/examples/text/WordCount.scala)

## Learn More

Check [emma-language.org](http://emma-language.org) for further information.

## Build

- JDK 7+ (preferably JDK 8)

- Maven 3

Run

```bash
mvn clean package -DskipTests
```

to build Emma without running any tests.

For more advanced build options, including integration tests for the target runtimes, please see the ["Building Emma" section in the Wiki](../../wiki/Building-Emma).
