https://github.com/machinezone/sparkml-par


Host: GitHub
URL: https://github.com/machinezone/sparkml-par
Owner: machinezone
License: bsd-3-clause
Created: 2018-07-03T01:10:24.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-10-13T08:54:10.000Z (over 5 years ago)
Last Synced: 2025-02-02T20:28:18.707Z (about 1 year ago)
Language: Scala
Size: 23.4 KB
Stars: 3
Watchers: 3
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# SparkML-Par

Parallel implementation of SparkML transformers and estimators.

# Motivation

This library extends SparkML to allow parallel transformation of input datasets. That is, it transforms multiple columns in parallel using the same set of transformations that one would normally need to apply in sequence.

# Development

Clone this repository and run `mvn clean test`.

To build for a custom version of Spark/Scala, run:

`mvn clean compile \
-Dscala.major.version= \
-Dscala.minor.version= \
-Dspark.version=`

For example:

```bash
mvn clean package \
-Dscala.major.version=2.11 \
-Dscala.minor.version=2.11.8 \
-Dspark.version=2.3.0
```

## Build profiles

Alternatively, one can build against one of a limited number of predefined profiles.

See the [pom](pom.xml) for a list of the profiles.

Example builds with profiles:

`mvn clean package -Pspark_2.3,scala_2.11`

`mvn clean package -Pspark_2.0,scala_2.10`

# Support

Here is a handy table of supported build version combinations:

| Apache Spark | Scala |
|:------------:|:-----:|
| 2.0.x        | 2.10  |
| 2.0.x        | 2.11  |
| 2.1.x        | 2.10  |
| 2.1.x        | 2.11  |
| 2.2.x        | 2.10  |
| 2.2.x        | 2.11  |
| 2.3.x        | 2.11  |
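
The combinations in the table above could be turned into profile-based build commands with a small script. The following is a minimal dry-run sketch that only prints the commands. It assumes every combination has a profile named after the `spark_X.Y` / `scala_X.Y` pattern; only `spark_2.3,scala_2.11` and `spark_2.0,scala_2.10` appear in this README, so the other names should be checked against the [pom](pom.xml). The `combos` array and the `print_build_commands` function are hypothetical names used for illustration.

```bash
#!/usr/bin/env bash
# Dry run: print one profile-based build command per supported combination.
# ASSUMPTION: profile names follow the spark_X.Y / scala_X.Y pattern seen in
# the two README examples; confirm the actual names in pom.xml.
set -euo pipefail

# "spark:scala" pairs copied from the support table.
combos=(
  "2.0:2.10" "2.0:2.11"
  "2.1:2.10" "2.1:2.11"
  "2.2:2.10" "2.2:2.11"
  "2.3:2.11"
)

print_build_commands() {
  local combo spark scala
  for combo in "${combos[@]}"; do
    spark="${combo%%:*}"   # text before the colon
    scala="${combo##*:}"   # text after the colon
    echo "mvn clean package -Pspark_${spark},scala_${scala}"
  done
}

print_build_commands
```

Because the script only echoes the commands, it is safe to run before the profile names have been verified; each printed line can then be run individually.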

# License

See the [license](LICENSE) for license information.
