Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jpmml/jpmml-evaluator-spark
PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)
- Host: GitHub
- URL: https://github.com/jpmml/jpmml-evaluator-spark
- Owner: jpmml
- License: agpl-3.0
- Created: 2015-11-29T10:03:37.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2022-04-02T11:16:55.000Z (over 2 years ago)
- Last Synced: 2024-05-19T03:00:46.617Z (6 months ago)
- Language: Java
- Size: 103 KB
- Stars: 94
- Watchers: 14
- Forks: 43
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
JPMML-Evaluator-Spark [![Build Status](https://github.com/jpmml/jpmml-evaluator-spark/workflows/maven/badge.svg)](https://github.com/jpmml/jpmml-evaluator-spark/actions?query=workflow%3A%22maven%22)
=====================

PMML evaluator library for the Apache Spark cluster computing system (https://spark.apache.org/).
# Features #
* Full support for PMML specification versions 3.0 through 4.4. The evaluation is handled by the [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) library.
# Prerequisites #
* Apache Spark version 2.X or 3.X.
# Installation #
The JPMML-Evaluator-Spark library JAR file (together with accompanying Java source and Javadocs JAR files) is released via [Maven Central Repository](https://repo1.maven.org/maven2/org/jpmml/).
The current version is **1.3.0** (2 April, 2022).
```xml
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>jpmml-evaluator-spark</artifactId>
	<version>1.3.0</version>
</dependency>
```
# Usage #
Building a generic transformer based on a PMML byte stream:
```java
InputStream pmmlIs = ...;

EvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
	.setLocatable(false)
	.load(pmmlIs);

Evaluator evaluator = evaluatorBuilder.build();

// Performing a self-check (doubles as a warm-up)
evaluator.verify();

TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withTargetCols()
	.withOutputCols()
	.exploded(false);

Transformer pmmlTransformer = pmmlTransformerBuilder.build();
```

Building an Apache Spark ML-style regressor when the PMML document is known to contain a regression model (e.g. the auto-mpg dataset):
```java
TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withLabelCol("MPG") // Double column
	.exploded(true);
```

Building an Apache Spark ML-style classifier when the PMML document is known to contain a classification model (e.g. the iris-species dataset):
```java
TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withLabelCol("Species") // String column
	.withProbabilityCol("Species_probability", Arrays.asList("setosa", "versicolor", "virginica")) // Vector column
	.exploded(true);
```

Scoring data:
```java
Dataset<?> inputDs = ...;

Dataset<?> resultDs = pmmlTransformer.transform(inputDs);
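
// Sketch, not part of the original example: in the default (non-exploded) mode the
// results sit in a nested "pmml" struct column, which standard Spark column paths
// can address. The field name "Species" is an assumption taken from the iris-species
// schema shown in this section; substitute the result columns of your own model.
Dataset<?> speciesDs = resultDs.select("pmml.Species");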
```

In default mode, the transformation appends an intermediary "pmml" column to the data frame, which contains all the requested result columns:
```
root
|-- Sepal_Length: double (nullable = true)
|-- Sepal_Width: double (nullable = true)
|-- Petal_Length: double (nullable = true)
|-- Petal_Width: double (nullable = true)
|-- pmml: struct (nullable = true)
| |-- Species: string (nullable = false)
| |-- Species_probability: vector (nullable = false)
```

In exploded mode, the transformation appends all the requested result columns to the data frame:
```
root
|-- Sepal_Length: double (nullable = true)
|-- Sepal_Width: double (nullable = true)
|-- Petal_Length: double (nullable = true)
|-- Petal_Width: double (nullable = true)
|-- Species: string (nullable = false)
|-- Species_probability: vector (nullable = false)
```

# License #
JPMML-Evaluator-Spark is dual-licensed under the [GNU Affero General Public License (AGPL) version 3.0](https://www.gnu.org/licenses/agpl-3.0.html), and a commercial license.
# Additional information #
JPMML-Evaluator-Spark is developed and maintained by Openscoring Ltd, Estonia.
Interested in using JPMML software in your application? Please contact [[email protected]](mailto:[email protected])