{"id":13488755,"url":"https://github.com/combust/mleap","last_synced_at":"2026-07-10T19:01:07.647Z","repository":{"id":11389115,"uuid":"66331305","full_name":"combust/mleap","owner":"combust","description":"MLeap: Deploy ML Pipelines to Production","archived":false,"fork":false,"pushed_at":"2026-03-10T14:17:41.000Z","size":3585,"stargazers_count":1536,"open_issues_count":110,"forks_count":316,"subscribers_count":64,"default_branch":"master","last_synced_at":"2026-05-07T16:42:55.638Z","etag":null,"topics":["data-pipelines","python","scala","scikit-learn","spark","tensorflow","transformers"],"latest_commit_sha":null,"homepage":"https://combust.github.io/mleap-docs/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/combust.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-08-23T03:51:03.000Z","updated_at":"2026-05-07T00:32:38.000Z","dependencies_parsed_at":"2023-09-27T07:40:39.167Z","dependency_job_id":"ee543ac8-e9d7-43f2-aa54-80402484191b","html_url":"https://github.com/combust/mleap","commit_stats":{"total_commits":835,"total_committers":84,"mean_commits":9.94047619047619,"dds":0.6730538922155689,"last_synced_commit":"43993e1237284ec19129424be79fb48048f4c181"},"previous_names":["combust-ml/mleap"],"tags_count":39,"template":false,"template_full_name":null,"purl":"pkg:github/combust/mleap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/combust%2Fmleap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/combust%2Fmleap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/combust%2Fmleap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/combust%2Fmleap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/combust","download_url":"https://codeload.github.com/combust/mleap/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/combust%2Fmleap/sbom","scorecard":{"id":300608,"data":{"date":"2025-08-11","repo":{"name":"github.com/combust/mleap","commit":"2a27dc3a1662e3f30be0dc936f15cc07041e0869"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.9,"checks":[{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":8,"reason":"Found 11/13 approved changesets -- score normalized to 8","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: containerImage not pinned by hash: .devcontainer/Dockerfile:1: pin your Docker image by updating ubuntu:22.04 to ubuntu:22.04@sha256:1aa979d85661c488ce030ac292876cf6ed04535d3a237e49f61542d8e5de5ae0","Warn: downloadThenRun not pinned by hash: .devcontainer/Dockerfile:39-42","Info:   0 out of   1 containerImage dependencies pinned","Info:   0 out of   1 downloadThenRun dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"14 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2023-44 / GHSA-329j-jfvr-rhr6","Warn: Project is vulnerable to: GHSA-43xg-8wmj-cw8h","Warn: Project is vulnerable to: PYSEC-2021-856 / GHSA-5545-2q6w-2gh6","Warn: Project is vulnerable to: PYSEC-2019-108 / GHSA-9fq2-x9r6-wfmf","Warn: Project is vulnerable to: PYSEC-2021-857 / GHSA-f7c7-j99h-c22f","Warn: Project is vulnerable to: GHSA-fpfv-jqm9-f5jm","Warn: Project is vulnerable to: PYSEC-2017-1 / GHSA-frgw-fgh6-9g52","Warn: Project is vulnerable to: PYSEC-2024-110 / GHSA-jw8x-6495-233v","Warn: Project is vulnerable to: GHSA-jxfp-4rvq-9h9m","Warn: Project is vulnerable to: PYSEC-2023-102","Warn: Project is vulnerable to: PYSEC-2023-114","Warn: Project is vulnerable to: GHSA-34jh-p97f-mpxf","Warn: Project is vulnerable to: PYSEC-2023-212 / GHSA-g4mx-q9vg-27p4","Warn: Project is vulnerable to: GHSA-pq67-6m6q-mj2v"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 28 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-17T20:31:17.877Z","repository_id":11389115,"created_at":"2025-08-17T20:31:17.877Z","updated_at":"2025-08-17T20:31:17.877Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35339931,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-10T02:00:06.465Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-pipelines","python","scala","scikit-learn","spark","tensorflow","transformers"],"created_at":"2024-07-31T18:01:21.275Z","updated_at":"2026-07-10T19:01:07.632Z","avatar_url":"https://github.com/combust.png","language":"Scala","funding_links":[],"categories":["Scala","Model Training Orchestration","Model Deployment and Orchestration Frameworks","Deep Learning Framework","Main Contents","Model Training and Orchestration","Software","人工智能","Packages","Data Pipelines \u0026 Streaming"],"sub_categories":["Deployment \u0026 Distribution","Model Deployment Frameworks","Deploying models","Machine Learning Extension"],"readme":"\u003ca href=\"https://combust.github.io/mleap-docs/\"\u003e\u003cimg src=\"logo.png\" alt=\"MLeap Logo\" width=\"176\" height=\"70\" /\u003e\u003c/a\u003e\n\n[![Gitter](https://badges.gitter.im/combust/mleap.svg)](https://gitter.im/combust/mleap?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge)\n[![Build Status](https://travis-ci.org/combust/mleap.svg?branch=master)](https://travis-ci.org/combust/mleap)\n[![Maven Central](https://maven-badges.herokuapp.com/maven-central/ml.combust.mleap/mleap-base_2.12/badge.svg)](https://maven-badges.herokuapp.com/maven-central/ml.combust.mleap/mleap-base_2.12)\n\nDeploying machine learning data pipelines and algorithms should not be a time-consuming or difficult task. MLeap allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine.\n\n## Documentation\n\nDocumentation is available at [https://combust.github.io/mleap-docs/](https://combust.github.io/mleap-docs/).\n\nRead [Serializing a Spark ML Pipeline and Scoring with MLeap](https://github.com/combust-ml/mleap/wiki/Serializing-a-Spark-ML-Pipeline-and-Scoring-with-MLeap) to gain a full sense of what is possible.\n\n## Introduction\n\nUsing the MLeap execution engine and serialization format, we provide a performant, portable and easy-to-integrate production library for machine learning data pipelines and algorithms.\n\nFor portability, we build our software on the JVM and only use serialization formats that are widely-adopted.\n\nWe also provide a high level of integration with existing technologies.\n\nOur goals for this project are:\n\n1. Allow Researchers/Data Scientists and Engineers to continue to build data pipelines and train algorithms with Spark and Scikit-Learn\n2. Extend Spark/Scikit/TensorFlow by providing ML Pipelines serialization/deserialization to/from a common framework (Bundle.ML)\n3. Use MLeap Runtime to execute your pipeline and algorithm without dependenices on Spark or Scikit (numpy, pandas, etc)\n\n## Overview\n\n1. Core execution engine implemented in Scala\n2. [Spark](http://spark.apache.org/), PySpark and Scikit-Learn support\n3. Export a model with Scikit-learn or Spark and execute it using the MLeap Runtime (without dependencies on the Spark Context, or sklearn/numpy/pandas/etc)\n4. Choose from 2 portable serialization formats (JSON, Protobuf)\n5. Implement your own custom data types and transformers for use with MLeap data frames and transformer pipelines\n6. Extensive test coverage with full parity tests for Spark and MLeap pipelines\n7. Optional Spark transformer extension to extend Spark's default transformer offerings\n\n\u003cimg src=\"assets/images/single-runtime.jpg\" alt=\"Unified Runtime\"/\u003e\n\n## Dependency Compatibility Matrix\n\nOther versions besides those listed below may also work (especially more recent Java versions for the JRE), \nbut these are the configurations which are tested by mleap.\n\n| MLeap Version | Spark Version | Scala Version    | Java Version | Python Version | XGBoost Version | Tensorflow Version |\n|---------------|---------------|------------------|--------------|----------------|-----------------|--------------------|\n| 0.25.0        | 4.1.0         | 2.13.17          | 17           | 3.10 - 3.13    | 2.0.3           | 2.10.1             |\n| 0.24.2        | 4.0.3         | 2.13.16          | 17           | 3.9 - 3.13     | 2.0.3           | 2.10.1             |\n| 0.24.1        | 4.0.2         | 2.13.16          | 17           | 3.9 - 3.13     | 2.0.3           | 2.10.1             |\n| 0.24.0        | 4.0.1         | 2.13.16          | 17           | 3.9 - 3.13     | 2.0.3           | 2.10.1             |\n| 0.23.4        | 3.4.4         | 2.12.18          | 11           | 3.7 - 3.12     | 1.7.6           | 2.10.1             |\n| 0.23.3        | 3.4.0         | 2.12.18          | 11           | 3.7, 3.8       | 1.7.6           | 2.10.1             |\n| 0.23.2        | 3.4.0         | 2.12.18          | 11           | 3.7, 3.8       | 1.7.6           | 2.10.1             |\n| 0.23.1        | 3.4.0         | 2.12.18          | 11           | 3.7, 3.8       | 1.7.6           | 2.10.1             |\n| 0.23.0        | 3.4.0         | 2.12.13          | 11           | 3.7, 3.8       | 1.7.3           | 2.10.1             |\n| 0.22.0        | 3.3.0         | 2.12.13          | 11           | 3.7, 3.8       | 1.6.1           | 2.7.0              |\n| 0.21.1        | 3.2.0         | 2.12.13          | 11           | 3.7            | 1.6.1           | 2.7.0              |\n| 0.21.0        | 3.2.0         | 2.12.13          | 11           | 3.6, 3.7       | 1.6.1           | 2.7.0              |\n| 0.20.0        | 3.2.0         | 2.12.13          | 8            | 3.6, 3.7       | 1.5.2           | 2.7.0              |\n| 0.19.0        | 3.0.2         | 2.12.13          | 8            | 3.6, 3.7       | 1.3.1           | 2.4.1              |\n| 0.18.1        | 3.0.2         | 2.12.13          | 8            | 3.6, 3.7       | 1.0.0           | 2.4.1              |\n| 0.18.0        | 3.0.2         | 2.12.13          | 8            | 3.6, 3.7       | 1.0.0           | 2.4.1              |\n| 0.17.0        | 2.4.5         | 2.11.12, 2.12.10 | 8            | 3.6, 3.7       | 1.0.0           | 1.11.0             |\n\n## Setup\n\n### Link with Maven or SBT\n\n#### SBT\n\n```sbt\nlibraryDependencies += \"ml.combust.mleap\" %% \"mleap-runtime\" % \"0.24.0\"\n```\n\n#### Maven\n\n```pom\n\u003cdependency\u003e\n    \u003cgroupId\u003eml.combust.mleap\u003c/groupId\u003e\n    \u003cartifactId\u003emleap-runtime_2.13\u003c/artifactId\u003e\n    \u003cversion\u003e0.24.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### For Spark Integration\n\n#### SBT\n\n```sbt\nlibraryDependencies += \"ml.combust.mleap\" %% \"mleap-spark\" % \"0.24.0\"\n```\n\n#### Maven\n\n```pom\n\u003cdependency\u003e\n    \u003cgroupId\u003eml.combust.mleap\u003c/groupId\u003e\n    \u003cartifactId\u003emleap-spark_2.13\u003c/artifactId\u003e\n    \u003cversion\u003e0.24.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### PySpark Integration\n\nInstall MLeap from [PyPI](https://pypi.org/project/mleap/)\n```bash\n$ pip install mleap\n```\n\n## Using the Library\n\nFor more complete examples, see our other Git repository: [MLeap Demos](https://github.com/combust/mleap-demo)\n\n### Create and Export a Spark Pipeline\n\nThe first step is to create our pipeline in Spark. For our example we will manually build a simple Spark ML pipeline.\n\n\n```scala\nimport ml.combust.bundle.BundleFile\nimport ml.combust.mleap.spark.SparkSupport._\nimport org.apache.spark.ml.Pipeline\nimport org.apache.spark.ml.bundle.SparkBundleContext\nimport org.apache.spark.ml.feature.{Binarizer, StringIndexer}\nimport org.apache.spark.sql._\nimport org.apache.spark.sql.functions._\nimport scala.util.Using\n\n  val datasetName = \"./examples/spark-demo.csv\"\n\n  val dataframe: DataFrame = spark.sqlContext.read.format(\"csv\")\n    .option(\"header\", true)\n    .load(datasetName)\n    .withColumn(\"test_double\", col(\"test_double\").cast(\"double\"))\n\n  // User out-of-the-box Spark transformers like you normally would\n  val stringIndexer = new StringIndexer().\n    setInputCol(\"test_string\").\n    setOutputCol(\"test_index\")\n\n  val binarizer = new Binarizer().\n    setThreshold(0.5).\n    setInputCol(\"test_double\").\n    setOutputCol(\"test_bin\")\n\n  val pipelineEstimator = new Pipeline()\n    .setStages(Array(stringIndexer, binarizer))\n\n  val pipeline = pipelineEstimator.fit(dataframe)\n\n  // then serialize pipeline\n  val sbc = SparkBundleContext().withDataset(pipeline.transform(dataframe))\n  Using(BundleFile(\"jar:file:/tmp/simple-spark-pipeline.zip\")) { bf =\u003e\n    pipeline.writeBundle.save(bf)(sbc).get\n  }\n```\n\nThe dataset used for training can be found [here](https://github.com/combust/mleap/tree/master/examples/spark-demo.csv)\n\nSpark pipelines are not meant to be run outside of Spark. They require a DataFrame and therefore a SparkContext to run. These are expensive data structures and libraries to include in a project. With MLeap, there is no dependency on Spark to execute a pipeline. MLeap dependencies are lightweight and we use fast data structures to execute your ML pipelines.\n\n### PySpark Integration\n\nImport the MLeap library in your PySpark job\n\n```python\nimport mleap.pyspark\nfrom mleap.pyspark.spark_support import SimpleSparkSerializer\n```\n\nSee [PySpark Integration of python/README.md](python/README.md#pyspark-integration) for more.\n\n### Create and Export a Scikit-Learn Pipeline\n\n```python\nimport pandas as pd\n\nfrom mleap.sklearn.pipeline import Pipeline\nfrom mleap.sklearn.preprocessing.data import FeatureExtractor, LabelEncoder, ReshapeArrayToN1\nfrom sklearn.preprocessing import OneHotEncoder\n\ndata = pd.DataFrame(['a', 'b', 'c'], columns=['col_a'])\n\ncategorical_features = ['col_a']\n\nfeature_extractor_tf = FeatureExtractor(input_scalars=categorical_features, \n                                         output_vector='imputed_features', \n                                         output_vector_items=categorical_features)\n\n# Label Encoder for x1 Label \nlabel_encoder_tf = LabelEncoder(input_features=feature_extractor_tf.output_vector_items,\n                               output_features='{}_label_le'.format(categorical_features[0]))\n\n# Reshape the output of the LabelEncoder to N-by-1 array\nreshape_le_tf = ReshapeArrayToN1()\n\n# Vector Assembler for x1 One Hot Encoder\none_hot_encoder_tf = OneHotEncoder(sparse=False)\none_hot_encoder_tf.mlinit(prior_tf = label_encoder_tf, \n                          output_features = '{}_label_one_hot_encoded'.format(categorical_features[0]))\n\none_hot_encoder_pipeline_x0 = Pipeline([\n                                         (feature_extractor_tf.name, feature_extractor_tf),\n                                         (label_encoder_tf.name, label_encoder_tf),\n                                         (reshape_le_tf.name, reshape_le_tf),\n                                         (one_hot_encoder_tf.name, one_hot_encoder_tf)\n                                        ])\n\none_hot_encoder_pipeline_x0.mlinit()\none_hot_encoder_pipeline_x0.fit_transform(data)\none_hot_encoder_pipeline_x0.serialize_to_bundle('/tmp', 'mleap-scikit-test-pipeline', init=True)\n\n# array([[ 1.,  0.,  0.],\n#        [ 0.,  1.,  0.],\n#        [ 0.,  0.,  1.]])\n```\n\n### Load and Transform Using MLeap\n\nBecause we export Spark and Scikit-learn pipelines to a standard format, we can use either our Spark-trained pipeline or our Scikit-learn pipeline from the previous steps to demonstrate usage of MLeap in this section. The choice is yours!\n\n```scala\nimport ml.combust.bundle.BundleFile\nimport ml.combust.mleap.runtime.MleapSupport._\nimport scala.util.Using\n// load the Spark pipeline we saved in the previous section\nval bundle = Using(BundleFile(\"jar:file:/tmp/simple-spark-pipeline.zip\"))) { bundleFile =\u003e\n  bundleFile.loadMleapBundle().get\n}).opt.get\n\n// create a simple LeapFrame to transform\nimport ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}\nimport ml.combust.mleap.core.types._\n\n// MLeap makes extensive use of monadic types like Try\nval schema = StructType(StructField(\"test_string\", ScalarType.String),\n  StructField(\"test_double\", ScalarType.Double)).get\nval data = Seq(Row(\"hello\", 0.6), Row(\"MLeap\", 0.2))\nval frame = DefaultLeapFrame(schema, data)\n\n// transform the dataframe using our pipeline\nval mleapPipeline = bundle.root\nval frame2 = mleapPipeline.transform(frame).get\nval data2 = frame2.dataset\n\n// get data from the transformed rows and make some assertions\nassert(data2(0).getDouble(2) == 1.0) // string indexer output\nassert(data2(0).getDouble(3) == 1.0) // binarizer output\n\n// the second row\nassert(data2(1).getDouble(2) == 2.0)\nassert(data2(1).getDouble(3) == 0.0)\n```\n\n## Documentation\n\nFor more documentation, please see our [documentation](https://combust.github.io/mleap-docs/), where you can learn to:\n\n1. Implement custom transformers that will work with Spark, MLeap and Scikit-learn\n2. Implement custom data types to transform with Spark and MLeap pipelines\n3. Transform with blazing fast speeds using optimized row-based transformers\n4. Serialize MLeap data frames to various formats like avro, json, and a custom binary format\n5. Implement new serialization formats for MLeap data frames\n6. Work through several demonstration pipelines which use real-world data to create predictive pipelines\n7. Supported Spark transformers\n8. Supported Scikit-learn transformers\n9. Custom transformers provided by MLeap\n\n## Contributing\n\n* Write documentation.\n* Write a tutorial/walkthrough for an interesting ML problem\n* Contribute an Estimator/Transformer from Spark\n* Use MLeap at your company and tell us what you think\n* Make a feature request or report a bug in github\n* Make a pull request for an existing feature request or bug report\n* Join the discussion of how to get MLeap into Spark as a dependency. Talk with us on Gitter (see link at top of README.md)\n\n## Building\n\nPlease ensure you have sbt 1.9.3, java 11, scala 2.12.18\n\n1. Initialize the git submodules `git submodule update --init --recursive`\n2. Run `sbt compile`\n\n## Thank You\n\nThank you to [Swoop](https://www.swoop.com/) for supporting the XGboost\nintegration.\n\n## Contributors Information\n\n* Jason Sleight ([jsleight](https://github.com/jsleight))\n* Talal Riaz ([talalryz](https://github.com/talalryz))\n* Weichen Xu ([WeichenXu123](https://github.com/WeichenXu123))\n\n## Past contributors\n* Hollin Wilkins (hollin@combust.ml)\n* Mikhail Semeniuk (mikhail@combust.ml)\n* Anca Sarb (sarb.anca@gmail.com)\n* Ryan Vogan (rvogan@yelp.com)\n\n\n## License\n\nSee LICENSE and NOTICE file in this repository.\n\nCopyright 20 Combust, Inc.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcombust%2Fmleap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcombust%2Fmleap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcombust%2Fmleap/lists"}