https://github.com/xd-deng/spark-ml-intro

PySpark Machine Learning Examples

machine-learning spark

# Spark Machine Learning Introduction

**NOTE: the methods introduced here are all based on the RDD-based API. As of Spark 2.0, the RDD-based APIs in the `spark.mllib` package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the `spark.ml` package. I strongly suggest NOT using this repo for your learning anymore (please refer to https://spark.apache.org/docs/2.1.0/ml-guide.html).**
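
For comparison, here is a minimal sketch of what a random forest classifier looks like in the DataFrame-based `spark.ml` API. It is not taken from the chapters in this repo: the toy data, the column names (`f1`, `f2`), and the helper names (`fit_and_predict`, `run_demo`) are made up for illustration, and it assumes PySpark 2.x or later is installed.

```python
# Toy rows of (label, f1, f2); the values are invented for illustration only.
TOY_ROWS = [
    (0.0, 0.1, 1.2),
    (0.0, 0.3, 0.9),
    (1.0, 2.1, 0.2),
    (1.0, 1.9, 0.4),
]
COLUMNS = ["label", "f1", "f2"]


def fit_and_predict(spark, rows=TOY_ROWS, columns=COLUMNS):
    """Train a random forest with the DataFrame-based API and return predictions.

    `spark` is an existing SparkSession. This helper is hypothetical and only
    sketches the usual spark.ml flow: DataFrame -> feature vector -> fit -> transform.
    """
    # Imported here so this file can be loaded even where PySpark is not installed.
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import VectorAssembler

    df = spark.createDataFrame(rows, columns)

    # spark.ml estimators expect all features packed into one vector column.
    assembler = VectorAssembler(inputCols=columns[1:], outputCol="features")
    features_df = assembler.transform(df)

    rf = RandomForestClassifier(
        labelCol="label", featuresCol="features", numTrees=10, seed=42
    )
    model = rf.fit(features_df)

    # transform() appends a "prediction" column to the input DataFrame.
    return model.transform(features_df).select("label", "prediction")


def run_demo():
    """Start a local SparkSession, run the sketch, and print the predictions."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("spark-ml-sketch")
        .getOrCreate()
    )
    try:
        fit_and_predict(spark).show()
    finally:
        spark.stop()
```

Calling `run_demo()` on a machine with PySpark available should print a small table of labels and predictions. The main difference from the RDD-based examples in this repo is that the data stays in a DataFrame and the features are assembled into a single vector column before training.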

In this repo, I introduce some basic machine learning usage of *PySpark*. The content is quite simple, but I hope it is helpful to some people, since I cover questions I ran into myself as someone used to more "normal" ML settings (such as the R language).

For the basic PySpark operations (Transformations and Actions), you may refer to my other GitHub repo, [Spark Practice](https://github.com/XD-DENG/Spark-practice).

Some of the examples are adapted from the official examples provided by Spark, but I give more details.

- [Random Forest](https://github.com/XD-DENG/Spark-ML-Intro/tree/master/chapters/random_forest.md)
- [Regression](https://github.com/XD-DENG/Spark-ML-Intro/tree/master/chapters/regression.md)
- [K-means](https://github.com/XD-DENG/Spark-ML-Intro/tree/master/chapters/k_means.md)
- [References](https://github.com/XD-DENG/Spark-ML-Intro/tree/master/chapters/references.md)

## License
Please note that this repository is licensed under the [Creative Commons Attribution-ShareAlike License](https://creativecommons.org/licenses/by-sa/3.0/).