Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/xd-deng/spark-ml-intro
PySpark Machine Learning Examples
- Host: GitHub
- URL: https://github.com/xd-deng/spark-ml-intro
- Owner: XD-DENG
- Created: 2015-12-21T15:32:57.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2018-03-08T09:21:56.000Z (almost 7 years ago)
- Last Synced: 2024-10-04T18:23:45.344Z (3 months ago)
- Topics: machine-learning, spark
- Size: 135 KB
- Stars: 44
- Watchers: 8
- Forks: 29
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Spark Machine Learning Introduction
**NOTE: The methods introduced here are all based on the RDD-based API. As of Spark 2.0, the RDD-based APIs in the `spark.mllib` package have entered maintenance mode, and the primary machine learning API for Spark is now the DataFrame-based API in the `spark.ml` package. I strongly suggest NOT using this repo for learning anymore; please refer to https://spark.apache.org/docs/2.1.0/ml-guide.html instead.**
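To make the difference between the two APIs concrete, here is a minimal sketch that fits the same K-means model through each package. It is not code from this repo: the toy `POINTS` data and the function names `kmeans_rdd_api` and `kmeans_dataframe_api` are made up for illustration, and the PySpark imports are placed inside the functions so the snippet can be loaded even where PySpark is not installed.

```python
# Toy 2-D points forming two obvious groups (illustrative data, not from the repo).
POINTS = [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]


def kmeans_rdd_api(sc, points=POINTS, k=2):
    """K-means with the RDD-based API (spark.mllib, in maintenance mode)."""
    # Imported lazily so this file loads without PySpark installed.
    from pyspark.mllib.clustering import KMeans

    rdd = sc.parallelize(points)  # an RDD of plain Python lists
    model = KMeans.train(rdd, k, maxIterations=10, seed=1)
    return model.clusterCenters  # attribute on the mllib model


def kmeans_dataframe_api(spark, points=POINTS, k=2):
    """K-means with the DataFrame-based API (spark.ml, the primary API)."""
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.linalg import Vectors

    # One row per point, with the vector stored in a column named "features".
    rows = [(Vectors.dense(p),) for p in points]
    df = spark.createDataFrame(rows, ["features"])
    model = KMeans(k=k, maxIter=10, seed=1).fit(df)
    return model.clusterCenters()  # method call on the ml model
```

Assuming a local session created with something like `SparkSession.builder.master("local[1]").getOrCreate()`, you would call `kmeans_rdd_api(spark.sparkContext)` and `kmeans_dataframe_api(spark)`. The main practical difference is the input: `spark.mllib` trains on an RDD, while `spark.ml` fits an estimator on a DataFrame with a vector column.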
In this repo, I introduce some basic machine learning usage of *PySpark*. The content is quite simple, but I hope it will be helpful to some people, since I cover questions I ran into myself as someone used to more "normal" ML settings (like the R language).
For the basic PySpark operations (Transformations and Actions), you may refer to my other GitHub repo, [Spark Practice](https://github.com/XD-DENG/Spark-practice).
Some of the examples are taken from the official Spark examples, but I explain them in more detail.
- [Random Forest](https://github.com/XD-DENG/Spark-ML-Intro/tree/master/chapters/random_forest.md)
- [Regression](https://github.com/XD-DENG/Spark-ML-Intro/tree/master/chapters/regression.md)
- [K-means](https://github.com/XD-DENG/Spark-ML-Intro/tree/master/chapters/k_means.md)
- [References](https://github.com/XD-DENG/Spark-ML-Intro/tree/master/chapters/references.md)

## License
Please note that this repository is under the [Creative Commons Attribution-ShareAlike License](https://creativecommons.org/licenses/by-sa/3.0/).