https://github.com/bernhard-42/20newsgroups-spark
- Host: GitHub
- URL: https://github.com/bernhard-42/20newsgroups-spark
- Owner: bernhard-42
- Created: 2016-01-22T16:28:01.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-02-18T15:51:33.000Z (over 9 years ago)
- Last Synced: 2025-04-07T18:02:28.488Z (6 months ago)
- Size: 30.3 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
The idea is to replicate the ["Working with Text Data" tutorial](http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html) of `scikit-learn` with Spark.
It shows the following approaches:
- Multinomial Logistic Regression with MLlib (Scala)
- Naive Bayes Classification with Spark ML Pipeline (Scala)
- Naive Bayes Classification with Spark ML Pipeline (Python)

The code can be seen [here](./Code.md), which is converted from `2B8GZHTDD/note.json` via `tools/convert_note2md.py`.
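
For readers unfamiliar with the Naive Bayes approaches listed above, the following is a minimal sketch of what a multinomial Naive Bayes text classifier computes. It is plain Python using only the standard library, so it is not Spark code and not the code from this repository. The function names (`tokenize`, `train_naive_bayes`, `predict`), the whitespace tokenizer, and the four toy documents are assumptions made for illustration. Only the category names are borrowed from the 20 newsgroups dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    return text.lower().split()


def train_naive_bayes(docs, smoothing=1.0):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)

    total_docs = sum(label_counts.values())
    log_prior = {}
    log_likelihood = {}
    for label, n_docs in label_counts.items():
        log_prior[label] = math.log(n_docs / total_docs)
        # Additive (Laplace) smoothing over the whole vocabulary.
        denom = sum(word_counts[label].values()) + smoothing * len(vocab)
        log_likelihood[label] = {
            word: math.log((word_counts[label][word] + smoothing) / denom)
            for word in vocab
        }
    return {"log_prior": log_prior, "log_likelihood": log_likelihood}


def predict(model, text):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        # Tokens never seen during training are ignored.
        scores[label] = prior + sum(
            likelihood[t] for t in tokenize(text) if t in likelihood
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus; the real dataset has roughly 20,000 posts.
docs = [
    ("rendering 3d images with opengl", "comp.graphics"),
    ("image file formats and graphics cards", "comp.graphics"),
    ("the doctor prescribed a new medicine", "sci.med"),
    ("symptoms of the disease and treatment", "sci.med"),
]
model = train_naive_bayes(docs)
print(predict(model, "opengl graphics rendering"))  # comp.graphics
print(predict(model, "treatment for the disease"))  # sci.med
```

In a Spark ML pipeline, the counting and smoothing steps would typically be handled by library stages for tokenizing, feature extraction, and classification rather than written by hand. The sketch only shows the underlying arithmetic.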