https://github.com/bernhard-42/20newsgroups-spark

The idea is to replicate the ["Working with Text Data" tutorial](http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html) of `scikit-learn`, which classifies documents from the 20 Newsgroups dataset, with Spark.

It shows the following approaches:

- Multinomial Logistic Regression with MLlib (Scala)
- Naive Bayes Classification with Spark ML Pipeline (Scala)
- Naive Bayes Classification with Spark ML Pipeline (Python)
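To give a rough idea of what the Naive Bayes pipelines learn, here is a minimal, dependency-free Python sketch of multinomial Naive Bayes with additive (Laplace) smoothing on raw term counts. It is not the repository's code and does not use Spark. The whitespace tokenizer, the function names `train_multinomial_nb` and `predict`, and the toy documents are all hypothetical. In a Spark ML pipeline the same steps would typically be a tokenizer stage, a term-frequency stage and `NaiveBayes` with `modelType="multinomial"`, whose default smoothing is 1.0.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log class priors and smoothed per-class word log-probabilities."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> term frequencies for that class
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        # Additive smoothing: (count + alpha) / (total + alpha * |V|)
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def predict(model, doc):
    """Return the class with the highest log-posterior; unseen words are ignored."""
    log_prior, log_likelihood = model
    counts = Counter(tokenize(doc))

    def score(c):
        return log_prior[c] + sum(
            n * log_likelihood[c][w] for w, n in counts.items() if w in log_likelihood[c]
        )

    return max(log_prior, key=score)


# Toy, made-up training data labelled with two 20 Newsgroups category names
docs = [
    "the gpu renders graphics fast",
    "opengl graphics card driver",
    "god and religion debate",
    "atheism religion belief",
]
labels = ["comp.graphics", "comp.graphics", "alt.atheism", "alt.atheism"]

model = train_multinomial_nb(docs, labels)
print(predict(model, "graphics driver for my gpu"))  # → comp.graphics
```

Scoring in log space keeps the product of many small word probabilities from underflowing, which matters for real newsgroup posts with hundreds of tokens.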

The code can be found [here](./Code.md); it was converted from `2B8GZHTDD/note.json` with `tools/convert_note2md.py`.