
https://github.com/annettaqi/spam-detection

Using SGD to classify emails into spam or ham

spark stochastic-gradient-descent


This repository uses Python and Spark to perform spam detection.
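In a project like this, Spark is usually used to spread the gradient computation over many documents. The sketch below is a hedged illustration, not the repository's code. It assumes a logistic-regression model, sparse feature dicts, and labels 1 = spam and 0 = ham. The helper names (`doc_gradient`, `add_sparse`, `batch_gradient`) are hypothetical. Plain Python `map` and `functools.reduce` stand in for Spark's `RDD.map` and `RDD.reduce`, so the sketch runs without a cluster.

```python
import math
from functools import reduce

# Assumed representation: each document is (features, label), where features is a
# sparse dict {feature_index: value} and label is 1 (spam) or 0 (ham).


def sigmoid(z):
    """Logistic function, computed in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def doc_gradient(weights, doc):
    """Gradient of the log loss for one document, returned as a sparse dict."""
    features, label = doc
    score = sum(weights.get(i, 0.0) * v for i, v in features.items())
    error = sigmoid(score) - label
    return {i: error * v for i, v in features.items()}


def add_sparse(a, b):
    """Sum two sparse gradients; this plays the role of the reduce function."""
    out = dict(a)
    for i, v in b.items():
        out[i] = out.get(i, 0.0) + v
    return out


def batch_gradient(weights, docs):
    """Summed gradient over a batch of documents (the map/reduce pattern)."""
    # In PySpark the same functions could be passed to
    # rdd.map(lambda d: doc_gradient(weights, d)).reduce(add_sparse).
    return reduce(add_sparse, map(lambda d: doc_gradient(weights, d), docs), {})
```

In PySpark the weights would normally be shared with the executors through a broadcast variable. Also, `RDD.reduce` takes no initial value, so the RDD must be non-empty.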

The project involves two tasks:

1. Build spam prediction models from the training data sets using stochastic gradient descent (SGD).
2. Use these models to predict whether each document in a test data set is spam.

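The two tasks can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the repository's implementation. It assumes the model is logistic regression and uses a hypothetical whitespace tokenizer (`tokens`) as the feature extractor. The learning rate, epoch count, and 0.5 decision threshold are arbitrary choices. Each SGD step updates the weights from a single training document.

```python
import math
import random


def sigmoid(z):
    """Logistic function, computed in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tokens(text):
    """Hypothetical feature extractor: lower-cased whitespace tokens, each with value 1.0."""
    return {t: 1.0 for t in text.lower().split()}


def train_sgd(docs, labels, learning_rate=0.1, epochs=20, seed=0):
    """Task 1: fit logistic-regression weights by SGD, one document per update.

    labels are 1 (spam) or 0 (ham); hyperparameter defaults are illustrative assumptions.
    """
    weights = {}
    rng = random.Random(seed)
    order = list(range(len(docs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit training documents in a fresh random order each epoch
        for idx in order:
            x, y = tokens(docs[idx]), labels[idx]
            score = sum(weights.get(f, 0.0) * v for f, v in x.items())
            error = y - sigmoid(score)  # positive when the model under-predicts spam
            for f, v in x.items():
                weights[f] = weights.get(f, 0.0) + learning_rate * error * v
    return weights


def predict(weights, text, threshold=0.5):
    """Task 2: return 1 (spam) if the predicted spam probability exceeds threshold, else 0 (ham)."""
    x = tokens(text)
    score = sum(weights.get(f, 0.0) * v for f, v in x.items())
    return 1 if sigmoid(score) > threshold else 0
```

Updating after every document, rather than after a full pass over the data, is what makes this gradient descent *stochastic*. Shuffling each epoch keeps the updates from always following the same sequence.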
The SGD technique used in this project is based on [a paper](http://arxiv.org/abs/1004.5168) by Cormack, Smucker and Clarke.
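The cited paper appears to represent each document by overlapping byte 4-grams hashed into a large, fixed-size binary feature space, and this sketch assumes that representation. Check it against the paper before relying on it. The function name `byte_4gram_features`, the default dimension of 10^6, and the optional `max_bytes` cutoff are hypothetical choices and do not come from this repository.

```python
import zlib


def byte_4gram_features(text, dim=1_000_000, max_bytes=None):
    """Return the set of active feature indices for a document.

    Each overlapping 4-byte window of the UTF-8 encoding is hashed into [0, dim).
    Features are binary: an index is either present or absent.
    dim and max_bytes are illustrative assumptions.
    """
    data = text.encode("utf-8")
    if max_bytes is not None:
        data = data[:max_bytes]  # optionally look only at the start of the document
    # zlib.crc32 is used instead of the built-in hash(), because hashing of bytes
    # is randomized per process and would make the features irreproducible.
    return {zlib.crc32(data[i:i + 4]) % dim for i in range(len(data) - 3)}
```

Because the features are binary, a document's score under a linear model is simply the sum of the weights at its active indices.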