https://github.com/annettaqi/spam-detection
Using SGD to classify emails into spam or ham
- Host: GitHub
- URL: https://github.com/annettaqi/spam-detection
- Owner: AnnettaQi
- Created: 2024-10-17T04:32:35.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-17T04:50:19.000Z (3 months ago)
- Last Synced: 2024-11-01T21:08:15.561Z (2 months ago)
- Topics: spark, stochastic-gradient-descent
- Language: Jupyter Notebook
- Homepage:
- Size: 80.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This repository uses Python and Spark to perform spam detection.
The project involves two tasks:
The first is to build spam prediction models, using training data sets and stochastic gradient descent (SGD). The second is to use these models to predict whether the documents in a test data set are spam.
The stochastic gradient descent technique used here is based on [a paper](http://arxiv.org/abs/1004.5168) by Cormack, Smucker and Clarke.
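
The two tasks can be illustrated with a minimal sketch in plain Python. It is not the repository's code and it leaves Spark out entirely. It assumes each document has already been reduced to a list of integer feature ids, and that the model is a logistic regression updated one example at a time. The function names, the toy data, and the learning rate `delta=0.002` are all illustrative assumptions.

```python
import math

def spamminess(weights, features):
    """Score a document as the sum of the weights of its features."""
    return sum(weights.get(f, 0.0) for f in features)

def train(examples, delta=0.002, epochs=1):
    """Task 1: fit feature weights with stochastic gradient descent.

    `examples` is a list of (features, is_spam) pairs, where is_spam is
    1 for spam and 0 for ham. `delta` is the learning rate (an assumed value).
    """
    weights = {}
    for _ in range(epochs):
        for features, is_spam in examples:
            score = spamminess(weights, features)
            prob = 1.0 / (1.0 + math.exp(-score))  # logistic function
            # Nudge the weight of every feature in this document
            # in the direction that reduces the prediction error.
            for f in features:
                weights[f] = weights.get(f, 0.0) + (is_spam - prob) * delta
    return weights

def predict(weights, features):
    """Task 2: label a document as spam when its score is positive."""
    return "spam" if spamminess(weights, features) > 0 else "ham"

# Hypothetical toy data: feature ids 1-3 appear in spam, 4-6 in ham.
training_data = [
    ([1, 2, 3], 1),
    ([4, 5, 6], 0),
    ([1, 3], 1),
    ([4, 6], 0),
]
model = train(training_data)
print(predict(model, [1, 2]))  # uses only features seen in spam
print(predict(model, [4, 5]))  # uses only features seen in ham
```

A document whose features were never seen in training scores exactly 0 and is therefore labeled ham under this threshold. In a Spark version, the training loop would typically run over the records of one partition at a time, but how the repository distributes the work is not stated here.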