Classification of emails using machine learning and natural language processing techniques in Java
https://github.com/michaelaquilina/spamfilter

Spam Filter
===========

Authors:
* [Michael Aquilina](https://github.com/KillaW0lf04)
* [Uwe L. Korn](https://github.com/xhochy)

Project for "Introduction to Machine Learning" course.

Requirements can be found in the following pages:
* [Spam Filter Implementation](https://www.cs.bris.ac.uk/Teaching/Resources/COMS30301/projects/spam/1/index.html)
* [Training and Classification](https://www.cs.bris.ac.uk/Teaching/Resources/COMS30301/projects/spam/2/index.html)

Results
-------

* Terms currently selected by feature selection: https://gist.github.com/KillaW0lf04/d430834e07b4e7aa3901
* JSON format for saving EmailClassifier: https://gist.github.com/KillaW0lf04/2ec98c3d2ad29085fcd1
* Most representative features detected by the Naive Bayes classifier: https://gist.github.com/KillaW0lf04/ddb0871769b75d37b49e

Design
------

A [design wiki](https://github.com/KillaW0lf04/SpamFilter/wiki/Design) has been set up to establish a common understanding of the components and milestones needed to complete the project. It should also help us when writing the report for hand-in.

Current Machine Learning Models
-------------------------------

* Perceptron
* Naive Bayes
* Decision Tree
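
This README does not show how the project implements these models. As a rough illustration of the Naive Bayes entry only, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The class name `NaiveBayesSketch`, its methods, and the assumption that emails arrive as already tokenized term lists are all hypothetical and not taken from the project's code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the project's implementation:
// multinomial Naive Bayes with add-one (Laplace) smoothing.
class NaiveBayesSketch {
    // label -> (term -> number of occurrences in documents with that label)
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    // label -> total number of term occurrences seen for that label
    private final Map<String, Integer> totalTerms = new HashMap<>();
    // label -> number of training documents with that label
    private final Map<String, Integer> docCounts = new HashMap<>();
    // every distinct term seen during training
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Record one labelled training document given as a list of terms.
    void train(String label, List<String> terms) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
            totalTerms.merge(label, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    // Return the label with the highest log-posterior score,
    // or null if nothing has been trained yet.
    String classify(List<String> terms) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // log prior: fraction of training documents with this label
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int total = totalTerms.getOrDefault(label, 0);
            for (String term : terms) {
                int count = counts.getOrDefault(term, 0);
                // smoothed log likelihood of the term given the label
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Scores are summed in log space so that long emails do not underflow to zero, and the add-one term keeps a word that was never seen under one label from forcing that label's probability to zero. In a pipeline like the one this README describes, the term lists would presumably be produced by tokenizing, stemming, and feature selection before reaching the classifier.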

Third Party Code
----------------

* [Porter Stemmer Algorithm](http://tartarus.org/martin/PorterStemmer/) in [Java](http://tartarus.org/martin/PorterStemmer/java.txt)
* `AbstractAdapter` is based on [this Stack Overflow post](http://stackoverflow.com/questions/5800433/polymorphism-with-gson)