https://github.com/rodneyshag/documentclassification
Spam classification and sentiment analysis on text documents.
machine-learning naive-bayes-classifier sentiment-analysis spam-classification
- Host: GitHub
- URL: https://github.com/rodneyshag/documentclassification
- Owner: RodneyShag
- Created: 2017-01-14T23:57:30.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-06-11T21:23:29.000Z (about 8 years ago)
- Last Synced: 2025-04-11T04:32:25.682Z (3 months ago)
- Topics: machine-learning, naive-bayes-classifier, sentiment-analysis, spam-classification
- Language: Java
- Homepage:
- Size: 2.13 MB
- Stars: 3
- Watchers: 1
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news/8category.testing.txt
README
# DocumentClassification
Spam classification and sentiment analysis.

## Goals:
1) **Spam Classification:** Partition emails into 2 categories depending on whether they contain spam or not.
2) **Sentiment Analysis:** Partition movie reviews into 2 categories depending on whether they are positive or negative reviews.

## Classifiers:
1) [Multinomial Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Multinomial_naive_Bayes)
2) [Bernoulli Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Bernoulli_naive_Bayes)

## Data Formats:
1) **emails:** text documents
2) **movie reviews:** text documents

## Technique:
Each goal above is accomplished by training a Naive Bayes classifier on a set of training data and then evaluating it on a separate set of test data. The aim is a high success rate in identifying which emails contain spam and in deciding whether an unseen movie review is positive or negative.
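
The train-then-classify procedure described above could be sketched roughly as follows. This is a minimal illustration of multinomial Naive Bayes with add-one (Laplace) smoothing, not the repository's actual code. The class name `NaiveBayesSketch`, its `train` and `classify` methods, the assumption that documents arrive already split into word tokens, and the toy training data are all hypothetical and chosen only for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a multinomial Naive Bayes text classifier.
// Names and structure are illustrative, not taken from this repository.
class NaiveBayesSketch {
    private final Map<String, Integer> docCounts = new HashMap<>();   // documents seen per label
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();  // total tokens seen per label
    private final Set<String> vocabulary = new HashSet<>();           // all distinct training words
    private int totalDocs = 0;

    // Record one labeled, pre-tokenized training document.
    void train(String label, List<String> tokens) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the label with the highest log posterior, or null if nothing was trained.
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior: fraction of training documents carrying this label.
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String token : tokens) {
                if (!vocabulary.contains(token)) {
                    continue; // assumption: words never seen in training are ignored
                }
                int count = counts.getOrDefault(token, 0);
                // Add-one smoothing keeps unseen (label, word) pairs from zeroing the product.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        // Toy training set, invented for this example.
        nb.train("spam", Arrays.asList("win", "free", "money", "now"));
        nb.train("spam", Arrays.asList("free", "prize", "click"));
        nb.train("ham", Arrays.asList("meeting", "at", "noon"));
        nb.train("ham", Arrays.asList("lunch", "tomorrow", "at", "noon"));

        System.out.println(nb.classify(Arrays.asList("free", "money")));   // prints "spam"
        System.out.println(nb.classify(Arrays.asList("meeting", "noon"))); // prints "ham"
    }
}
```

Scores are summed as logarithms rather than multiplied as raw probabilities, which avoids floating-point underflow on long documents. A Bernoulli variant would instead track, per label, how many documents contain each word and would also account for vocabulary words that are absent from the document being classified.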