Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ma-shamshiri/Spam-Detector
This project presents a Python-based spam detector program that utilizes the Naive Bayes approach to classify emails as either spam or ham. The system is designed to accurately and efficiently identify spam messages, providing a useful tool for individuals and organizations seeking to manage their email inboxes more effectively.
anaconda jupyter-notebook python
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/ma-shamshiri/Spam-Detector
- Owner: ma-shamshiri
- Created: 2020-05-13T19:46:57.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-08-21T02:54:04.000Z (over 2 years ago)
- Last Synced: 2024-01-25T07:07:04.265Z (12 months ago)
- Topics: anaconda, jupyter-notebook, python
- Language: Python
- Homepage:
- Size: 4.8 MB
- Stars: 22
- Watchers: 1
- Forks: 97
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- fucking-awesome-readme - ma-shamshiri/Spam-Detector - Complete project file description. Project logo. Animated project banner. Concise project description. Clear execution instruction. (Examples)
- awesome-readme - ma-shamshiri/Spam-Detector - Complete project file description. Project logo. Animated project banner. Concise project description. Clear execution instruction. (Examples)
README
Spam Detector
COMP 6721 - Artificial Intelligence
Project Assignment 2 - Concordia University (Winter 2020)
I have developed a spam detector program in Python which classifies given emails as spam or ham using the Naive Bayes approach.
:floppy_disk: Project Files Description
This project includes 3 executable files, 3 text files, and 2 directories, as follows:
Executable Files:
- spam_detector.py - Includes all functions required for classification operations.
- train.py - Uses the functions defined in the spam_detector.py file and generates the model.txt file after execution.
- test.py - Uses the functions defined in the spam_detector.py file and, after execution, generates the result.txt and evaluation.txt files.
Output Files:
- model.txt - Contains information about the vocabulary of the training set, such as the frequency and conditional probability of each word in the Spam and Ham classes.
- result.txt - Contains information about the classified emails of the test set.
- evaluation.txt - Contains the evaluation results table as well as the confusion matrix of the Spam and Ham classes.
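The per-word statistics described for model.txt can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `build_model`, the smoothing constant `delta`, and the toy training set are all assumptions, since the README does not state how the probabilities are smoothed or how emails are tokenized.

```python
from collections import Counter


def build_model(labeled_docs, delta=0.5):
    """Return {word: (ham_freq, ham_prob, spam_freq, spam_prob)}.

    labeled_docs is an iterable of (label, tokens) pairs, where label is
    "ham" or "spam". delta is a hypothetical additive-smoothing constant;
    the value used by the project is not given in the README.
    """
    counts = {"ham": Counter(), "spam": Counter()}
    for label, tokens in labeled_docs:
        counts[label].update(tokens)

    # The vocabulary is every word seen in either class.
    vocabulary = sorted(set(counts["ham"]) | set(counts["spam"]))
    totals = {label: sum(c.values()) for label, c in counts.items()}

    model = {}
    for word in vocabulary:
        row = []
        for label in ("ham", "spam"):
            freq = counts[label][word]
            # Smoothed estimate of P(word | label).
            prob = (freq + delta) / (totals[label] + delta * len(vocabulary))
            row.extend([freq, prob])
        model[word] = tuple(row)
    return model


# Toy training set (made up for illustration only).
training_set = [
    ("ham", ["meeting", "at", "noon"]),
    ("spam", ["free", "money", "free"]),
]
model = build_model(training_set)
for word, (ham_freq, ham_prob, spam_freq, spam_prob) in model.items():
    print(f"{word}  {ham_freq}  {ham_prob:.4f}  {spam_freq}  {spam_prob:.4f}")
```

Smoothing keeps a word that appears in only one class from receiving a zero probability in the other, which would otherwise zero out the whole product for any email containing it.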
Source Directories:
-
train directory - Includes all emails for the training phase of the program. -
test directory - Includes all emails for the testing phase of the program.
![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/rainbow.png)
:book: Naive Bayes
In machine learning, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
Abstractly, naive Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $\mathbf{x} = (x_1, \dots, x_n)$ of $n$ features (independent variables), it assigns to this instance the probabilities

$$p(C_k \mid x_1, \dots, x_n)$$

for each of the $K$ possible classes $C_k$.

The problem with the above formulation is that if the number of features $n$ is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable. Using Bayes' theorem, the conditional probability can be decomposed as

$$p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}$$
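The remaining steps of the standard derivation can be sketched as follows. The symbols are the usual textbook ones and are assumptions here: $C_k$ is a class (spam or ham), $x_i$ is the $i$-th feature (for example, a word in the email), and $\hat{y}$ is the predicted class. Because the denominator $p(\mathbf{x})$ does not depend on the class, it can be dropped when comparing classes, and the naive independence assumption factorizes the likelihood:

$$
\begin{aligned}
p(C_k \mid x_1, \dots, x_n) &\propto p(C_k)\, p(x_1, \dots, x_n \mid C_k) \\
p(x_1, \dots, x_n \mid C_k) &= \prod_{i=1}^{n} p(x_i \mid C_k) \\
\hat{y} &= \underset{k}{\operatorname{arg\,max}} \left[ \log p(C_k) + \sum_{i=1}^{n} \log p(x_i \mid C_k) \right]
\end{aligned}
$$

Taking logarithms in the last line leaves the chosen class unchanged, since the logarithm is monotonic, and avoids numerical underflow when many small probabilities are multiplied.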
![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/rainbow.png)
:clipboard: Execution Instruction
The order of execution of the program files is as follows:
1) spam_detector.py
First, the spam_detector.py file must be executed to define all the functions and variables required for classification operations.
2) train.py
Then, the train.py file must be executed, which leads to the production of the model.txt file.
This file imports the spam_detector module at the top so that the functions defined in it can be used.
3) test.py
Finally, the test.py file must be executed to create the result.txt and evaluation.txt files.
Like train.py, this file imports the spam_detector module at the top so that the functions defined in it can be used.
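What the testing step might compute can be sketched as follows: score each email in log space, pick the more likely class, then summarize the predictions in a confusion matrix. This is an illustration under assumptions, not the contents of test.py. The function names `classify` and `evaluate`, the toy priors and likelihoods, and the choice to skip unseen words are all hypothetical.

```python
import math


def classify(tokens, priors, likelihoods):
    """Return the label with the highest log-posterior score.

    likelihoods[label][word] holds P(word | label). Words missing from the
    model are skipped, which is one common convention (an assumption here).
    """
    scores = {}
    for label, prior in priors.items():
        score = math.log(prior)
        for word in tokens:
            prob = likelihoods[label].get(word)
            if prob is not None:
                score += math.log(prob)
        scores[label] = score
    return max(scores, key=scores.get)


def evaluate(actual, predicted, positive="spam"):
    """Compute confusion-matrix counts and derived metrics for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision, "recall": recall, "f1": f1,
    }


# Toy model and test set (made up for illustration only).
priors = {"ham": 0.5, "spam": 0.5}
likelihoods = {
    "ham": {"free": 0.1, "meeting": 0.6, "money": 0.3},
    "spam": {"free": 0.5, "meeting": 0.1, "money": 0.4},
}
test_set = [
    ("spam", ["free", "money"]),
    ("ham", ["meeting"]),
    ("ham", ["money", "meeting"]),
    ("ham", ["free"]),
    ("spam", ["free", "free"]),
]

actual = [label for label, _ in test_set]
predicted = [classify(tokens, priors, likelihoods) for _, tokens in test_set]
metrics = evaluate(actual, predicted)
print(predicted)
print(metrics)
```

Treating spam as the positive class is a choice made for this sketch; swapping `positive="ham"` gives the corresponding metrics for the ham class.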
![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/rainbow.png)
:books: References
- Jonathan Lee, 'Notes on Naive Bayes Classifiers for Spam Filtering'. [Online]. Available: https://courses.cs.washington.edu/courses/cse312/18sp/lectures/naive-bayes/naivebayesnotes.pdf
- Wikipedia.org, 'Naive Bayes Classifier'. [Online]. Available: https://en.wikipedia.org/wiki/Naive_Bayes_classifier
- Youtube.com, 'Naive Bayes for Spam Detection'. [Online]. Available: https://www.youtube.com/watch?v=8aZNAmWKGfs
- Youtube.com, 'Text Classification Using Naive Bayes'. [Online]. Available: https://www.youtube.com/watch?v=EGKeC2S44Rs
- Manisha-sirsat.blogspot.com, 'What is Confusion Matrix and Advanced Classification Metrics?'. [Online]. Available: https://manisha-sirsat.blogspot.com/2019/04/confusion-matrix.html
- Pythonforengineers.com, 'Build a Spam Filter'. [Online]. Available: https://www.pythonforengineers.com/build-a-spam-filter/
![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/rainbow.png)
:scroll: Credits
Mohammad Amin Shamshiri
[![GitHub Badge](https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/ma-shamshiri)
[![Twitter Badge](https://img.shields.io/badge/Twitter-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white)](https://twitter.com/ma_shamshiri)
[![LinkedIn Badge](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/ma-shamshiri)