https://github.com/jgurakuqi/slt-based-spam-filters
This project aims to develop three spam filters using different Machine Learning techniques. The techniques to be used are Support Vector Machines with Linear, Polynomial and Radial Basis Function kernels (together with their angular versions), Naive Bayes and K-Nearest Neighbours.
https://github.com/jgurakuqi/slt-based-spam-filters
k-nearest-neighbours kernel-methods machine-learning naive-bayes support-vector-machines
Last synced: 6 months ago
JSON representation
This project aims to develop three spam filters using different Machine Learning techniques. The techniques to be used are Support Vector Machines with Linear, Polynomial and Radial Basis Function kernels (together with their angular versions), Naive Bayes and K-Nearest Neighbours.
- Host: GitHub
- URL: https://github.com/jgurakuqi/slt-based-spam-filters
- Owner: jgurakuqi
- License: mit
- Created: 2022-03-31T19:59:26.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-08-03T13:39:24.000Z (about 2 years ago)
- Last Synced: 2025-02-02T16:57:42.675Z (8 months ago)
- Topics: k-nearest-neighbours, kernel-methods, machine-learning, naive-bayes, support-vector-machines
- Language: Jupyter Notebook
- Homepage:
- Size: 827 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# slt-based-spam-filters
## REQUIREMENTS:
Write a spam filter using discrimitative and generative classifiers. Use the Spambase dataset which already
represents spam/ham messages through a bag-of-words representations through a dictionary of 48 highly
discriminative words and 6 characters. The first 54 features correspond to word/symbols frequencies; ignore
features 55-57; feature 58 is the class label (1 spam/0 ham).Perform SVM classification using linear, polynomial of degree 2, and RBF kernels over the TF/IDF representation.
Can you transform the kernels to make use of angular information only (i.e., no length)? Are they still positive
definite kernels? Classify the same data also through a Naive Bayes classifier for continuous inputs, modelling
each feature with a Gaussian distribution, resulting in the following model:where αk is the frequency of class k, and μki, σ2ki are the means and variances of feature i given that the data
is in class k.
Perform k-NN clasification with k=5
Provide the code, the models on the training set, and the respective performances in 10 way cross validation.Explain the differences between the three models.
P.S. you can use a library implementation for SVM, but do implement the Naive Bayes on your own. As for k-NN,
you can use libraries if you want, but it might just be easier to do it on your own.## OUR NOTES:
The jupyter notebook file is the file used to display the results from all the algorithms applied to
the spam/ham dataset.For the Naive Bayes it was required to implement it without making use of ML libraries, so we opted for
a C++ implementation to gain extra performance.Before of running the jupyter notebook file named "main.ipynb", if the naive_bayes_main.o, lib_naive_.so and
naivelib.so files are misssing, there is need to compile the c++ file with the following commands, in order
to export it as a callable library for Python:g++ -c -fPIC naive_bayes_main.cpp -O3 -Wextra -Wall -o naive_bayes_main.o
g++ -shared -Wl,-soname,lib_naive.so -O3 -o lib_naive_.so naive_bayes_main.oIf the user wishes to run the naive_bayes_main.cpp file directly, than it's advised to compile it with the
following command:g++ naive_bayes_main.cpp -O3 -o out
and then run it through the following command (if you whish to run it from a bash terminal):
./out
## LICENSE
MIT License
Copyright (c) 2022 Jurgen Gurakuqi, Elsa Sejdi
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.