https://github.com/aahouzi/simple_document_classification_from_scratch
Simple Document Classification using Multi Class Logistic Regression & SVM Soft Margin from scratch
logistic-regression momentum-optimization-algorithm natural-language-processing optimization python svm-classifier
- Host: GitHub
- URL: https://github.com/aahouzi/simple_document_classification_from_scratch
- Owner: aahouzi
- License: mit
- Created: 2020-12-01T20:25:03.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-07-30T12:23:20.000Z (almost 4 years ago)
- Last Synced: 2025-06-29T18:50:28.506Z (12 months ago)
- Topics: logistic-regression, momentum-optimization-algorithm, natural-language-processing, optimization, python, svm-classifier
- Language: Jupyter Notebook
- Homepage:
- Size: 80.1 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Simple Document Classification using Multi Class Logistic Regression & SVM Soft Margin from scratch
## :monocle_face: Description
This mini-project contains a from-scratch implementation of some multi-class classification algorithms. The data is already cleaned and needs no further pre-processing; it was encoded using TF-IDF (Term Frequency-Inverse Document Frequency).
**Firstly**, I implemented Logistic Regression with a One-vs-All strategy to adapt the algorithm to the multi-class task. I used SGD with momentum to minimize the Binary Cross-Entropy loss and obtain the optimal weight matrix.
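The One-vs-All training described above can be sketched as follows. This is a minimal illustration and not the repository's actual code: the function names, the hyperparameters (`lr`, `beta`, `epochs`, `batch_size`) and the toy one-hot data are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    # Clip the logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def bce_loss(w, X, y):
    """Mean binary cross-entropy; y holds 0/1 labels."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def bce_grad(w, X, y):
    """Gradient of the mean BCE with respect to w."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)


def fit_one_vs_all(X, labels, n_classes, lr=0.1, beta=0.9,
                   epochs=100, batch_size=4, seed=0):
    """Train one binary classifier per class; row k of W scores class k."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for k in range(n_classes):
        y = (labels == k).astype(float)  # class k vs. the rest
        w, v = np.zeros(d), np.zeros(d)
        for _ in range(epochs):
            order = rng.permutation(n)  # reshuffle every epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                # Momentum: accumulate a velocity from mini-batch gradients.
                v = beta * v + bce_grad(w, X[idx], y[idx])
                w = w - lr * v
        W[k] = w
    return W


def predict(W, X):
    # Pick the class whose classifier gives the highest score.
    return np.argmax(X @ W.T, axis=1)


# Toy data (hypothetical): 12 one-hot "documents" spread over 3 classes.
X_toy = np.tile(np.eye(3), (4, 1))
labels_toy = np.tile(np.arange(3), 4)
W_toy = fit_one_vs_all(X_toy, labels_toy, n_classes=3)
```

With real TF-IDF features, `X` would be the document-term matrix and `labels` the integer class of each document. Taking the arg-max over raw scores works because the sigmoid is monotonic.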
**Secondly**, I implemented Multi-Class SVM with the same One-vs-All strategy as Logistic Regression. Since I chose soft-margin SVM to deal with non-linearly separable data, I used the same optimizer to minimize the L2-regularized Hinge loss.
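For the soft-margin SVM, one binary classifier of the One-vs-All scheme could look like the sketch below. It is an illustration under assumptions, not the repository's code: labels are taken to be -1/+1, the regularization strength `lam` and the other hyperparameters are made up, and full-batch updates are used for brevity (sampling a mini-batch at each step would give the SGD variant).

```python
import numpy as np


def hinge_loss(w, X, y, lam=0.01):
    """Mean hinge loss plus (lam / 2) * ||w||^2; y holds -1/+1 labels."""
    margins = 1.0 - y * (X @ w)
    return np.mean(np.maximum(0.0, margins)) + 0.5 * lam * np.dot(w, w)


def hinge_subgrad(w, X, y, lam=0.01):
    """Subgradient of hinge_loss with respect to w."""
    # Only samples inside the margin (or misclassified) contribute.
    active = (1.0 - y * (X @ w)) > 0.0
    return -(X[active].T @ y[active]) / len(y) + lam * w


def fit_svm(X, y, lam=0.01, lr=0.1, beta=0.9, steps=100):
    """Minimize the regularized hinge loss with momentum updates."""
    w, v = np.zeros(X.shape[1]), np.zeros(X.shape[1])
    for _ in range(steps):
        v = beta * v + hinge_subgrad(w, X, y, lam)  # velocity
        w = w - lr * v
    return w


# Toy data (hypothetical): two linearly separable classes in 2-D.
X_toy = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y_toy = np.array([1.0, 1.0, -1.0, -1.0])
w_toy = fit_svm(X_toy, y_toy)
```

The hinge loss is not differentiable where a margin equals exactly 1, so the update uses a subgradient. The `lam * w` term is the gradient of the L2 penalty.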
## :rocket: Repository Structure
The repository contains the following files & directories:
- **Notebooks directory:** It contains a Jupyter notebook where the main functions of the project are called, and where results are displayed.
- **Loss directory:** It contains an implementation of the loss functions mentioned in the description, along with their gradients.
- **Algorithms directory:** This directory contains an implementation of the various ML algorithms mentioned in the description.
- **The dataset:** This mini-project was taken from a HackerRank challenge. You can refer to [the following link](https://www.hackerrank.com/challenges/document-classification/problem) to get the dataset as well as the instructions to solve the problem.
## :bulb: Next steps
Implementation of SVM with the kernel trick, to deal with non-linearly separable data using various kernel functions (Gaussian, polynomial, etc.).
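As a starting point, the two kernel functions named above could be computed as in this sketch. Nothing here exists in the repository yet: the function names and the parameters `gamma`, `degree` and `coef0` are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    # Guard against tiny negative values caused by rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def polynomial_kernel(A, B, degree=2, coef0=1.0):
    """K[i, j] = (A[i] . B[j] + coef0) ** degree."""
    return (A @ B.T + coef0) ** degree


# Toy points (hypothetical) to build two 3x3 Gram matrices.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_rbf = gaussian_kernel(X_toy, X_toy, gamma=0.5)
K_poly = polynomial_kernel(X_toy, X_toy)
```

A kernel SVM would use these Gram matrices in place of the inner products `X @ w` of the linear model.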
## :mailbox_closed: Contact
For any information, feedback or questions, please [contact me][anas-email].
[anas-email]: mailto:ahouzi2000@hotmail.fr