https://github.com/kingabzpro/malawi-news-classification
Using a text classifier to predict various categories in Malawi News articles with SMOTE and SGDClassifier.
- Host: GitHub
- URL: https://github.com/kingabzpro/malawi-news-classification
- Owner: kingabzpro
- License: apache-2.0
- Created: 2021-09-03T16:27:23.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-09-03T16:44:14.000Z (over 3 years ago)
- Last Synced: 2025-01-17T22:12:03.492Z (4 months ago)
- Topics: africa, multiclass-classification, nlp-machine-learning, oversampling
- Language: Jupyter Notebook
- Homepage: https://pub.towardsai.net/malawi-news-classification-an-nlp-project-adfa867abfd9
- Size: 1.77 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Malawi-News-Classification
Using a text classifier to predict various categories in Malawi News articles with SMOTE and SGDClassifier. [View the notebook on Deepnote](https://deepnote.com/viewer/github/kingabzpro/Malawi-News-Classification/blob/main/malawi-news-classification.ipynb)

The project code is simple yet effective in a competitive setting. I experimented with vectorizers and the Porter stemmer for text preprocessing, and used several text-cleaning methods to improve overall model performance. In the end, I used scikit-learn's Stochastic Gradient Descent (SGD) classifier to predict news categories. I also experimented with various neural networks and gradient boosting models, but they all fell short, since a simple logistic regression with minimal hyperparameter tuning works quite well on this data.
> To understand the code, read my article on [Medium](https://pub.towardsai.net/malawi-news-classification-an-nlp-project-adfa867abfd9).