https://github.com/saidsef/ml-classifier
Classify news articles into different categories using Machine Learning
- Host: GitHub
- URL: https://github.com/saidsef/ml-classifier
- Owner: saidsef
- License: mit
- Created: 2018-06-28T17:02:35.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-09-08T07:27:21.000Z (5 months ago)
- Last Synced: 2024-09-08T21:57:46.705Z (5 months ago)
- Topics: classification, classify-news-articles, data-visualization, hacktoberfest, machine-learning, machine-learning-algorithms, openfaas-function, serverless-functions, sklearn-classify
- Language: Jupyter Notebook
- Size: 218 MB
- Stars: 10
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
# Machine Learning - News Articles classification with sklearn [![CI](https://github.com/saidsef/ml-classifier/actions/workflows/ci.yml/badge.svg)](#deployment) [![Tagging](https://github.com/saidsef/ml-classifier/actions/workflows/tagging.yml/badge.svg)](#deployment) [![Release](https://github.com/saidsef/ml-classifier/actions/workflows/release.yml/badge.svg)](#deployment)
Classify news articles into different categories using Machine Learning. The dataset consists of 6000 documents and 47 categories.
My goal is to show you how to create predictive model(s) that assign classification labels to news articles.
## Objective
- To classify news articles
- Learn the basics of natural language processing
- Build models using sklearn and choose the best one
- Use sklearn's `make_pipeline` function
- Learn how to turn it into a service
- Learn how to make it composable and portable
- ...
- Profit?

## Prerequisite
- Python >= v3.11
- Jupyter Notebook
- Some knowledge of Machine Learning

## Python Libs
- NumPy
- Pandas
- SciPy
- Matplotlib
- Jupyter
- Scikit-learn (the library that we will use later in this post when creating the classifier model(s))

## We Will
- Apply some preprocessing steps to prepare the data.
- Perform a descriptive analysis of the data to better understand its main characteristics.
- Practice training different machine learning models using scikit-learn, one of the most popular Python libraries for machine learning.
- Use a subset of the dataset for training.
- Iterate and evaluate the learned models on unseen data, then compare them until we find a model that meets our expectations, and combine unfitted estimators in a `VotingClassifier` with *soft* voting.
- Once we have chosen the candidate model(s), use them to make predictions and to create a simple web application that consumes the predictive model.

## Getting started with the machine learning tutorial
See [Jupyter Notebook](https://machinelearningmastery.com/start-here/)
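
As a rough illustration of the steps listed under "We Will", the sketch below chains a text vectorizer and a soft-voting `VotingClassifier` with `make_pipeline`. It is not the repository's actual notebook code: the choice of `TfidfVectorizer`, `LogisticRegression` and `MultinomialNB`, the helper name `build_model`, and the six toy sentences are all assumptions made for the example.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the real dataset (not taken from the repository).
train_texts = [
    "The team won the championship game last night",
    "The striker scored twice in the cup final",
    "The coach praised the players after the match",
    "Parliament passed the new budget bill today",
    "The senator announced her election campaign",
    "Voters head to the polls for the election",
]
train_labels = ["Sport", "Sport", "Sport", "Politics", "Politics", "Politics"]


def build_model():
    """Return an unfitted pipeline: TF-IDF features -> soft-voting ensemble."""
    voter = VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
        ],
        voting="soft",  # average the predicted class probabilities
    )
    return make_pipeline(TfidfVectorizer(stop_words="english"), voter)


# Fitting the pipeline fits the vectorizer and every estimator in the ensemble.
model = build_model()
model.fit(train_texts, train_labels)

# Predict on text the model has not seen during training.
unseen = ["The goalkeeper saved a penalty in the match"]
predicted = model.predict(unseen)[0]
confidence = float(model.predict_proba(unseen).max())
print(predicted, round(confidence, 2))
```

With a real dataset you would hold out part of the data (for example with `sklearn.model_selection.train_test_split`) and compare candidate models on that unseen portion before choosing one.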
## Deployment
As a container:
```shell
docker run -d -p 7070:7070 docker.io/saidsef/ml-classifier:latest
```

As a Python application:
```shell
pip3 install -r requirements.txt
PORT=7070 python3 classifier-ml.py
```

## JSON Format
The payload should be JSON, as in [test/test.json](test/test.json):
```json
{ "body": "text-goes-here" }
```

## The Request
The request must be a `POST` with a JSON body:
```shell
curl -XPOST http://localhost:7070/api/v1/news -H 'Content-Type: application/json' -d @test/test.json
```

The response will be in JSON format:
```json
{
  "score": 1,
  "category": "Opinion"
}
```

## Kubernetes
```shell
kubectl apply -k ./deployment
```
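
Once the service is running (as a container, as a Python process, or on Kubernetes), the `curl` call from "The Request" can also be made from Python. The sketch below uses only the standard library and assumes the service is reachable at `http://localhost:7070`, as in the examples above; the helper names `build_request` and `classify` are hypothetical and not part of the project.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port to your deployment.
DEFAULT_URL = "http://localhost:7070/api/v1/news"


def build_request(text, url=DEFAULT_URL):
    """Build a POST request carrying the article text as {"body": ...}."""
    payload = json.dumps({"body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, url=DEFAULT_URL, timeout=10):
    """Send the text to the running service and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (requires the service to be running):
# result = classify("text-goes-here")
# print(result["category"], result["score"])
```

Keeping `build_request` separate from `classify` makes it possible to inspect or unit-test the payload without a running service.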