{"id":20359424,"url":"https://github.com/saidsef/ml-classifier","last_synced_at":"2025-04-12T03:32:38.562Z","repository":{"id":38823831,"uuid":"139046747","full_name":"saidsef/ml-classifier","owner":"saidsef","description":"Classify news articles into different categories using Machine Learning","archived":false,"fork":false,"pushed_at":"2024-12-02T19:00:10.000Z","size":228722,"stargazers_count":10,"open_issues_count":1,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-06T21:54:09.835Z","etag":null,"topics":["classification","classify-news-articles","data-visualization","hacktoberfest","machine-learning","machine-learning-algorithms","openfaas-function","serverless-functions","sklearn-classify"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saidsef.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"custom":["https://revolut.me/saidsef"]}},"created_at":"2018-06-28T17:02:35.000Z","updated_at":"2024-11-29T13:52:05.000Z","dependencies_parsed_at":"2023-01-31T06:46:11.577Z","dependency_job_id":"8de0a808-891c-48a0-852f-86b5b7188dbb","html_url":"https://github.com/saidsef/ml-classifier","commit_stats":null,"previous_names":[],"tags_count":30,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saidsef%2Fml-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saidsef%2Fml-classifier/tags","releases_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/saidsef%2Fml-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saidsef%2Fml-classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saidsef","download_url":"https://codeload.github.com/saidsef/ml-classifier/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248512760,"owners_count":21116674,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","classify-news-articles","data-visualization","hacktoberfest","machine-learning","machine-learning-algorithms","openfaas-function","serverless-functions","sklearn-classify"],"created_at":"2024-11-14T23:33:50.937Z","updated_at":"2025-04-12T03:32:38.557Z","avatar_url":"https://github.com/saidsef.png","language":"Jupyter Notebook","readme":"# Machine Learning - News Articles classification with sklearn [![CI](https://github.com/saidsef/ml-classifier/actions/workflows/ci.yml/badge.svg)](#deployment) [![Tagging](https://github.com/saidsef/ml-classifier/actions/workflows/tagging.yml/badge.svg)](#deployment) [![Release](https://github.com/saidsef/ml-classifier/actions/workflows/release.yml/badge.svg)](#deployment)\n\nClassify news articles into different categories using Machine Learning.  
The dataset consists of 6000 documents and 47 categories.\n\nMy goal is to show you how to create predictive model(s) that assign classification labels to news articles.\n\n## Objective\n\n- To classify news articles\n- Learn the basics of natural language processing\n- Build models using sklearn and choose the best one\n- Use sklearn's `make_pipeline` function\n- Learn how to turn it into a service\n- Learn how to make it composable and portable\n- ...\n- Profit?\n\n## Prerequisite\n\n- Python \u003e= v3.11\n- Jupyter Notebook\n- Some knowledge of Machine Learning\n\n## Python Libs\n\n- NumPy\n- Pandas\n- SciPy\n- Matplotlib\n- Jupyter\n- Scikit-learn (the library that we will use later in this post when creating the classifier model(s))\n\n## We Will\n\n- Apply some preprocessing steps to prepare the data.\n- We will perform a descriptive analysis of the data to better understand its main characteristics\n- We will continue by practicing how to train different machine learning models using scikit-learn. It is one of the most popular Python libraries for machine learning\n- We will also use a subset of the dataset for training purposes\n- We will iterate and evaluate the learned models by using unseen data. 
Later, we will compare them until we find a good model that meets our expectations, and use a `VotingClassifier` with *soft* voting over unfitted estimators.\n- Once we have chosen the candidate model(s), we will use it to perform predictions and to create a simple web application that consumes this predictive model\n\n## Getting started with the machine learning tutorial\n\nSee [Jupyter Notebook](https://machinelearningmastery.com/start-here/)\n\n## Deployment\n\nAs a container:\n\n```shell\ndocker run -d -p 7070:7070 docker.io/saidsef/ml-classifier:latest\n```\n\nAs a Python application:\n\n```shell\n\npip3 install -r requirements.txt\n\nPORT=7070 classifier-ml.py\n```\n\n## JSON Format\n\nThe payload should be in [JSON format](test/test.json):\n\n```json\n{ \"body\": \"text-goes-here\" }\n```\n\n## The Request\n\nThe request must be a `POST` with a `JSON` body:\n\n```shell\ncurl -XPOST http://localhost:7070/api/v1/news -H 'Content-Type: application/json' -d @test/test.json\n```\n\nThe response will be in `json` format:\n\n```json\n{\n  \"score\": 1,\n  \"category\": \"Opinion\"\n}\n```\n\n## Kubernetes\n\n```shell\nkubectl apply -k ./deployment\n```\n\n\n","funding_links":["https://revolut.me/saidsef"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaidsef%2Fml-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaidsef%2Fml-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaidsef%2Fml-classifier/lists"}