https://github.com/ritvik19/news-classifier
- Host: GitHub
- URL: https://github.com/ritvik19/news-classifier
- Owner: Ritvik19
- Created: 2019-04-05T17:22:50.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-10-20T03:38:22.000Z (over 5 years ago)
- Last Synced: 2025-01-23T06:14:50.430Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 79.8 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: NewsScraping.py
README
# News-Classifier
Tag news articles with the categories they might belong to using machine learning
___
### Data Preprocessing
* URLs are removed from the text
* Text is lowercased
* Contractions are expanded
* Punctuation is removed from the text
* Digits are removed
* Extra whitespace is removed from the text
* Stop words are removed
* TF-IDF vectors are created (unigrams)
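The cleaning steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the notebook's own code: the `CONTRACTIONS` map and `STOP_WORDS` set are tiny hypothetical stand-ins (a real run would use a full contraction list and a standard stop-word list such as NLTK's), and the `preprocess` helper name is illustrative.

```python
import re
import string

# Hypothetical, deliberately tiny lookup tables for illustration only.
# Order matters: whole-word forms ("can't", "won't") must run before the generic "n't".
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "n't": " not",
    "'re": " are",
    "'s": " is",   # naive: also rewrites possessives ("india's" -> "india is")
    "'ll": " will",
    "'ve": " have",
    "'m": " am",
    "'d": " would",
}
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "of", "to", "and", "will"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGITS_RE = re.compile(r"\d+")
# Map every ASCII punctuation character to a space so "state-of-the-art" splits into words.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> str:
    text = URL_RE.sub(" ", text)                 # remove URLs (before punctuation is stripped)
    text = text.lower()                          # lowercase
    for short, full in CONTRACTIONS.items():     # expand contractions
        text = text.replace(short, full)
    text = text.translate(PUNCT_TO_SPACE)        # remove punctuation
    text = DIGITS_RE.sub(" ", text)              # remove digits
    tokens = text.split()                        # split() also drops extra whitespace
    return " ".join(t for t in tokens if t not in STOP_WORDS)  # remove stop words


print(preprocess("Virat Kohli can't stop scoring: 2 centuries in 3 days! https://example.com/news"))
# virat kohli cannot stop scoring centuries days
```

Running the URL pass first matters: once punctuation is stripped, a URL would leave fragments such as `https example com` in the text.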
### Approach
Models are trained on manually scraped data from
* Inshorts
* ANI
* India TV
* Janta Ka Reporter
* OpIndia
* PostCard News
* Swarajya
* TFIPost
* TheWeek
* TheWire
and a dataset available on [Kaggle](https://www.kaggle.com/rmisra/news-category-dataset).
The task is framed as a multi-label classification problem, with TF-IDF vectors as features.
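One way to build unigram TF-IDF features from the cleaned tokens is sketched below. The README does not state the exact weighting, so this assumes the smoothed IDF, raw term counts and L2 normalization that scikit-learn's `TfidfVectorizer` uses by default; the helper names `fit_idf` and `tfidf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF per unigram: ln((1 + n) / (1 + df)) + 1 (assumed, scikit-learn style)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse L2-normalized TF-IDF vector; terms unseen during fitting are ignored."""
    counts = Counter(t for t in doc if t in idf)          # raw term frequency
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


# Toy corpus of already-preprocessed articles (illustrative data only).
docs = [
    ["india", "wins", "cricket", "match"],
    ["india", "election", "results"],
    ["cricket", "world", "cup"],
]
idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]
```

Rarer words get larger weights: `wins` appears in one document and `india` in two, so `idf["wins"]` is greater than `idf["india"]`.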
This approach tags news articles with the following 9 categories:
National, Sports, World, Politics, Technology, Entertainment, Business, Lifestyle, Hatke
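Because the problem is multi-label, each article can be given a binary target column per tag, and every tag is decided independently (one-vs-rest). The README does not name the classifier, so this sketch swaps in a plain perceptron per tag over sparse, TF-IDF-style feature dicts; `encode`, `train_one_vs_rest`, `predict` and the toy data are all hypothetical.

```python
# The nine tags listed in the README; the column order is an arbitrary choice.
TAGS = ["National", "Sports", "World", "Politics", "Technology",
        "Entertainment", "Business", "Lifestyle", "Hatke"]


def encode(tags: set[str]) -> list[int]:
    """Multi-hot target: one binary column per tag, so an article can carry several."""
    unknown = tags - set(TAGS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if tag in tags else 0 for tag in TAGS]


def score(model: tuple[dict[str, float], float], x: dict[str, float]) -> float:
    w, b = model
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def train_one_vs_rest(X: list[dict[str, float]], Y: list[list[int]], epochs: int = 10):
    """Train one independent binary perceptron per tag (hypothetical stand-in classifier)."""
    models = []
    for j in range(len(TAGS)):
        w: dict[str, float] = {}
        b = 0.0
        for _ in range(epochs):
            for x, y in zip(X, Y):
                target = 1 if y[j] else -1
                if target * score((w, b), x) <= 0:  # misclassified or on the boundary
                    for t, v in x.items():
                        w[t] = w.get(t, 0.0) + target * v
                    b += target
        models.append((w, b))
    return models


def predict(models, x: dict[str, float]) -> list[str]:
    """Return every tag whose own classifier fires; zero, one or many tags are possible."""
    return [tag for tag, m in zip(TAGS, models) if score(m, x) > 0]


# Toy, illustrative feature dicts (in practice these would be TF-IDF vectors).
X = [{"cricket": 1.0}, {"election": 1.0}, {"cricket": 1.0, "election": 1.0}]
Y = [encode({"Sports"}), encode({"Politics"}), encode({"Sports", "Politics"})]
models = train_one_vs_rest(X, Y)
print(predict(models, X[2]))  # ['Sports', 'Politics']
```

Since each tag has its own decision, an article can come back with several tags or with none at all.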