https://github.com/gokulnpc/news-category-classifier-nlp
A Bag of N-Grams is a text representation method used in natural language processing (NLP). It involves breaking down text into a collection (bag) of contiguous sequences of 'N' items (grams). Each 'gram' can be a word or a character, depending on the specific application.
- Host: GitHub
- URL: https://github.com/gokulnpc/news-category-classifier-nlp
- Owner: gokulnpc
- Created: 2024-06-08T07:22:19.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-06-08T07:24:55.000Z (12 months ago)
- Last Synced: 2025-02-02T03:44:50.980Z (4 months ago)
- Language: Jupyter Notebook
- Size: 945 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# News Category Classifier using Bag of N-Grams
A Bag of N-Grams is a text representation method used in natural language processing (NLP). It involves breaking down text into a collection (bag) of contiguous sequences of 'N' items (grams). Each 'gram' can be a word or a character, depending on the specific application.
For example:
- Unigrams (1-grams): "Natural Language Processing" → ["Natural", "Language", "Processing"]
- Bigrams (2-grams): "Natural Language Processing" → ["Natural Language", "Language Processing"]
- Trigrams (3-grams): "Natural Language Processing" → ["Natural Language Processing"]

Because each n-gram keeps the words inside it in their original order, this method captures local context and word order, which helps in tasks like text classification and sentiment analysis. The "bag" aspect means the n-grams themselves are treated as an unordered collection: where each n-gram appears in the text is discarded, and only its presence and frequency are kept.
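The n-gram extraction and the "bag" counting described above can be sketched in plain Python. This is a minimal illustration, not the notebook's own code: it assumes naive whitespace tokenization and lowercasing, and the helper names `word_ngrams`, `char_ngrams`, `bag_of_ngrams`, and `vectorize` are hypothetical.

```python
from collections import Counter


# Hypothetical helpers for illustration; not taken from the repository's notebook.
def word_ngrams(text: str, n: int) -> list[str]:
    """Return the contiguous word n-grams of `text`, joined by single spaces."""
    tokens = text.split()  # assumption: naive whitespace tokenization
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> list[str]:
    """Return the contiguous character n-grams of `text` (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bag_of_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> Counter:
    """Count every word n-gram with n_min <= n <= n_max.

    Only presence and frequency survive; where an n-gram occurred is discarded.
    """
    bag: Counter = Counter()
    for n in range(n_min, n_max + 1):
        bag.update(word_ngrams(text.lower(), n))  # assumption: lowercase first
    return bag


def vectorize(docs: list[str], n_min: int = 1, n_max: int = 2):
    """Turn documents into count vectors over a shared, sorted n-gram vocabulary."""
    bags = [bag_of_ngrams(doc, n_min, n_max) for doc in docs]
    vocab = sorted(set().union(*bags))
    # Missing keys in a Counter return 0, so absent n-grams get a zero count.
    matrix = [[bag[gram] for gram in vocab] for bag in bags]
    return vocab, matrix


if __name__ == "__main__":
    sentence = "Natural Language Processing"
    print(word_ngrams(sentence, 1))  # ['Natural', 'Language', 'Processing']
    print(word_ngrams(sentence, 2))  # ['Natural Language', 'Language Processing']
    print(word_ngrams(sentence, 3))  # ['Natural Language Processing']
    print(char_ngrams("news", 2))    # ['ne', 'ew', 'ws']

    bag = bag_of_ngrams("the cat saw the cat")
    print(bag["the cat"], bag["cat saw"])  # 2 1

    vocab, matrix = vectorize(["the cat sat", "the dog sat"], n_min=1, n_max=1)
    print(vocab)   # ['cat', 'dog', 'sat', 'the']
    print(matrix)  # [[1, 0, 1, 1], [0, 1, 1, 1]]
```

The bigram "the cat" is counted twice no matter where it occurs, which is exactly the order-insensitive "bag" behaviour. In practice, scikit-learn's `CountVectorizer` exposes the same idea through its `ngram_range` parameter (and `analyzer="char"` for character n-grams), though its default tokenizer drops punctuation and single-character tokens, so its vocabulary can differ from this sketch.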
## Result: Confusion Matrix
