https://github.com/rajspeaks/machine-learning-approach-to-bengali-corpus-tokenization-stemming-pos-tagging-using-bnltk
Machine Learning approach to Bengali Corpus POS Tagging using BNLTK. This is an experimental project under the mentorship of Prof. Sandipan Ganguly, HIT-K.
bengali bengali-dataset bengali-language-processing bengali-natural-language-processing bengali-nlp english machine-learning natural-language-processing natural-language-understanding nlp nlp-library nlp-machine-learning postagger postagging rajdeep-das rajspeaks stemmer stemming tokenizer-parser
Last synced: 23 days ago
- Host: GitHub
- URL: https://github.com/rajspeaks/machine-learning-approach-to-bengali-corpus-tokenization-stemming-pos-tagging-using-bnltk
- Owner: Rajspeaks
- License: gpl-3.0
- Created: 2022-03-25T11:25:20.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-05-02T06:28:13.000Z (almost 3 years ago)
- Last Synced: 2025-02-09T22:46:52.762Z (3 months ago)
- Topics: bengali, bengali-dataset, bengali-language-processing, bengali-natural-language-processing, bengali-nlp, english, machine-learning, natural-language-processing, natural-language-understanding, nlp, nlp-library, nlp-machine-learning, postagger, postagging, rajdeep-das, rajspeaks, stemmer, stemming, tokenizer-parser
- Language: Jupyter Notebook
- Homepage:
- Size: 78.1 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Machine Learning approach to Bengali Corpus Tokenization | Stemming | POS Tagging using BNLTK
BNLTK (Bengali Natural Language Toolkit) is a library developed by [Asraf Patoary](https://github.com/ashwoolford). With BNLTK, we can tokenize Bengali text, stem Bengali words, and tag them with part-of-speech categories.
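
To make the idea of word tokenization concrete, here is a minimal sketch in plain Python. It is not BNLTK's implementation and does not use the library; the helper name `simple_bn_word_tokenize` and the punctuation set are assumptions made purely for illustration. It only shows what "splitting a Bengali sentence into individual words" can look like when the danda (`।`) and a few common punctuation marks are treated as separators.

```python
import re

# Hypothetical punctuation set, for illustration only.
# The danda "।" (U+0964) is the usual Bengali sentence terminator.
PUNCTUATION = "।,;:!?\"'()-"


def simple_bn_word_tokenize(text):
    """Split Bengali text into word tokens, dropping common punctuation.

    This is a naive stand-in, NOT the tokenizer shipped with BNLTK.
    """
    # Replace every punctuation mark with a space, then split on whitespace.
    cleaned = re.sub("[" + re.escape(PUNCTUATION) + "]", " ", text)
    return cleaned.split()


tokens = simple_bn_word_tokenize("আমি ভাত খাই। তুমি গান শোনো।")
print(tokens)
```

A real tokenizer such as the one in BNLTK may handle more cases (numerals, abbreviations, mixed scripts), so this sketch should be read only as a picture of the basic step.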
## Installation:
```
pip install bnltk
```

## Methodology
- First, we installed BNLTK.
- Imported Tokenizers from bnltk and tokenized a Bengali sentence by splitting it into individual words, then applied the same tokenizer to a larger Bengali corpus.
- Imported BanglaStemmer() from bnltk to stem Bengali words, and repeated this twice on different words.
- Downloaded the data files from bnltk before continuing.
- Imported PosTagger from bnltk, applied it to a short Bengali sentence to tag each word with its part-of-speech category, and repeated this twice more on larger Bengali corpora.

## Tools & Library requirements:
- Google Colab/Jupyter-Notebook
- BNLTK Library

### Reference:
1. https://ashwoolford.github.io/bnltk-documentation/
2. https://github.com/ashwoolford/bnltk

### Mentor:
Prof. Sandipan Ganguly
### Developer:
Rajdeep Das